Field paths and phrase-through localnames not converted from XML to JSON in database creation calls #525

Closed
ahwitz opened this issue Dec 15, 2019 · 4 comments

ahwitz commented Dec 15, 2019

This is with MarkLogic 9.0-11 and ml-gradle 3.16.4, which I think covers all the relevant version numbers. This likely belongs under the deployer Java library, but I'm submitting it here just in case.

I'm migrating an old app that used a rather mangled version of the Roxy installer to a Gradle-based installation. I have XML versions of the database configurations that were compatible with the Roxy admin installer code, and I was getting errors when trying to use those to install; the same errors were reproducible by copying the database configuration out of the DB via curl --anyauth --user ____:_____ http://localhost:8002/manage/v2/databases/{database-name}/properties.

This seems to be an XML-to-JSON conversion issue, because adding -H "Accept: application/json" to that request and using the results imports the database correctly.

The following base case produces all the errors I ran into (with element-word-query-through/phrase-through/phrase-around/field) so far. I didn't keep digging for fear of accidentally removing fields (I have no idea why the phrase-throughs are here in the first place, legacy code problems in a nutshell), but can if desired.

Insert this file into a gradle setup at src/main/ml-config/databases/test.xml and run gradle -i mlDeploy:

<database-properties xmlns="http://marklogic.com/manage/database/properties">
  <database-name>test</database-name>
  <element-word-query-throughs>
    <element-word-query-through>
      <namespace-uri>http://schemas.microsoft.com/office/word/2003/wordml</namespace-uri>
      <localname>p</localname>
    </element-word-query-through>
  </element-word-query-throughs>
  <phrase-throughs>
    <phrase-through>
      <namespace-uri>http://schemas.microsoft.com/office/word/2003/wordml</namespace-uri>
      <localname>br cr fldChar fldData fldSimple hlink noBreakHyphen permEnd permStart pgNum proofErr r softHyphen sym t tab</localname>
    </phrase-through>
  </phrase-throughs>
  <phrase-arounds>
    <phrase-around>
      <namespace-uri>http://schemas.microsoft.com/office/word/2003/wordml</namespace-uri>
      <localname>delInstrText delText endnote footnote instrText pict rPr</localname>
    </phrase-around>
  </phrase-arounds>
  <fields>
    <field>
      <field-name>test-field</field-name>
      <field-path>
        <path>/test/path</path>
        <weight>1</weight>
      </field-path>
      <field-value-searches>true</field-value-searches>
      <word-lexicons/>
      <included-elements/>
      <excluded-elements/>
      <tokenizer-overrides/>
    </field>
  </fields>
</database-properties>

If any additional debug information is needed, let me know and I can pass it on. Thanks!

rjrudin (Contributor) commented Dec 16, 2019

First, thanks for the test file, that always makes debugging much easier.

Release 3.15.0 and later defaults to using MarkLogic's CMA endpoint for faster creation of resources. This is documented at https://github.com/marklogic-community/ml-app-deployer/wiki/Configuration-Management-API .

What's not well-documented there (I know I documented this in the code, but I don't know if it made it into a Wiki page yet) is that when using CMA, the requests are always sent as JSON. So if your input files are XML, there's a process to convert the XML to JSON. That process can be buggy - it depends on unmarshalling the XML into Java objects, and then marshalling those Java objects back out to JSON. Those Java objects (in ml-app-deployer) always have to be up-to-date with the latest Manage API schemas, which is not guaranteed.

So I believe the issue here is that there's some gap in the Java objects in the latest version of ml-app-deployer.

There are two workarounds that you can use:

  1. Set mlDeployDatabasesWithCma=false or mlDeployWithCma=false in gradle.properties. This will turn off CMA usage either for databases or for everything. Your deployment might be slightly slower, but it should work fine.
  2. Convert your XML files to JSON. This will be tedious, but keep in mind that with the Manage API used by ml-gradle (as opposed to the Admin API used by Roxy), you only need to specify what differs from a default database configuration. So typically your resource files can be much smaller, i.e. you don't need to define a bunch of phrase-throughs that you don't need.
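
As a minimal sketch of the first workaround, the gradle.properties entry could look like the following. Both property names are the ones given in the list above; only one of them is needed, and the commented-out line is the broader alternative. Turning CMA off should avoid the XML-to-JSON conversion step described earlier, at the cost of a possibly slower deployment.

```properties
# Turn off CMA for database deployment only, so XML database files
# should no longer go through the XML-to-JSON conversion.
mlDeployDatabasesWithCma=false

# Alternative: turn off CMA for every resource type.
# mlDeployWithCma=false
```

After changing the property, rerunning gradle -i mlDeploy with the original test.xml is a quick way to check whether the errors go away.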

rjrudin (Contributor) commented Dec 16, 2019

I created this ticket for whatever the issue is with unmarshalling the XML - https://github.com/marklogic-community/ml-app-deployer/issues/391

I am going to close this on the assumption that the ticket and the workarounds above cover the issue. Please reopen if that assumption is incorrect.

rjrudin closed this as completed Dec 16, 2019

ahwitz (Author) commented Dec 16, 2019

I was bracing for having to stare down the barrel of a "just convert to JSON and deal with it" response, so this is quite a pleasant surprise. I'd felt more secure with the XML version for some reason, but if JSON's the current state with the CMA endpoint, it still sounds like converting to JSON is the best option. Especially given that the whole point of my current work is to modernize all this...

Subscribed to the other ticket and will update there if I encounter any other issues. Thanks for the quick response!
