MarkLogic Content Pump (mlcp) and Gradle

rjrudin edited this page Aug 2, 2021 · 20 revisions

The MlcpTask class allows you to invoke MarkLogic's Content Pump tool (mlcp) via a Gradle task.

One benefit of using MlcpTask instead of a plain JavaExec task is that MlcpTask defaults to the mlHost, (mlUsername or mlRestAdminUsername), and (mlPassword or mlRestAdminPassword) properties, which are defined in the mlAppConfig instance that ml-gradle instantiates in Gradle. Another benefit is that you don't need to download mlcp and put the executable on your path: all of mlcp's libraries are downloaded via Gradle, so you can run the task from anywhere. That's also handy for running mlcp on something like a Jenkins CI server.

MlcpTask also provides task properties for most of mlcp's command-line arguments. These are just syntactic sugar: since MlcpTask extends JavaExec, you can always pass arguments through JavaExec's "args" property.
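
As a minimal sketch of mixing the two styles, the task below sets some options through task properties and passes others as raw mlcp arguments via "args". The task name, input path, and option values are illustrative only, and the snippet assumes a Gradle configuration named mlcp that holds the mlcp dependency; -thread_count and -batch_size are standard mlcp import options.

```groovy
// Hypothetical task name and values, shown only to illustrate the "args" pass-through
task importWithExtraArgs(type: com.marklogic.gradle.task.MlcpTask) {
  classpath = configurations.mlcp   // assumes an "mlcp" configuration with the mlcp dependency
  command = "IMPORT"
  input_file_path = "data/import"   // hypothetical path
  // Any mlcp option can also be passed as a raw command-line argument
  args = ["-thread_count", "8", "-batch_size", "50"]
}
```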

Note that you don't need MlcpTask to use mlcp: you can use a plain JavaExec task and configure all of the command-line arguments yourself.
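
A rough sketch of the plain JavaExec approach might look like the following. It assumes a Gradle configuration named mlcp that holds the mlcp dependency, and that mlHost, mlUsername, and mlPassword are defined as Gradle properties (e.g. in gradle.properties). The task name, port, and input path are hypothetical. com.marklogic.contentpump.ContentPump is the main class that the mlcp launcher scripts invoke.

```groovy
// Hypothetical task: runs mlcp directly, with every argument spelled out by hand
task importViaJavaExec(type: JavaExec) {
  classpath = configurations.mlcp   // assumes an "mlcp" configuration with the mlcp dependency
  mainClass = "com.marklogic.contentpump.ContentPump"
  args = [
    "IMPORT",
    "-host", mlHost,                // assumed to be defined in gradle.properties
    "-port", "8000",
    "-username", mlUsername,        // assumed to be defined in gradle.properties
    "-password", mlPassword,        // assumed to be defined in gradle.properties
    "-input_file_path", "data/import"
  ]
}
```

Nothing is defaulted from mlAppConfig here, so the connection arguments have to be supplied explicitly.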

Recommended Gradle version

As of ml-gradle 4.3.0, you should use at least Gradle 6.6. If you'd like to use Gradle 7.0 or higher, you must use ml-gradle 4.3.1 or higher.

Example

Below is an example of using MlcpTask and pulling in the mlcp dependencies (see the mlcp-project build file for a more complete example, which shows both import and export tasks):

plugins {
  id "com.marklogic.ml-gradle" version "4.3.1"
}

repositories {
  mavenCentral()
  maven { url "https://developer.marklogic.com/maven2/" }
}

configurations {
  mlcp
}

dependencies {
  mlcp "com.marklogic:mlcp:10.0.6.2"
}

task sample(type: com.marklogic.gradle.task.MlcpTask) {
  classpath = configurations.mlcp
  command = "IMPORT"
  database = "my-database"
  input_file_path = "my-input-file.txt"
  input_file_type = "delimited_text"
  output_collections = "my-collection"
  // Can also override the default properties
  // username = "some-other-username"
  // etc...
}

Avoid duplication across MLCP tasks

See Dynamically creating tasks for tips on reducing duplication across many MLCP tasks.

MLCP and logging

MLCP uses Log4j for logging. When you pull in MLCP as a Gradle dependency, you don't get a default log4j.properties file, so you won't see any logging from MLCP.

This is easy to fix, though. One way is to create a log4j.properties file in a directory such as ./lib (the name of the directory can be anything), and then add "lib" as an mlcp dependency:

dependencies {
  mlcp "com.marklogic:mlcp:10.0.6.2"
  mlcp files("lib")
}

And here's a very simple log4j.properties file that you can use as a starting point:

# Root logger option
log4j.rootLogger=INFO, stdout

# Direct log messages to stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

Using an MLCP transform

Be aware that MlcpTask defaults to using port 8000. If you specify a transform parameter in your MlcpTask, you will need to set the "port" parameter to that of your XDBC server or of a REST server that supports XDBC requests.
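
As an illustration only, a task that applies a transform could look something like this. The port number, module path, task name, and input path are all hypothetical, and the snippet assumes a Gradle configuration named mlcp that holds the mlcp dependency. The transform is passed through "args" using mlcp's -transform_module option.

```groovy
// Hypothetical task: the port and the transform module path are placeholders
task importWithTransform(type: com.marklogic.gradle.task.MlcpTask) {
  classpath = configurations.mlcp   // assumes an "mlcp" configuration with the mlcp dependency
  command = "IMPORT"
  port = 8010                       // hypothetical XDBC-capable app server port, overriding the default
  input_file_path = "data/import"   // hypothetical path
  // The transform module must already be installed in that app server's modules database
  args = ["-transform_module", "/my-transform.sjs"]
}
```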

Writing log output to MarkLogic

New in ml-gradle 2.6.0 - you can set the logOutputUri parameter to define a URI for mlcp log output to be written to:

task sample(type: com.marklogic.gradle.task.MlcpTask) {
  // ...
  logOutputUri = "/mlcp-output.txt"
}

And new in 3.12.0 - you can provide a custom DatabaseClient to control what database the log output is written to (it defaults to mlAppConfig.newDatabaseClient()):

task sample(type: com.marklogic.gradle.task.MlcpTask) {
  // ...
  logOutputUri = "/mlcp-output.txt"
  logClient = mlAppConfig.newModulesDatabaseClient() // Just notional - reference or construct any DatabaseClient you want
}

Suppressing Hadoop binary messages on Windows

When running mlcp via Gradle on Windows, you're likely to see the following message logged:

java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

It reads as an exception, but unless you're using certain Hadoop-based features within mlcp, you can safely ignore it. If MLCP throws an actual error later on, though, you should probably use the standalone MLCP distribution instead of MlcpTask.

You can also suppress the message by performing the following steps:

  1. Create a dummy lib\bin\winutils.exe file in your project
  2. Add the following to your task that extends MlcpTask:

     systemProperties = ["hadoop.home.dir" : "$project.rootDir/lib"]