MXR Son of CGI
The proposed DSpace functionality known as CGI (Context Guided Ingest) has assumed a new identity (or at least a new acronym) in mds, but the concepts are essentially the same. At the highest level, there is a small set of APIs that map DSpaceObjects (typically, but not necessarily, Items) to 'resources' that are useful to them in various situations, such as submission or workflow. The notion of a resource is deliberately loose; examples from DSpace include the 'input forms' used by web submission, the 'submission steps' used in configurable submission, and the 'item templates' used for metadata creation. The mapping facility is indirect, and includes the ability to express higher-level 'rules', for maximum flexibility. What follows is a summary of the APIs and how they connect.
The first API exposes a simple facility to assert persistent 'attributes' (i.e. name/value pairs) about any DSpaceObject. These attributes differ from generic metadata in several respects: there is no notion of qualification, language, etc - they are simple name/value pairs. They do not belong to the object's permanent curated state, and in this sense are outside the data model proper. Rather, they represent a way to persist useful state beyond in-memory representations. These attributes have one additional aspect: every attribute is relative to what is called a scope, which is essentially a named context of use. The primary function of scoping is to prevent attribute name collisions. The API is simple (all methods of DSpaceObject):
public void setAttribute(String scope, String name, String value)
public String getAttribute(String scope, String name)
public void clearAttributes(String scope)
public Set<String> getAttributeNames(String scope)
The clearAttributes call would typically be made when the object's scope becomes irrelevant (e.g. when an object exits workflow), and all attributes are automatically purged when an object is deleted. The Object Attribute API has applicability beyond MXR: attributes could hold workspace data (like the last page reached in a web submission), workflow or curation results, and so on.
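To make the scoping behaviour concrete, here is a minimal in-memory sketch of the four methods listed above. It is not the mds implementation: the class name AttributedObject is hypothetical, storage is a plain map rather than anything persistent, and returning null for a missing attribute is an assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the attribute-bearing part of a DSpaceObject.
// Attributes live in memory here; the facility described above is persistent.
class AttributedObject {
    // scope -> (attribute name -> value)
    private final Map<String, Map<String, String>> scopes = new HashMap<>();

    public void setAttribute(String scope, String name, String value) {
        scopes.computeIfAbsent(scope, s -> new HashMap<>()).put(name, value);
    }

    // Assumption: a missing scope or name yields null.
    public String getAttribute(String scope, String name) {
        Map<String, String> attrs = scopes.get(scope);
        if (attrs == null) {
            return null;
        }
        return attrs.get(name);
    }

    // Drops every attribute in one scope and leaves other scopes untouched.
    public void clearAttributes(String scope) {
        scopes.remove(scope);
    }

    public Set<String> getAttributeNames(String scope) {
        Map<String, String> attrs = scopes.get(scope);
        if (attrs == null) {
            return Collections.emptySet();
        }
        return new TreeSet<>(attrs.keySet());
    }
}
```

Because the scope is part of every call, the same attribute name (say "lastPage") can be set under "workspace" and under "workflow" without collision, and clearing one scope leaves the other intact.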
At the other end, there is an API for mapping keys to named resources. A named resource in this context is a unique instance of a resource within a given class or type. To use an example from configurable submission (input-forms), "traditional" is the name of a stanza in the input-forms.xml document that describes the resource (a structured list of metadata prompts and UI info). In DSpace, we map that resource to given collection IDs, e.g. collection:123456789/1 -> "traditional" in the input-forms.xml file. The mds API is also fairly simple (all the methods are attached to a ResourceMap object):
public void addResource(String key, String resId)
public void removeResource(String key)
public Set<String> resourceKeysFor(String resId)
public void addRule(String key, String rule)
public void removeRule(String key)
public Set<String> ruleKeysFor(String rule)
public void setBuilder(String builder)
public Object findResource(DSpaceObject dso, String scope)
The first three methods are relatively self-explanatory: they add, remove or query the mappings to a specific resource with the name 'resId'. The next three do the same for rules (which are 'RCL' expressions). The setBuilder method assigns the name of the class that will be invoked to instantiate (aka build) the requested resource. The last method requires a bit more explanation: it attempts to deliver a specific resource of a given type for an object in a given scope. How is this accomplished? Let us walk through the process. The findResource() method first attempts to locate a scope-specific rule for looking up resources of the type in question. Rules are found via lookup in the map, looking first for a rule mapped to the given scope for the resource type, and failing that a 'default' rule for the type. A rule for input forms could look like:
collection:?,collection:default
This expression is the subject of the next section.
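The rule lookup just described (scope-specific rule first, then the 'default' rule) can be sketched as below. The text does not say how rule keys are formed, so this sketch assumes one map per resource type, with the scope name itself as the rule key and the literal "default" as the fallback key. The class name RuleLookupSketch and the method ruleFor are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the rule half of a ResourceMap for one resource type.
class RuleLookupSketch {
    // Assumption: rule key is a scope name, or "default" for the fallback.
    private final Map<String, String> rules = new HashMap<>();

    public void addRule(String key, String rule) {
        rules.put(key, rule);
    }

    public void removeRule(String key) {
        rules.remove(key);
    }

    // All keys currently mapped to the given rule expression.
    public Set<String> ruleKeysFor(String rule) {
        Set<String> keys = new TreeSet<>();
        for (Map.Entry<String, String> e : rules.entrySet()) {
            if (e.getValue().equals(rule)) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    // Scope-specific rule if present, else the default rule, else null.
    public String ruleFor(String scope) {
        String rule = rules.get(scope);
        if (rule != null) {
            return rule;
        }
        return rules.get("default");
    }
}
```

With only a "default" rule registered, ruleFor("workspace") falls back to it; once a rule is added under "workspace", that one takes precedence.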
RCL, the language in which rules are written, is an abbreviation for 'resource composition language' (so an 'rcle' is a 'resource composition language expression'). RCL is a very simple grammar that allows both specification and composition of resources. If we examine the specimen expression used above:
collection:?,collection:default
we can observe the basic syntax of the language. RCL separates alternatives with commas (","), which are evaluated lazily left-to-right. In the above case, this means that the second alternative ("collection:default") is used only if the first ("collection:?") fails to resolve to a resource ID. The '?' indicates a look-up substitution from the query environment. This environment essentially consists of the passed object's scoped attributes. For example, for an item with the scope parameter "workspace", 'collection:?' would resolve to the value returned by:
item.getAttribute("workspace", "collection")
which might yield the string: "collection:123456789/2". This string becomes the key to a ResourceMap lookup, and assuming that the key is present in the map, a resource ID replaces the key. Suppose that resource ID is "traditional" (in the realm of input forms, e.g.). The final stage of the process is to obtain the actual resource from the ID, and findResource returns that resource.
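The evaluation of comma-separated alternatives with '?' substitution can be sketched roughly as follows. This is an illustration under assumptions, not the mds evaluator: the class RclAlternatives and its resolve method are hypothetical, the object's scoped attributes are passed in as a plain map, the resource map is a plain key-to-ID map, and the final step of building the resource from its ID is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical evaluator for RCL expressions made of comma-separated alternatives.
class RclAlternatives {
    /**
     * @param expr        e.g. "collection:?,collection:default"
     * @param attrs       the object's attributes in the requested scope
     * @param resourceMap key -> resource ID mappings
     * @return the first resource ID that resolves, or null if none does
     */
    static String resolve(String expr, Map<String, String> attrs,
                          Map<String, String> resourceMap) {
        // Alternatives are tried left to right; later ones are only
        // examined when earlier ones fail to resolve.
        for (String alternative : expr.split(",")) {
            String key = alternative.trim();
            if (key.endsWith(":?")) {
                // "collection:?" -> look up attribute "collection"; its value
                // (e.g. "collection:123456789/2") becomes the map key.
                String name = key.substring(0, key.length() - 2);
                key = attrs.get(name);
                if (key == null) {
                    continue; // attribute not set: this alternative fails
                }
            }
            String resId = resourceMap.get(key);
            if (resId != null) {
                return resId;
            }
        }
        return null;
    }
}
```

For an item whose "collection" attribute is "collection:123456789/2", and a map containing that key with the value "traditional", the specimen expression resolves to "traditional". If the attribute is unset or its value is not a key in the map, evaluation falls through to whatever "collection:default" is mapped to.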
RCL can be used for slightly more complex cases of resource composition. For example, the expression:
type:?+collection:?,collection:default
means that if a type attribute is present, combine the type and collection input forms into a single new form, otherwise use the collection default form.
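A possible reading of the '+' operator is sketched below: each alternative is split into terms, and the alternative succeeds only if every term resolves, yielding the list of resource IDs to be combined. The all-terms-must-resolve rule is an assumption (the text only covers the case where the type attribute is absent), as are the class name RclComposition and the returned list. Actually merging the forms would be the job of the builder and is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical evaluator handling both ',' (alternatives) and '+' (composition).
class RclComposition {
    /** Returns the resource IDs of the first alternative whose terms all resolve. */
    static List<String> resolve(String expr, Map<String, String> attrs,
                                Map<String, String> resourceMap) {
        for (String alternative : expr.split(",")) {
            List<String> ids = new ArrayList<>();
            boolean allResolved = true;
            for (String term : alternative.split("\\+")) {
                String key = term.trim();
                if (key.endsWith(":?")) {
                    // Substitute the attribute value for the "name:?" term.
                    key = attrs.get(key.substring(0, key.length() - 2));
                }
                String resId = (key == null) ? null : resourceMap.get(key);
                if (resId == null) {
                    allResolved = false; // assumption: one failed term fails the alternative
                    break;
                }
                ids.add(resId);
            }
            if (allResolved) {
                return ids;
            }
        }
        return Collections.emptyList();
    }
}
```

Under these assumptions, "type:?+collection:?,collection:default" yields two IDs (the type form and the collection form) when both attributes are set and mapped, and a single ID for the collection default otherwise.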