[TLC-674] Shift duplicate endpoint docs to duplicates.md
kshepherd committed Feb 28, 2024
1 parent b319e64 commit 7faad23
Showing 3 changed files with 98 additions and 92 deletions.
97 changes: 97 additions & 0 deletions duplicates.md
@@ -0,0 +1,97 @@
# Duplicate detection endpoint
[Back to the list of all defined endpoints](endpoints.md)

## Main Endpoint
**/api/submission/duplicates**

Provides access to basic duplicate detection services. These services use Solr and the Levenshtein distance operator
to detect potential duplicates of a given item, which is useful during submission and workflow review.
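
To make the matching idea concrete, here is a minimal plain-Python sketch of the Levenshtein (edit) distance. It is not DSpace's or Solr's implementation: Solr performs the comparison server-side, and the `normalise` helper is a hypothetical stand-in for whatever title normalisation is actually applied.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (classic dynamic programming)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca from a
                current[j - 1] + 1,      # insert cb into a
                previous[j - 1] + cost,  # substitute (or keep a match)
            ))
        previous = current
    return previous[-1]


def normalise(title: str) -> str:
    # Hypothetical normalisation (an assumption): lower-case, collapse whitespace.
    return " ".join(title.lower().split())


candidate = normalise("Example Item")
for other in ["Example Itom", "Exaple Item", "A Different Title"]:
    print(other, levenshtein(candidate, normalise(other)))
```

Both "Example Itom" (one substitution) and "Exaple Item" (one deletion) are a single edit away from "Example Item", which is why titles like these can surface as potential duplicates.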

See `dspace/config/modules/duplicate-detection.cfg` for configuration properties and examples.

## Search

**GET /api/submission/duplicates/search?uuid=<:uuid>**

Given an item UUID in the `uuid` parameter, provides a list of items that may be duplicates of it (if duplicate detection is enabled).

Note that although this endpoint appears in the submission category, the UUID can also refer to an archived item.
Currently, the only frontend use of this feature is in the workspace and workflow, so it is categorised as such.
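
A client only needs to append the item UUID as a query parameter. The sketch below builds the request URL with the Python standard library; the server base URL (`https://demo.example.org/server`) and the helper name `duplicate_search_url` are hypothetical, and actually sending the request (plus any authentication the server requires) is left out.

```python
from urllib.parse import urlencode
from uuid import UUID


def duplicate_search_url(server_url: str, item_uuid: str) -> str:
    """Build the GET URL for /api/submission/duplicates/search.

    server_url is the base URL of a DSpace REST API; the value used below
    is a hypothetical example, not a fixed address.
    """
    # Parse the UUID first so a malformed value raises ValueError locally
    # and the canonical lower-case, hyphenated form is sent.
    canonical = str(UUID(item_uuid))
    query = urlencode({"uuid": canonical})
    return f"{server_url.rstrip('/')}/api/submission/duplicates/search?{query}"


url = duplicate_search_url("https://demo.example.org/server/",
                           "5ca83276-f003-460d-98b6-dd3c30708749")
print(url)
```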

Each potential duplicate has the following attributes:

* title: The item title
* uuid: The item UUID
* owningCollectionName: Name of the owning collection, if present
* workspaceItemId: Integer ID of the workspace item, if present
* workflowItemId: Integer ID of the workflow item, if present
* metadata: A map from metadata field names (e.g. `dc.title`) to lists of values copied from the item, as per configuration
* type: Always `DUPLICATE`. This is the type category used for serialization/deserialization.
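
One way a client might model these attributes is sketched below as a Python dataclass. The class name `PotentialDuplicate`, its snake_case field names and the `first_value` helper are hypothetical; only the JSON keys come from the list above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class PotentialDuplicate:
    """Hypothetical client-side model of one potential duplicate."""
    title: str
    uuid: str
    owning_collection_name: Optional[str]
    workspace_item_id: Optional[int]
    workflow_item_id: Optional[int]
    # Metadata field name (e.g. "dc.title") -> list of value objects.
    metadata: Dict[str, List[Dict[str, Any]]] = field(default_factory=dict)

    @classmethod
    def from_json(cls, obj: Dict[str, Any]) -> "PotentialDuplicate":
        # "type" is always "DUPLICATE"; treat anything else as an error.
        if obj.get("type") != "DUPLICATE":
            raise ValueError(f"unexpected type: {obj.get('type')!r}")
        return cls(
            title=obj["title"],
            uuid=obj["uuid"],
            owning_collection_name=obj.get("owningCollectionName"),
            workspace_item_id=obj.get("workspaceItemId"),
            workflow_item_id=obj.get("workflowItemId"),
            metadata=obj.get("metadata") or {},
        )

    def first_value(self, field_name: str) -> Optional[str]:
        """Return the first copied value for a metadata field, if any."""
        values = self.metadata.get(field_name, [])
        return values[0]["value"] if values else None
```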

Example

```json
{
  "potentialDuplicates": [
    {
      "title": "Example Item",
      "uuid": "5ca83276-f003-460d-98b6-dd3c30708749",
      "owningCollectionName": "Publishers",
      "workspaceItemId": null,
      "workflowItemId": null,
      "metadata": {
        "dc.title": [
          {
            "value": "Example Item",
            "language": null,
            "authority": null,
            "confidence": -1,
            "place": 0
          }
        ],
        "dspace.entity.type": [
          {
            "value": "Publication",
            "language": null,
            "authority": null,
            "confidence": -1,
            "place": 0
          }
        ]
      },
      "type": "DUPLICATE"
    },
    {
      "title": "Example Itom",
      "uuid": "32f8f6e4-c79e-4322-aae7-07ee535f70a6",
      "owningCollectionName": null,
      "workspaceItemId": 51,
      "workflowItemId": null,
      "metadata": {
        "dc.title": [
          {
            "value": "Example Itom",
            "language": null,
            "authority": null,
            "confidence": -1,
            "place": 0
          }
        ]
      },
      "type": "DUPLICATE"
    },
    {
      "title": "Exaple Item",
      "uuid": "0647ff45-48f5-4c1b-b6d7-f5dbbc160856",
      "owningCollectionName": null,
      "workspaceItemId": 52,
      "workflowItemId": null,
      "metadata": {
        "dc.title": [
          {
            "value": "Exaple Item",
            "language": null,
            "authority": null,
            "confidence": -1,
            "place": 0
          }
        ]
      },
      "type": "DUPLICATE"
    }
  ]
}
```
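
As a usage sketch, the snippet below parses an abbreviated copy of the example response (metadata omitted) and summarises where each potential duplicate lives, based only on which optional IDs are present. The `describe` helper and its labels are illustrative assumptions, not part of the REST contract.

```python
import json

# Abbreviated copy of the example response (metadata omitted for brevity).
response_text = """
{"potentialDuplicates": [
  {"title": "Example Item", "uuid": "5ca83276-f003-460d-98b6-dd3c30708749",
   "owningCollectionName": "Publishers", "workspaceItemId": null,
   "workflowItemId": null, "type": "DUPLICATE"},
  {"title": "Example Itom", "uuid": "32f8f6e4-c79e-4322-aae7-07ee535f70a6",
   "owningCollectionName": null, "workspaceItemId": 51,
   "workflowItemId": null, "type": "DUPLICATE"},
  {"title": "Exaple Item", "uuid": "0647ff45-48f5-4c1b-b6d7-f5dbbc160856",
   "owningCollectionName": null, "workspaceItemId": 52,
   "workflowItemId": null, "type": "DUPLICATE"}
]}
"""


def describe(dup: dict) -> str:
    """Hypothetical summary of where a potential duplicate lives."""
    if dup.get("workspaceItemId") is not None:
        return f"workspace item {dup['workspaceItemId']}"
    if dup.get("workflowItemId") is not None:
        return f"workflow item {dup['workflowItemId']}"
    collection = dup.get("owningCollectionName")
    return f"item in collection {collection}" if collection else "item"


for dup in json.loads(response_text)["potentialDuplicates"]:
    print(f"{dup['title']} ({dup['uuid']}): {describe(dup)}")
```

Note that JSON `null` arrives as Python `None`, so the checks use `is not None` rather than truthiness, keeping a hypothetical ID of `0` from being treated as absent.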
1 change: 1 addition & 0 deletions endpoints.md
@@ -48,6 +48,7 @@
* [/api/integration/suggestions](suggestions.md)
* [/api/integration/suggestionsources](suggestionsources.md)
* [/api/integration/suggestiontargets](suggestiontargets.md)
* [/api/submission/duplicates](duplicates.md)

## Endpoints Under Development/Discussion
* [/api/authz/resourcepolicies](resourcepolicies.md)
92 changes: 0 additions & 92 deletions submission.md
@@ -66,95 +66,3 @@ The final Item's UUID will be the same as it was in the WorkspaceItem (i.e. the
`/api/submission/workspaceitems/<:id>/item`)
* If the Collection has an approval workflow configured, then a WorkflowItem will be returned. Its `id` can be used
to access the WorkflowItem via `/api/workflow/workflowitems/<:id>`.

## Finding potential duplicate items

**GET /api/submission/duplicates/search?uuid=<:uuid>**

Given an item UUID in the `uuid` parameter, provides a list of items that may be duplicates of it (if duplicate detection is enabled).

The potential duplicates returned by this endpoint are all detected by a dedicated Solr search that compares the
Levenshtein edit distance between the normalised title of the in-progress item and the normalised titles of other items.

Note that although this endpoint appears in the submission category, the UUID can also refer to an archived item.
Currently, the only frontend use of this feature is in the workspace and workflow, so it is categorised as such.

Each potential duplicate has the following attributes:

* title: The item title
* uuid: The item UUID
* owningCollectionName: Name of the owning collection, if present
* workspaceItemId: Integer ID of the workspace item, if present
* workflowItemId: Integer ID of the workflow item, if present
* metadata: A map from metadata field names (e.g. `dc.title`) to lists of values copied from the item, as per configuration
* type: Always `DUPLICATE`. This is the type category used for serialization/deserialization.

See `dspace/config/modules/duplicate-detection.cfg` for configuration properties and examples.

Example

```json
{
  "potentialDuplicates": [
    {
      "title": "Example Item",
      "uuid": "5ca83276-f003-460d-98b6-dd3c30708749",
      "owningCollectionName": "Publishers",
      "workspaceItemId": null,
      "workflowItemId": null,
      "metadata": {
        "dc.title": [
          {
            "value": "Example Item",
            "language": null,
            "authority": null,
            "confidence": -1,
            "place": 0
          }
        ],
        "dspace.entity.type": [
          {
            "value": "Publication",
            "language": null,
            "authority": null,
            "confidence": -1,
            "place": 0
          }
        ]
      },
      "type": "DUPLICATE"
    },
    {
      "title": "Example Itom",
      "uuid": "32f8f6e4-c79e-4322-aae7-07ee535f70a6",
      "owningCollectionName": null,
      "workspaceItemId": 51,
      "workflowItemId": null,
      "metadata": {
        "dc.title": [
          {
            "value": "Example Itom",
            "language": null,
            "authority": null,
            "confidence": -1,
            "place": 0
          }
        ]
      },
      "type": "DUPLICATE"
    },
    {
      "title": "Exaple Item",
      "uuid": "0647ff45-48f5-4c1b-b6d7-f5dbbc160856",
      "owningCollectionName": null,
      "workspaceItemId": 52,
      "workflowItemId": null,
      "metadata": {
        "dc.title": [
          {
            "value": "Exaple Item",
            "language": null,
            "authority": null,
            "confidence": -1,
            "place": 0
          }
        ]
      },
      "type": "DUPLICATE"
    }
  ]
}
```
