From 4567e6fcd2c5da820ad68d13c44672282c349627 Mon Sep 17 00:00:00 2001 From: Peter Sefton Date: Thu, 30 Jan 2025 11:51:00 +1100 Subject: [PATCH] Further updates as requested by @elichad --- .../_specification/1.2-DRAFT/data-entities.md | 326 +++++++++--------- .../1.2-DRAFT/root-data-entity.md | 7 +- docs/_specification/1.2-DRAFT/structure.md | 6 +- 3 files changed, 171 insertions(+), 168 deletions(-) diff --git a/docs/_specification/1.2-DRAFT/data-entities.md b/docs/_specification/1.2-DRAFT/data-entities.md index c090d8be..85e20675 100644 --- a/docs/_specification/1.2-DRAFT/data-entities.md +++ b/docs/_specification/1.2-DRAFT/data-entities.md @@ -45,8 +45,8 @@ The data entities can be further described by referencing [contextual entities]( Where files and folders are represented as _Data Entities_ in the RO-Crate JSON-LD, these MUST be linked to, either directly or indirectly, from the [Root Data Entity](root-data-entity) using the [hasPart] property. Directory hierarchies MAY be represented with nested [Dataset] _Data Entities_, or the Root Data Entity MAY refer to files anywhere in the hierarchy using [hasPart]. _Data Entities_ representing files: MUST have `"File"` as a value for `@type`. `File` is an RO-Crate alias for . The term _File_ includes: -- _Attached_ resources which are available locally and -- _Detached_ "downloadable" resources which can be can be downloaded and saved as a file. +- Resources which are available locally (applicable only in the context of _Attached RO-Crate Packages_) and +- [Web-based Data Entities](#web-based-data-entities) which can be downloaded and saved as a file. The rules for the `@id` property of Files are set out below.
@@ -95,8 +95,6 @@ Further constraints on the `@id` are dependent on whether the [File] entity is b - - Additionally, `File` entities SHOULD have: * [name] giving a human readable name (not necessarily the filename) @@ -115,6 +113,169 @@ RO-Crate's `File` is an alias for schema.org type [MediaObject], any of its prop +### Directory Data Entity + +A [Dataset] (directory) _Data Entity_ MUST have the following properties: + +* `@type` MUST be `Dataset` or an array where `Dataset` is one of the values. +* `@id` MUST be either: + * a _URI Path_ that SHOULD end with `/` + * an absolute URI + * a local reference beginning with `#` + +For an _Attached RO-Crate Package_: +* If the `@id` is a relative path, then it MUST resolve to a directory which must be present in the _RO-Crate Root_ along with its parent directories. + +For a _Detached RO-Crate Package_: +* If the `@id` is a _URI Path_, it MAY be used to create a directory and MAY resolve to a service which returns a list of files. +* If the `@id` is a URL, then it SHOULD resolve to a service which returns a list of files. + +Additionally, `Dataset` entities SHOULD have: + +* [name] giving a human readable name (not necessarily the directory name) +* [description] giving a longer description, e.g. the content of this directory +* [hasPart] listing directly contained data entities + +Any of the properties of schema.org [Dataset] MAY additionally be used (adding contextual entities as needed). [Directories on the web](#directories-on-the-web-dataset-distributions) SHOULD also provide `distribution`. + + + +## Web-based Data Entities + + +Web-based data entities are particularly useful where a file cannot be included in the _RO-Crate Root_ because of licensing concerns, large data sizes or privacy, or where it is desirable to link to the latest online version.
+ +Example of an RO-Crate including a _File Data Entity_ external to the _RO-Crate Root_ (file entity ): + +```json +{ "@context": "https://w3id.org/ro/crate/1.2-DRAFT/context", + "@graph": [ + { + "@type": "CreativeWork", + "@id": "ro-crate-metadata.json", + "conformsTo": {"@id": "https://w3id.org/ro/crate/1.2-DRAFT"}, + "about": {"@id": "./"} + }, + { + "@id": "./", + "@type": [ + "Dataset" + ], + "hasPart": [ + { + "@id": "survey-responses-2019.csv" + }, + { + "@id": "https://zenodo.org/record/3541888/files/ro-crate-1.0.0.pdf" + } + ] + }, + { + "@id": "survey-responses-2019.csv", + "@type": "File", + "name": "Survey responses", + "contentSize": "26452", + "encodingFormat": "text/csv" + }, + { + "@id": "https://zenodo.org/record/3541888/files/ro-crate-1.0.0.pdf", + "@type": "File", + "name": "RO-Crate specification", + "contentSize": "310691", + "description": "RO-Crate specification", + "encodingFormat": "application/pdf" + } +] +} +``` + +Additional care SHOULD be taken to improve persistence and long-term preservation of web resources included +in an RO-Crate, as they can be more difficult to archive or move along with the _RO-Crate Root_, and +may change intentionally or unintentionally, leaving the RO-Crate with incomplete or outdated information. + +File Data Entities with an `@id` URI outside the _RO-Crate Root_ SHOULD, at the time of RO-Crate creation, be directly downloadable by a simple non-interactive retrieval (e.g. HTTP GET) of a single data stream, permitting redirections and HTTP/HTTPS authentication. For instance, in the example above, and cannot be used as `@id`, as retrieving these URLs returns an HTML landing page rather than the PDF indicated by `encodingFormat`. + +{% include callout.html type="note" content="_Web-based Data Entities_ SHOULD NOT reference intermediate resources such as splash pages, search services or web-based viewer applications."
%} + + +As files on the web may change, the timestamp property [sdDatePublished] SHOULD be included to indicate when the absolute URL was accessed and when derived metadata such as [encodingFormat] and [contentSize] were considered representative: + +```json + { + "@id": "https://zenodo.org/record/3541888/files/ro-crate-1.0.0.pdf", + "@type": "File", + "name": "RO-Crate specification", + "contentSize": "310691", + "encodingFormat": "application/pdf", + "sdDatePublished": "2020-04-09T13:09:21+01:00" + } +``` + +Web-based entities MAY use the property [localPath] to indicate a path to use when downloading the data in an _Attached RO-Crate Package_ context. This may be used to instantiate local copies of web-based resources in an _Attached RO-Crate Package_, or as part of a process that downloads local copies of resources from a _Detached RO-Crate Package_ relative to a new root directory. + +```json + { + "@id": "https://zenodo.org/record/3541888/files/ro-crate-1.0.0.pdf", + "localPath": "docs/ro-crate-1.0.0.pdf", + "@type": "File", + "name": "RO-Crate specification", + "contentSize": "310691", + "encodingFormat": "application/pdf", + "sdDatePublished": "2020-04-09T13:09:21+01:00" + } +``` + + + +{% include callout.html type="note" content="Do not use web-based URI identifiers for files which _are_ present in the _RO-Crate Root_; see [below](#embedded-data-entities-that-are-also-on-the-web)." %} + + +### Encoding file paths + +Note that all `@id` [identifiers must be valid URI references](appendix/jsonld#describing-entities-in-json-ld), so care must be taken to express any relative paths using the `/` separator and correct casing, and to escape special characters such as space (`%20`) and percent (`%25`). For instance, a _File Data Entity_ from the Windows path `Results and Diagrams\almost-50%.png` becomes `"@id": "Results%20and%20Diagrams/almost-50%25.png"` in the _RO-Crate JSON-LD_.
+ +In this document the term _URI_ includes international *IRI*s; the _RO-Crate Metadata File_ is always UTF-8 and international characters in identifiers SHOULD be written using native UTF-8 characters (*IRI*s); however, traditional URL encoding of Unicode characters with `%` MAY appear in `@id` strings. Example: `"@id": "面试.mp4"` is preferred over the equivalent `"@id": "%E9%9D%A2%E8%AF%95.mp4"`. + + +### Embedded data entities that are also on the web + +File Data Entities that are present as local files may already have a corresponding web presence, for instance a landing page that describes the file, or a persistent identifier (e.g. a DOI) that resolves to an intermediate HTML page rather than directly to the downloadable file. + +These MAY be included for File Data Entities as additional metadata, regardless of whether the File is included in the _RO-Crate Root_ directory or exists on the Web, by using the properties: + +* [identifier] for formal identifier strings such as DOIs +* [contentUrl] with a string URL corresponding to a *download* link. Following the link (allowing for HTTP redirects) SHOULD directly download the file. +* [url] with a string URL for a download/landing page for this particular file (e.g. where direct download is not available) +* [subjectOf] to a [CreativeWork] (or [WebPage]) that mentions this file or its content (among other resources) +* [mainEntityOfPage] to a [CreativeWork] (or [WebPage]) that primarily describes this file (or its content) + + +Note that if a local file is intended to be packaged within an _Attached RO-Crate Package_, the `@id` property MUST be a _URI Path_ relative to the _RO-Crate Root_, for example `survey-responses-2019.csv` as in the example below, where [contentUrl] points to a download endpoint as a string.
+ +```json + { + "@id": "survey-responses-2019.csv", + "@type": "File", + "name": "Survey responses", + "encodingFormat": "text/csv", + "contentUrl": "http://example.com/downloads/2019/survey-responses-2019.csv", + "subjectOf": {"@id": "http://example.com/reports/2019/annual-survey.html"} + }, + { + "@id": "http://example.com/reports/2019/annual-survey.html", + "@type": "WebPage", + "name": "Survey responses (landing page)" + } +``` + + +### Directories on the web; dataset distributions + +A _Directory Data Entity_ ([Dataset]) identified by an absolute URL on the web can be harder to download than a [File] because it consists of multiple resources. It is RECOMMENDED that such directories have a complete listing of their content in [hasPart], enabling download traversal, or are themselves RO-Crates. + + + + ### _Attached RO-Crate Package_ Example linking to a file and folders @@ -288,163 +449,6 @@ The [Metadata Descriptor](root-data-entity#ro-crate-metadata-descriptor) `ro-cra -### Directory Data Entity - -A [Dataset] (directory) _Data Entity_ MUST have the following properties: - -* `@type` MUST be `Dataset` or an array where `Dataset` is one of the values. -* `@id` MUST be either: - * a _URI Path_ that SHOULD end with `/`. - * an absolute URI - * a local reference beginning with `#` - -For an _Attached RO-Crate Package_: -* If the @id is a relative path, then it MUST that resolve to a directory which must be present in the RO-Crate Root along with its parent directories. - -For a _Detached RO-Crate Package_: -* If the `@id` is a _URI Path it MAY be used to create a directory and MAY resolve to a service which returns a list of files -* If the `@id` is a URL then it SHOULD resolve to a service which returns a list of files - -Additionally, `Dataset` entities SHOULD have: - -* [name] giving a human readable name (not necessarily the directory name) -* [description] giving a longer description, e.g.
the content of this directory -* [hasPart] listing directly contained data entities - -Any of the properties of schema.org [Dataset] MAY additionally be used (adding contextual entities as needed). [Directories on the web](#directories-on-the-web-dataset-distributions) SHOULD also provide `distribution`. - - - -## Web-based Data Entities - - - -Using Web-based data entities can be important particularly where a file can't be included in the _RO-Crate Root_ because of licensing concerns, large data sizes, privacy, or where it is desirable to link to the latest online version. - -Example of an RO-Crate including a _File Data Entity_ external to the _RO-Crate Root_ (file entity ): - -```json -{ "@context": "https://w3id.org/ro/crate/1.2-DRAFT/context", - "@graph": [ - { - "@type": "CreativeWork", - "@id": "ro-crate-metadata.json", - "conformsTo": {"@id": "https://w3id.org/ro/crate/1.2-DRAFT"}, - "about": {"@id": "./"} - }, - { - "@id": "./", - "@type": [ - "Dataset" - ], - "hasPart": [ - { - "@id": "survey-responses-2019.csv" - }, - { - "@id": "https://zenodo.org/record/3541888/files/ro-crate-1.0.0.pdf" - } - ] - }, - { - "@id": "survey-responses-2019.csv", - "@type": "File", - "name": "Survey responses", - "contentSize": "26452", - "encodingFormat": "text/csv" - }, - { - "@id": "https://zenodo.org/record/3541888/files/ro-crate-1.0.0.pdf", - "@type": "File", - "name": "RO-Crate specification", - "contentSize": "310691", - "description": "RO-Crate specification", - "encodingFormat": "application/pdf" - } -] -} -``` - -Additional care SHOULD be taken to improve persistence and long-term preservation of web resources included -in an RO-Crate, as they can be more difficult to archive or move along with the _RO-Crate Root_, and -may change intentionally or unintentionally, leaving the RO-Crate with incomplete or outdated information. 
- -File Data Entries with an `@id` URI outside the _RO-Crate Root_ SHOULD at the time of RO-Crate creation be directly downloadable by a simple non-interactive retrieval (e.g. HTTP GET) of a single data stream, permitting redirections and HTTP/HTTPS authentication. For instance, in the example above, and cannot be used as `@id` above as retrieving these URLs give a HTML landing page rather than the desired PDF as indicated by `encodingFormat`. - -As files on the web may change, the timestamp property [sdDatePublished] SHOULD be included to indicate when the absolute URL was accessed, and derived metadata like [encodingFormat] and [contentSize] were considered to be representative: - -```json - { - "@id": "https://zenodo.org/record/3541888/files/ro-crate-1.0.0.pdf", - "@type": "File", - "name": "RO-Crate specification", - "contentSize": "310691", - "encodingFormat": "application/pdf", - "sdDatePublished": "2020-04-09T13:09:21+01:00Z" - } -``` - -Web based entities MAY use the property [localPath] to indicate a path that can be used to when downloading the data in an _Attached RO-Crate Package_ context. This may be used to instantiate local copies of web-based resources in an _Attached RO-Crate Package_ or as part of a process to download a local resources from a _Detached RO-Crate Package_ relative to a new root directory. - -```json - { - "@id": "https://zenodo.org/record/3541888/files/ro-crate-1.0.0.pdf", - "localPath": "docs/ro-crate-1.0.0.pdf", - "@type": "File", - "name": "RO-Crate specification", - "contentSize": "310691", - "encodingFormat": "application/pdf", - "sdDatePublished": "2020-04-09T13:09:21+01:00Z" - } -``` - - - -{% include callout.html type="note" content="Do not use web based URI identifiers for files which _are_ present in the crate root, see [below](#embedded-data-entities-that-are-also-on-the-web)." 
%} - - -### Encoding file paths - -Note that all `@id` [identifiers must be valid URI references](appendix/jsonld#describing-entities-in-json-ld), care must be taken to express any relative paths using `/` separator, correct casing, and escape special characters like space (`%20`) and percent (`%25`), for instance a _File Data Entity_ from the Windows path `Results and Diagrams\almost-50%.png` becomes `"@id": "Results%20and%20Diagrams/almost-50%25.png"` in the _RO-Crate JSON-LD_. - -In this document the term _URI_ includes international *IRI*s; the _RO-Crate Metadata File_ is always UTF-8 and international characters in identifiers SHOULD be written using native UTF-8 characters (*IRI*s), however traditional URL encoding of Unicode characters with `%` MAY appear in `@id` strings. Example: `"@id": "面试.mp4"` is preferred over the equivalent `"@id": "%E9%9D%A2%E8%AF%95.mp4"` - - -### Embedded data entities that are also on the web - -File Data Entities that are present as local files may already have a corresponding web presence, for instance a landing page that describes the file, including persistent identifiers (e.g. DOI) resolving to an intermediate HTML page instead of the downloadable file directly. - -These MAY be included for File Data Entities as additional metadata, regardless of whether the File is included in the _RO-Crate Root_ directory or exists on the Web, by using the properties: - -* [identifier] for formal identifier strings such as DOIs -* [contentUrl] with a string URL corresponding to a *download* link. Following the link (allowing for HTTP redirects) SHOULD directly download the file. -* [url] with a string URL for a download/landing page for this particular file (e.g. 
direct download is not available) -* [subjectOf] to a [CreativeWork] (or [WebPage]) that mentions this file or its content (but also other resources) -* [mainEntityOfPage] to a [CreativeWork] (or [WebPage]) that primarily describes this file (or its content) - - -Note that if a local file is intended to be packaged within an _Attached RO-Crate Package_, the `@id` property MUST be a _URI Path_ relative to the _RO Crate root_, for example `survey-responses-2019.csv` as in the example below, where the content URL points to a download endpoint as a string. - -```json - { - "@id": "survey-responses-2019.csv", - "@type": "File", - "name": "Survey responses", - "encodingFormat": "text/csv", - "contentUrl": "http://example.com/downloads/2019/survey-responses-2019.csv", - "subjectOf": {"@id": "http://example.com/reports/2019/annual-survey.html"} - }, - { - "@id": "http://example.com/reports/2019/annual-survey.html", - "@type": "WebPage", - "name": "Survey responses (landing page)" - } -``` - - -### Directories on the web; dataset distributions - -A _Directory File Entry_ or [Dataset] identifier expressed as an absolute URL on the web can be harder to download than a [File] because it consists of multiple resources. It is RECOMMENDED that such directories have a complete listing of their content in [hasPart], enabling download traversal, or are themselves RO-Crates. #### Referencing other RO-Crates diff --git a/docs/_specification/1.2-DRAFT/root-data-entity.md b/docs/_specification/1.2-DRAFT/root-data-entity.md index 91f5f482..ce696aaf 100644 --- a/docs/_specification/1.2-DRAFT/root-data-entity.md +++ b/docs/_specification/1.2-DRAFT/root-data-entity.md @@ -153,10 +153,9 @@ Additional properties of _schema.org_ types [Dataset] and [CreativeWork] MAY be The root data entity's `@id` SHOULD be either `./` (indicating the directory of `ro-crate-metadata.json` is the [RO-Crate Root](structure)), or an absolute URI. 
-{: note} -> RO-Crates that have been assigned a _persistent identifier_ (e.g. a DOI) MAY indicate this using [identifier] on the root data entity using the approach set out in the [Science On Schema.org guides], that is through a `PropertyValue` or MAY use a full persistent URL as the `@id` for the _Root Data Entity_. -> -> RO-Crate 1.1 and earlier recommended `identifier` to be plain string URIs. Clients SHOULD be permissive of an RO-Crate `identifier` being a string (which MAY be a URI), or a `@id` reference, which SHOULD be represented as an `PropertyValue` entity which MUST have a human readable `value`, and SHOULD have a `url` if the identifier is Web-resolvable. A citable representation of this persistent identifier MAY be given as a `description` of the `PropertyValue`, but as there are more than 10.000 known [citation styles], no attempt should be made to parse this string. +{% include callout.html type="note" content="RO-Crates that have been assigned a _persistent identifier_ (e.g. a DOI) MAY indicate this with [identifier] on the root data entity, following the approach set out in the [Science On Schema.org guides], that is, through a `PropertyValue`, or MAY use a full persistent URL as the `@id` for the _Root Data Entity_." %} + +{% include callout.html type="note" content="RO-Crate 1.1 and earlier recommended that `identifier` be a plain string URI. Clients SHOULD be permissive of an RO-Crate `identifier` being a string (which MAY be a URI), or an `@id` reference, which SHOULD be represented as a `PropertyValue` entity which MUST have a human-readable `value`, and SHOULD have a `url` if the identifier is Web-resolvable.
A citable representation of this persistent identifier MAY be given as a `description` of the `PropertyValue`, but as there are more than 10,000 known [citation styles], no attempt should be made to parse this string."%} #### Resolvable persistent identifiers and citation text diff --git a/docs/_specification/1.2-DRAFT/structure.md b/docs/_specification/1.2-DRAFT/structure.md index 76d2692d..6ee6ad3f 100644 --- a/docs/_specification/1.2-DRAFT/structure.md +++ b/docs/_specification/1.2-DRAFT/structure.md @@ -56,7 +56,7 @@ In all crates the metadata is completed with [contextual entities](contextual-en [JSON-LD](https://json-ld.org/) is a structured form of [JSON] that can represent a _Linked Data_ graph. -* The _RO-Crate Metadata Document_ MUST contain _RO-Crate JSON-LD_; a valid [JSON-LD 1.0] document in [flattened] and [compacted] form +* The _RO-Crate Metadata Document_ MUST be a valid [JSON-LD 1.0] document in [flattened] and [compacted] form. * The _RO-Crate JSON-LD_ MUST use the _RO-Crate JSON-LD Context_ by reference. @@ -209,9 +209,9 @@ A _Detached RO-Crate Package_ is an RO-Crate, defined in an _RO-Crate Metadata D Unlike an _Attached RO-Crate Package_ a _Detached RO-Crate Package_ is not processed in a file-system context and thus does not carry a data _payload_ in the same sense, but may reference data deposited separately, or purely reference [contextual entities](contextual-entities). -In a _Detached RO-Crate Package_ the [root data entity](root-data-entity) SHOULD have an @id which is an absolute URL. +In a _Detached RO-Crate Package_ the [root data entity](root-data-entity) SHOULD have an `@id` which is an absolute URL if it is available online. If it is not yet, or will never be, available online, then its `@id` may be any valid URI, including `./`. -Any [data entities](data-entities) in a _Detached RO-Crate Package Package_ are assumed to be [Web-based Data Entities](data-entities.html#web-based-data-entities).
+Any [data entities](data-entities) in a _Detached RO-Crate Package_ MUST be [Web-based Data Entities](data-entities.html#web-based-data-entities). A Detached RO-Crate Package may still use `#`-based local identifiers for [contextual entities](contextual-entities).
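The path-encoding rule in the _Encoding file paths_ section of `data-entities.md` can be illustrated with a short sketch. It assumes Python 3 and uses only the standard library (`pathlib.PureWindowsPath` and `urllib.parse.quote`). The function names `encode_segment` and `windows_path_to_crate_id` are hypothetical and not part of any RO-Crate library. ASCII characters outside the URI "unreserved" set are percent-encoded, while non-ASCII characters are left as native UTF-8 so that the result is an IRI, as that section recommends.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def encode_segment(segment: str) -> str:
    """Percent-encode one path segment, keeping non-ASCII characters as-is.

    quote(ch, safe="") leaves only ASCII letters, digits and "_.-~"
    unencoded, so a space becomes %20 and "%" becomes %25.
    """
    return "".join(ch if ord(ch) > 127 else quote(ch, safe="") for ch in segment)


def windows_path_to_crate_id(path: str, is_dir: bool = False) -> str:
    """Turn a relative Windows path into an RO-Crate @id (hypothetical helper)."""
    p = PureWindowsPath(path)
    if p.drive or p.root:
        # This sketch only handles paths relative to the RO-Crate Root.
        raise ValueError(f"expected a relative path, got {path!r}")
    # Backslash-separated parts are re-joined with "/" as the URI separator.
    crate_id = "/".join(encode_segment(part) for part in p.parts)
    # Directory identifiers SHOULD end with "/".
    return crate_id + "/" if is_dir else crate_id


if __name__ == "__main__":
    print(windows_path_to_crate_id("Results and Diagrams\\almost-50%.png"))
    # prints Results%20and%20Diagrams/almost-50%25.png
    print(windows_path_to_crate_id("面试.mp4"))
    # prints 面试.mp4
```

Encoding each character separately, rather than calling `quote` on the whole segment, is what keeps characters such as `面试` unescaped; calling `quote` on the full string would produce the `%E9%9D%A2%E8%AF%95.mp4` form that the section treats as permitted but not preferred.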
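For _Web-based Data Entities_, `sdDatePublished` records when the URL was accessed and when derived values such as `encodingFormat` and `contentSize` were taken as representative. An ISO 8601 timestamp carries either a numeric UTC offset such as `+01:00` or the suffix `Z`, but not both. The sketch below builds such an entity as a Python `dict`. The helper name `web_file_entity` is hypothetical, and the sketch assumes that the caller has already fetched the file and knows its size and media type.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def web_file_entity(url: str, name: str, content_size: int, encoding_format: str,
                    accessed: datetime, local_path: Optional[str] = None) -> dict:
    """Describe a web-based File entity (hypothetical helper, not a library API)."""
    if accessed.tzinfo is None:
        # A naive datetime has no offset, so the timestamp would be ambiguous.
        raise ValueError("accessed must be timezone-aware")
    entity = {
        "@id": url,
        "@type": "File",
        "name": name,
        "contentSize": str(content_size),  # the spec's examples use a string
        "encodingFormat": encoding_format,
        # isoformat() emits a single offset such as "+01:00" and never adds "Z"
        "sdDatePublished": accessed.isoformat(timespec="seconds"),
    }
    if local_path is not None:
        # Optional path to use when the file is downloaded into a crate
        entity["localPath"] = local_path
    return entity


if __name__ == "__main__":
    utc_plus_one = timezone(timedelta(hours=1))
    entity = web_file_entity(
        "https://zenodo.org/record/3541888/files/ro-crate-1.0.0.pdf",
        "RO-Crate specification",
        310691,
        "application/pdf",
        datetime(2020, 4, 9, 13, 9, 21, tzinfo=utc_plus_one),
        local_path="docs/ro-crate-1.0.0.pdf",
    )
    print(entity["sdDatePublished"])  # prints 2020-04-09T13:09:21+01:00
```

Requiring a timezone-aware `datetime` is a design choice of this sketch: it ensures that `isoformat()` always includes an offset, so that consumers comparing access times across crates are not left guessing the local time zone.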