Skip to content

Latest commit

 

History

History
66 lines (47 loc) · 3.55 KB

README.md

File metadata and controls

66 lines (47 loc) · 3.55 KB

Judgments parser

This parser converts UK judgments from .docx format to XML. It is written in C# and requires .NET 5.0.

C# API

To invoke the parser programatically, clients should use the classes in the UK.Gov.NationalArchives.Judgments.Api namespace.

  1. Create a Request object, with the following properties:
    • Content (required), a byte array, the content of the judgment, in .docx format
    • Filename (optional), a string, the name of .docx file containing the judgment
    • Attachments (optional), an array of Attachment objects, having the following properties:
      • Content (required), a byte array, the content of the attachment, in .docx format
      • Type (required), an enum, with the following possibe values: Order
      • Filename (optional), a string, the name of .docx file containing the attachment
    • Meta (optional), a Meta object, with the following properties:
      • Court (optional), a string, the identifier of the court
      • Cite (optional), a string, the natural citation of the case
      • Date (optional), a date, the date of the judgment
      • Name (optional), a string, the case name
      • Uri (optional), a string, a URI for the judgment
      • Attachments (optional), an array of ExternalAttachment objects, having the following properties:
        • Name (required), a string, the name of the attachment for display
        • Link (optional), a string, a URL for the attachment
    • Hint (optional), an enum, with the following possibe values: UKSC, UKCA, UKHC, UKUT, Judgment, PressSummary. If present, the parser will attempt to parse a judgment only of the specified type.
  2. Pass it to the Parse method in the Parser class,
  3. Receive a Response object, which will have the following properties:
    • Xml, a string, the judgment in LegalDocML
    • Images, an array of Image objects, having the following properties:
      • Content, a byte array, the content of the image
      • Type, a string, the MIME type of the image
      • Name, a string, the name of the image as referred to in the XML
    • Meta, a Meta object, as above

REST API

A REST API, mimicking the above, is available at https://parse.judgments.tna.jurisdatum.com. Its specification can be found at /api.yaml.

CLI

The parser can also be invoked from the command line, as follows:

dotnet run --input path/to/file.docx

So, for example, the following command will parse the included test document and direct the output to the console:

dotnet run --input test/judgments/test1.docx

To direct the XML output to a file, use the --output option, like so:

dotnet run --input test/judgments/test1.docx --output something.xml

To save the XML and all of the embedded images to a .zip file, use the --output-zip option, like so:

dotnet run --input test/judgments/test1.docx --output-zip something.zip

If the --log option is used, the parser will log its progress to the specified file. For example:

dotnet run --input test/judgments/test1.docx --output something.xml --log log.txt

And if the --test option is used, the parser will perform a few tests and display the results either in the console or, if logging is enabled, to the log file.