Commit 8149399

yufei118liu and mjpost authored
New workflow simplifying metadata corrections (#4147)
This PR implements a semi-automated bulk processing workflow for metadata corrections. We hope that it will vastly simplify the metadata correction experience for both users and Anthology staff. It includes the following pieces:

* A link is added to each paper page which, when clicked, displays a modal dialog allowing for editing of the paper's title, author list, and abstract
* Submitting the dialog takes the user to a GitHub issue template populated with the changed information in a JSON data block
* A new script, process_bulk_metadata.py, processes these work items, makes the corrections, bundles them into a branch, and creates a unified PR
* A new GitHub workflow replaces the old manual one
* Documentation regarding workflow corrections is updated

---------

Co-authored-by: Matt Post <mattpost@microsoft.com>
Co-authored-by: Matt Post <post@cs.jhu.edu>
1 parent eeb11d5 commit 8149399

14 files changed: +733 -253 lines changed

.github/ISSUE_TEMPLATE/01-metadata-correction.yml

-73
This file was deleted.

.github/ISSUE_TEMPLATE/99-bulk-metadata-correction.yml

+5 -8
@@ -7,19 +7,16 @@ body:
   - type: markdown
     attributes:
       value: >
-        This form is activated by following a link from each paper page in the Anthology (e.g., https://preview.aclanthology.org/autopr/K17-1003/). The form will list the title, abstract, and authors in JSON format, which you can manipulate to make corrections, such as adjusting a title, correcting an author name, adding a missing author, or reordering names.
-
-        Please note to take care to preserve structure such as the `<fixed-case>` tag that is sometimes present in titles.
-
+        **This form is not meant to be used manually.** Instead, it is activated by clicking the yellow "Fix metadata" button found on each paper page in the Anthology (e.g., https://aclanthology.org/K17-1003/). Clicking this button displays a UI tool for modifying the title, abstract, and author list. Submission of that form will automatically populate the title above and data block below.
+  - type: markdown
+    attributes:
+      value: >
         Corrections will be processed in bulk on a weekly basis after verification by Anthology staff.
   - type: textarea
     id: data
-    attributes:
-      label: JSON code block
-      description: Please make corrections below, taking care to preserve the JSON structure (e.g., no trailing commas at the end of lists). If you add an author, do not worry about the ID unless you know what it is.
     validations:
       required: true
   - type: markdown
     attributes:
       value: |
-        **Note:** If you request an author name correction for yourself, please help ensure your name is correct for future publications by setting it correctly in Softconf/OpenReview.
+        **Important:** If you request an author name correction for yourself, please help ensure your name is correct for future publications by setting it correctly in Softconf/OpenReview.
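
The template's data block has to be well-formed JSON holding the title, abstract, and author list. The sketch below shows one way such a block could be checked before processing. It is only an illustration: the key names (`title`, `abstract`, `authors`, `last`) and the helper `check_correction_block` are assumptions, not taken from this commit.

```python
import json
from typing import List

# Assumed field names; the actual schema used by the form is not shown in this diff.
REQUIRED_KEYS = ("title", "abstract", "authors")


def check_correction_block(raw: str) -> List[str]:
    """Return a list of problems found in a JSON data block (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as err:
        # Trailing commas, unbalanced brackets, etc. end up here.
        return [f"invalid JSON: {err.msg} (line {err.lineno})"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]

    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in data]
    authors = data.get("authors", [])
    if not isinstance(authors, list):
        problems.append("authors must be a list")
    else:
        for i, author in enumerate(authors):
            if not isinstance(author, dict) or not author.get("last"):
                problems.append(f"author {i} needs a 'last' name")
    return problems


good = '{"title": "A Title", "abstract": "", "authors": [{"first": "Jane", "last": "Doe"}]}'
bad = '{"title": "A Title", "authors": [{"last": "Doe"},]}'
print(check_correction_block(good))
print(check_correction_block(bad))
```

Python's `json` module rejects trailing commas, which is the failure mode the removed field description warned about.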

.github/workflows/annotate-metadata-issue.yml

+1 -4
@@ -18,7 +18,6 @@ jobs:
             const hasRequiredLabels =
               labels.includes('correction') &&
               labels.includes('metadata');
-
             core.setOutput('has_required_labels', hasRequiredLabels.toString());
 
       - name: Parse JSON from issue body
@@ -56,9 +55,7 @@ jobs:
           script: |
             const anthology_id = core.getInput('anthology_id');
             const comment = `
-            Found ACL Anthology entry:
-
-            📄 Paper: https://aclanthology.org/${anthology_id}
+            Found ACL Anthology entry: https://aclanthology.org/${anthology_id}
 
             ![Thumbnail](https://aclanthology.org/thumb/${anthology_id}.jpg)
             `;
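
The annotation workflow above checks for both labels, parses a JSON block out of the issue body, and posts a comment linking to the paper. A rough Python sketch of that sequence follows. It assumes the JSON sits in a fenced block inside the issue body and carries an `anthology_id` key; the function names are hypothetical, and none of this is the workflow's actual code.

```python
import json
import re
from typing import List, Optional

FENCE = "`" * 3  # built programmatically to avoid a literal fence in this example
JSON_BLOCK = re.compile(FENCE + r"(?:json)?\s*\n(.*?)\n" + FENCE, re.DOTALL)


def has_required_labels(labels: List[str]) -> bool:
    """Mirror of the workflow check: both labels must be present."""
    return "correction" in labels and "metadata" in labels


def parse_issue_body(body: str) -> Optional[dict]:
    """Return the first fenced JSON object in the body, or None if absent/invalid."""
    match = JSON_BLOCK.search(body)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


def build_comment(anthology_id: str) -> str:
    """Build the comment text in the shape introduced by this commit."""
    return (
        f"Found ACL Anthology entry: https://aclanthology.org/{anthology_id}\n\n"
        f"![Thumbnail](https://aclanthology.org/thumb/{anthology_id}.jpg)"
    )


body = "Some intro text\n" + FENCE + 'json\n{"anthology_id": "K17-1003"}\n' + FENCE + "\n"
data = parse_issue_body(body)
if has_required_labels(["correction", "metadata"]) and data is not None:
    print(build_comment(data["anthology_id"]))
```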
@@ -0,0 +1,38 @@
+name: Reset Approval on Edit
+
+on:
+  issues:
+    types:
+      - edited
+
+jobs:
+  reset-approval:
+    if: contains(join(github.event.issue.labels.*.name, ','), 'correction') || contains(join(github.event.issue.labels.*.name, ','), 'metadata')
+    runs-on: ubuntu-latest
+    steps:
+      - name: Leave a comment and reset approval status
+        uses: actions/github-script@v6
+        with:
+          script: |
+            const { issue, repository } = context.payload;
+            const owner = repository.owner.login;
+            const repo = repository.name;
+            const approvedLabel = "approved";
+
+            // Check if the issue has the "approved" label and remove it (github-script v5+ exposes REST methods under github.rest)
+            if (issue.labels.some(label => label.name === approvedLabel)) {
+              await github.rest.issues.removeLabel({
+                owner,
+                repo,
+                issue_number: issue.number,
+                name: approvedLabel
+              });
+
+              // Add a comment to notify about the edit
+              await github.rest.issues.createComment({
+                owner,
+                repo,
+                issue_number: issue.number,
+                body: "Approval status has been reset after the issue was edited."
+              });
+            }
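
The job-level `if:` above joins the label names into one string and then calls `contains` on it. On a string, `contains` is a case-insensitive substring test, so the job runs when either word appears anywhere in the joined names. The annotation workflow, by contrast, requires both labels. The sketch below models that behavior in Python to make the semantics concrete; the function names and the returned action strings are hypothetical.

```python
from typing import List


def should_run(label_names: List[str]) -> bool:
    """Model of contains(join(labels, ','), 'correction') || contains(..., 'metadata')."""
    joined = ",".join(label_names).lower()  # contains() ignores case
    return "correction" in joined or "metadata" in joined


def reset_actions(label_names: List[str], approved_label: str = "approved") -> List[str]:
    """Return the actions the workflow would take for an edited issue."""
    if not should_run(label_names):
        return []
    if approved_label not in label_names:
        return []  # nothing to reset, so no comment is posted either
    return ["remove_label", "create_comment"]


print(should_run(["correction"]))
print(should_run(["metadata-cleanup"]))  # substring match on a different label
print(reset_actions(["correction", "metadata", "approved"]))
print(reset_actions(["correction", "metadata"]))
```

If exact label matching is wanted, applying `contains` to the array `github.event.issue.labels.*.name` directly, without `join`, tests element membership rather than substrings.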

.github/workflows/validate-metadata-issue.yml

-144
This file was deleted.

bin/anthology/utils.py

+19 -8
@@ -508,14 +508,25 @@ def parse_element(
     return attrib
 
 
-def make_simple_element(tag, text=None, attrib=None, parent=None, namespaces=None):
-    """Convenience function to create an LXML node"""
-    el = (
-        etree.Element(tag, nsmap=namespaces)
-        if parent is None
-        else etree.SubElement(parent, tag)
-    )
-    if text:
+def make_simple_element(
+    tag, text=None, attrib=None, parent=None, sibling=None, namespaces=None
+):
+    """Convenience function to create an LXML node.
+
+    :param tag: the tag name
+    :param text: the text content of the node
+    :param attrib: a dictionary of attributes
+    :param parent: the parent node
+    :param sibling: if provided and found, the new node will be inserted after this node
+    """
+    el = etree.Element(tag, nsmap=namespaces)
+    if parent is not None:
+        if sibling is not None:
+            parent.insert(parent.index(sibling) + 1, el)
+        else:
+            parent.append(el)
+
+    if text is not None:
         el.text = str(text)
     if attrib:
         for key, value in attrib.items():
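
To illustrate the new `sibling` parameter and the `text is not None` change, here is a self-contained stand-in built on the standard library's `xml.etree.ElementTree` instead of lxml, so it runs without extra dependencies. It is a sketch rather than the repository's function: it drops `namespaces`, assumes the truncated attribute loop calls `el.set` and that the function returns the element, and falls back to appending when `sibling` is not a child of `parent`. The version in the diff appears to raise `ValueError` from `parent.index` in that case.

```python
import xml.etree.ElementTree as ET


def make_simple_element(tag, text=None, attrib=None, parent=None, sibling=None):
    """Create a node; if sibling is a child of parent, insert right after it."""
    el = ET.Element(tag)
    if parent is not None:
        children = list(parent)
        if sibling is not None and sibling in children:
            parent.insert(children.index(sibling) + 1, el)
        else:
            # Assumed fallback: append when no usable sibling is given.
            parent.append(el)
    if text is not None:  # keeps falsy but meaningful values such as 0 or ""
        el.text = str(text)
    if attrib:
        for key, value in attrib.items():
            el.set(key, value)  # assumption: the loop body is not shown in the diff
    return el


paper = ET.Element("paper")
title = make_simple_element("title", "A Title", parent=paper)
make_simple_element("url", "K17-1003", parent=paper)
make_simple_element("author", "Jane Doe", parent=paper, sibling=title)
print([child.tag for child in paper])  # ['title', 'author', 'url']
```

Inserting relative to a sibling lets a correction script add, for example, a missing author next to existing ones instead of at the end of the paper element.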

bin/create_hugo_yaml.py

+3
@@ -67,13 +67,15 @@ def export_anthology(anthology, outdir, clean=False, dryrun=False):
         if paper.parent_volume.ingest_date:
             data["ingest_date"] = paper.parent_volume.ingest_date
         data["title_html"] = paper.get_title("html")
+        data["title_raw"] = paper.get_title("xml")
         if "xml_title" in data:
             del data["xml_title"]
         if "xml_booktitle" in data:
             data["booktitle_html"] = paper.get_booktitle("html")
             del data["xml_booktitle"]
         if "xml_abstract" in data:
             data["abstract_html"] = paper.get_abstract("html")
+            data["abstract_raw"] = paper.get_abstract("xml")
             del data["xml_abstract"]
         if "xml_url" in data:
             del data["xml_url"]
@@ -138,6 +140,7 @@ def export_anthology(anthology, outdir, clean=False, dryrun=False):
         log.debug("export_anthology: processing volume '{}'".format(id_))
         data = volume.as_dict()
         data["title_html"] = volume.get_title("html")
+        data["title_raw"] = volume.get_title("xml")
         del data["xml_booktitle"]
         if "xml_abstract" in data:
             del data["xml_abstract"]