Commit ef9f25e

Update author docs (#4602)
* Update corrections page
* Update blog post
* Remove rich from bin/requirements.txt (often leads to conflicts with the version-pegged requirement in python/)
1 parent 6e4c203 commit ef9f25e

4 files changed

+85
-44
lines changed

bin/requirements.txt

-1
@@ -21,7 +21,6 @@ python-slugify>=2.0
2121
pytz
2222
PyYAML>=3.0
2323
requests
24-
rich
2524
ruff~=0.3.4
2625
setuptools==75.6.0
2726
stop-words==2018.7.23

hugo/content/faq/corrections.md

+9-3
@@ -3,8 +3,14 @@ Title: How can I submit corrections to papers?
33
weight: 1
44
---
55

6-
Metadata (title, author list, and abstract) can be corrected by clicking on the yellow "Fix data" button on any paper page in the Anthology (e.g., [K17-1003](https://aclanthology.org/K17-1003/)). This will present you with a modal dialog that upon submission will instantiate an issue template in the Anthology Github repository.
6+
There are a number of types of corrections. For an overview, see [our corrections page]({{< relref "/info/corrections.md" >}}).
77

8-
The Anthology treats PDFs as the authoritative source of information, so metadata corrections should conform to the information in the PDF.
8+
In summary, there are three main types of corrections we handle:
99

10-
For corrections to PDFs, please read the information on [our corrections page]({{< relref "/info/corrections.md" >}}).
10+
* _Metadata only_. This includes corrections to the title, author list, or abstract. These can be initiated by clicking on the yellow "Fix data" button on any paper page in the Anthology.
11+
* _Author disambiguation_. These corrections involve author pages (e.g., anything under https://aclanthology.org/people/), and can also be initiated by clicking on the yellow "Fix data" button on any author page. There are two main types of author disambiguation:
12+
1. _One author, 2+ names_. e.g., [Aravind Joshi](https://aclanthology.org/people/aravind-joshi), Aravind K. Joshi, etc. Note that an author will often have multiple pages because the metadata for one or more PDFs is incorrect. The pages can be automatically merged if the underlying PDFs have the same verbatim string and the metadata is simply incorrect. If different names were used, we have to create an entry in our name variants file.
13+
2. _2+ authors, one name_. e.g., [Yang Liu](https://aclanthology.org/people/yang-liu). In this situation, we have to manually assign IDs to the papers to create a separate author page, typically using their Ph.D. granting institution (e.g., [Yang Liu of Edinburgh](https://aclanthology.org/people/y/yang-liu-edinburgh/)).
14+
* _PDF corrections_. This includes changes such as revisions or retractions. For more information, please see [details on the corrections page]({{< relref "/info/corrections.md" >}}).
15+
16+
Important: The Anthology treats the data on the PDF itself as the authoritative source of information, so all metadata corrections should conform to the information found there. If that information is wrong, a PDF correction is needed.

hugo/content/info/corrections.md

+66-40
@@ -2,18 +2,73 @@
22
Title: Requesting Corrections
33
linktitle: Corrections
44
subtitle: How to submit corrections to the Anthology
5-
date: 2024-12-26
5+
date: 2025-02-10
66
---
77

8-
### Types of corrections
8+
### What type of correction do I need?
99

10-
The staff of the ACL Anthology can process requests for many types of corrections.
11-
We generally distinguish five types, loosely following the [ACM Publications Policy](https://www.acm.org/publications/policies/):
10+
Our central guiding corrections principle is that **we view the content of PDFs as authoritative**. If you see errors or inconsistencies in the metadata (author list, title, abstract), you need to first check to see if it matches the PDF.
11+
12+
This view drives three main types of corrections:
13+
14+
* [_PDF corrections_](#pdf-corrections). The PDF itself can be in error.
15+
* [_Metadata only_](#metadata-corrections). Information presented in the Anthology may be different from the PDF. Examples include errors in the title, abstract, or author list.
16+
* [_Author disambiguation_](#author-disambiguation). An author's papers might be spread across multiple author pages (one person, multiple pages), or a single author page might contain papers from different people (multiple people, one page).
17+
18+
Below we describe the process for addressing these types of corrections, in order of the frequency we encounter them.
19+
20+
### Metadata corrections
21+
22+
Corrections to **metadata** do not require changing the PDF.
23+
These kinds of corrections bring the information presented in the Anthology in line with the authoritative PDF.
24+
25+
A request to change paper metadata can be submitted in two ways.
26+
27+
1. Navigate to the paper's page in the ACL Anthology (e.g., [K17-1003](https://aclanthology.org/K17-1003/)). From there, click the yellow "Fix data" button. This will display a dialog that you can use to correct the title and abstract and fix issues with the author list.
28+
29+
Submitting this form will create a Github issue with a JSON data block. This will then be validated by Anthology staff, and processed by a semi-automatic bulk corrections script on a weekly basis.
30+
31+
2. If you would like to expedite the process and are familiar with [git](https://git-scm.com), you can make the correction yourself and file a [pull request (PR)](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests).
32+
33+
* First, locate your file amongst our [authoritative XML files](https://github.com/acl-org/acl-anthology/tree/master/data/xml). The name of your file is the portion of the Anthology ID that comes before the hyphen. As an example, if the Anthology ID of your paper is `P19-10171`, then the file you will need to edit is [data/xml/P19.xml](https://github.com/acl-org/acl-anthology/blob/master/data/xml/P19.xml); if the Anthology ID of your paper is `2021.iwslt-1.28`, then the file you will need to edit is [data/xml/2021.iwslt.xml](https://github.com/acl-org/acl-anthology/blob/master/data/xml/2021.iwslt.xml).
34+
* Find your entry in the XML file, and use Github's edit button to fix it and then to issue a PR against our `master` branch.
35+
* For larger XML files, you may have to fork the repository first. [More information can be found here](https://help.github.com/en/github/managing-files-in-a-repository/editing-files-in-another-users-repository).
36+
37+
The Anthology team will attend to the correction as we find time.
38+
Metadata changes are generally accepted if they are consistent with the PDF, which we take as authoritative.
39+
However, please see the following note.
40+
41+
**Note on changes to author metadata**
42+
43+
Because it is beyond our ability to keep track of the many differing policies governing conferences and journals whose proceedings we host, it is therefore up to those groups to ensure that PDF authorship is correct when proceedings are delivered to the Anthology for ingestion.
44+
45+
We reserve the right to seek permission or corroboration from the associated conference or workshop program chairs in unusual situations, such as removing or adding an author to a PDF revision.
46+
In such cases, we will ask authors to arrange for this permission to be conveyed to us, either (ideally) on the corresponding Github issue or via email.
47+
48+
### Author disambiguation
49+
50+
The Anthology builds author pages based on the string form of names found in paper metadata.
51+
These pages are housed under https://aclanthology.org/people/, e.g., [Aravind Joshi](https://aclanthology.org/people/aravind-joshi).
52+
53+
There are two types of author disambiguation:
54+
55+
**One person, multiple author pages**.
56+
This situation occurs when a person has multiple papers written under different names.
57+
Often, these names are minor variations of each other (e.g., including or excluding a middle initial).
58+
Sometimes, this is a simple mistake in the metadata that can be handled using the procedure described above.
59+
However, if the metadata for the papers is correct, then we need to manually link the author pages.
60+
61+
**Multiple people, single author page**.
62+
In this situation, many different people have published under the same name.
63+
An example is [Yang Liu](https://aclanthology.org/people/yang-liu).
64+
In this case, we have to manually assign IDs to the papers to create a separate author page, typically using their Ph.D. granting institution (e.g., [Yang Liu of Edinburgh](https://aclanthology.org/people/y/yang-liu-edinburgh/)).
65+
66+
Both situations can be addressed by [filing an Author Page request](https://github.com/acl-org/acl-anthology/issues/new?template=02-name-correction.yml).
67+
68+
### PDF corrections
69+
70+
Our PDF corrections process loosely follows the [ACM Publications Policy](https://www.acm.org/publications/policies/):
1271

13-
* Corrections to **metadata** do not require changing the PDF.
14-
Examples include correcting the spelling of a name or the title.
15-
These kinds of corrections are typically made to bring the metadata in line with what is on the PDF, which is taken to be authoritative.
16-
If changes to the metadata also require a change to the PDF (e.g., changing an author's name), a revision must also be supplied.
1772
* An **erratum** clarifies errors made in the original scholarly work.
1873
Usually these are just short notes, corrective statements, or changes to equations or other problems in the original, which need to be read alongside the original work.
1974
* A **revision** is a versioned replacement of the original scholarly work.
@@ -29,28 +84,8 @@ Please take note of the following points regarding revisions and retractions.
2984
* We cannot currently regenerate the full volumes, which will continue to contain only the original papers.
3085
* We have no control over how downstream consumers of the Anthology, such as search engines, process the changes.
3186

32-
### Correcting Metadata
33-
34-
Please note that **the Anthology treats PDFs as authoritative**. This means that all metadata corrections must be consistent with the PDF.
35-
36-
A request to change paper metadata can be submitted in two ways.
37-
38-
- Navigate to the paper's page in the ACL Anthology (e.g., [K17-1003](https://aclanthology.org/K17-1003/)). From there, click the yellow "Fix data" button. This will display a dialog that you can use to correct the title and abstract and fix issues with the author list.
39-
40-
Submitting this form will create a Github issue with a JSON data block. This will then be validated by Anthology staff, and processed by a semi-automatic bulk corrections script on a weekly basis.
4187

42-
- If you would like to expedite the process and are familiar with [git](https://git-scm.com), you can make the correction yourself and file a [pull request (PR)](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/about-pull-requests).
43-
1. First, locate your file amongst our [authoritative XML files](https://github.com/acl-org/acl-anthology/tree/master/data/xml). The name of your file is the portion of the Anthology ID that comes before the hyphen.
44-
45-
As an example, if the Anthology ID of your paper is `P19-10171`, then the file you will need to edit is [data/xml/P19.xml](https://github.com/acl-org/acl-anthology/blob/master/data/xml/P19.xml); if the Anthology ID of your paper is `2021.iwslt-1.28`, then the file you will need to edit is [data/xml/2021.iwslt.xml](https://github.com/acl-org/acl-anthology/blob/master/data/xml/2021.iwslt.xml).
46-
2. Find your entry in the XML file, and use Github's edit button to fix it and then to issue a PR against our `master` branch.
47-
3. For larger XML files, you may have to fork the repository first. [More information can be found here](https://help.github.com/en/github/managing-files-in-a-repository/editing-files-in-another-users-repository).
48-
49-
The Anthology team will attend to the correction as we find time.
50-
Metadata changes are generally accepted if they are consistent with the PDF, which we take as authoritative.
51-
However, please see the [note below about changes to the author list](#note-on-author-changes).
52-
53-
### Revisions and errata
88+
#### Revisions and errata
5489

5590
For requests to change paper *content* (either a revision or an erratum), again, please [file a Github issue](https://github.com/acl-org/acl-anthology/issues/new?assignees=anthology-assist&labels=correction%2Crevision&template=03-revision-or-errata.yml&title=Paper+Revision%7Breplace+with+Anthology+ID%7D).
5691
**Please note the following**:
@@ -76,7 +111,7 @@ Submissions not meeting these standards will be rejected, potentially without no
76111

77112
A revision that changes the author list needs permission (see below).
78113

79-
### Retractions
114+
#### Retractions
80115

81116
To initiate a retraction, please communicate directly with the Anthology director.
82117
Retractions often involve the organizing editors or chairs of the respective journal or conference.
@@ -91,7 +126,7 @@ Retractions result in the following changes in the Anthology:
91126
No bibliographic files are generated, and the paper is not listed in the consolidated Anthology BibTeX file.
92127
* The paper is removed entirely from the listing on the author page.
93128

94-
### Removal
129+
#### Removal
95130

96131
Removals are rare events that are undertaken only in the most serious of situations, such as plagiarism or fraud.
97132
A paper can be removed at the request of the scientific organization with jurisdiction over the paper.
@@ -108,12 +143,3 @@ A removal will result in the following changes to the Anthology:
108143
Its title and author list will be presented in ~~strikeout text~~.
109144
The abstract, if present, will be removed.
110145
No bibliographic files will be generated.
111-
112-
### Note on changes to author metadata
113-
114-
The Anthology generally accepts corrections to the author metadata that bring it into line with the PDF, which we treat as authoritative.
115-
Example corrections include name spellings and details (such as initialization or the inclusion of a middle name), changes to author ordering, and even the addition of authors mistakenly left out of the metadata.
116-
Because it is beyond our ability to keep track of the many differing policies governing conferences and journals whose proceedings we host, it is therefore up to those groups to ensure that PDF authorship is correct when proceedings are delivered to the Anthology for ingestion.
117-
118-
We reserve the right to seek permission or corroboration from the associated conference or workshop program chairs in unusual situations, such as removing or adding an author to a PDF revision.
119-
In such cases, we will ask authors to arrange for this permission to be conveyed to us, either via email or on a Github issue.
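
The file-lookup rule stated in the metadata corrections text (the file name is the portion of the Anthology ID that comes before the hyphen) can be sketched in Python as follows. This is only an illustration under that stated rule: the helper name `xml_file_for` is hypothetical and is not part of the Anthology codebase, and the sketch does not check that the file actually exists in the repository.

```python
def xml_file_for(anthology_id: str) -> str:
    """Return the repository-relative XML path for an Anthology ID.

    Hypothetical helper: it applies the rule from the docs, taking
    everything before the first hyphen as the file name.
    """
    # partition() splits on the first hyphen only, so IDs such as
    # "2021.iwslt-1.28" keep the full "2021.iwslt" prefix.
    collection, hyphen, _ = anthology_id.partition("-")
    if not hyphen or not collection:
        raise ValueError(f"not a recognizable Anthology ID: {anthology_id!r}")
    return f"data/xml/{collection}.xml"


# The two examples given in the corrections page.
print(xml_file_for("P19-10171"))        # data/xml/P19.xml
print(xml_file_for("2021.iwslt-1.28"))  # data/xml/2021.iwslt.xml
```

Splitting on the first hyphen covers both the older letter-prefixed IDs and the newer year-prefixed IDs shown in the examples; other ID shapes are not handled here.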

hugo/content/posts/2024-12-27-new-metadata-workflow.md

+10
@@ -4,16 +4,26 @@ date: "2024-12-27"
44
description: >
55
A simplified workflow for processing metadata corrections should make it easier for authors to submit corrections and for Anthology staff and volunteers to process them expeditiously.
66
---
7+
### A long-standing problem
8+
79
After proceedings are published, authors often discover errors in the paper metadata, including misspelled author names, missing or misordered authors, and mistakes in titles or abstracts. Until today, corrections to this data involved a lot of manual effort on both the part of authors (who had to locate the form and fill it out) and Anthology staff (who had to manually process and interpret these forms and make the corrections). The result was hundreds of issues accumulating on the [Anthology Github repository](https://github.com/acl-org/acl-anthology/issues?q=is%3Aissue+is%3Aopen+label%3Acorrection+label%3Ametadata), and a delay of weeks or sometimes even months to process them.
810

911
<img src="/images/2024-12-27/many-requests.png" alt="Accumulating issues" style="width:50%;" />
1012

13+
### Our simplified solution
14+
1115
We are therefore happy to announce a new simplified workflow that we hope will reduce effort and processing time. This workflow introduces a yellow "Fix data" button on each paper page in the Anthology. Clicking on this button will display a dialog allowing for easy manipulation of the title, author list, and abstract. Upon submission, this dialog will lead the submitter to the creation of a structured Github issue for the correction. The user needs only to submit the issue, and leave the rest to Anthology staff.
1216

1317
<img src="/images/2024-12-27/dialog.png" alt="Metadata correction dialog" style="width:50%;" />
1418

1519
For the curious, from that point, we make use of further Github automations to make the process as easy as possible. A Github workflow annotates the issue with an image of the paper and a link to its paper page, allowing for easy visual verification of the corrections. We then run a script that can automatically create a consolidated pull request from all approved correction requests.
1620

21+
### Our workflow
22+
23+
Anthology volunteers continually process these corrections. If further information is required, we will ask you for it on the Github issue. Once everything looks correct, we will add an "approved" label. We then process approved corrections in bulk, aiming for a weekly cadence.
24+
25+
### In closing
26+
1727
We hope that this simplified process will make the submission process easier and more intuitive for authors who submit corrections, and also that it will enable us to process them much more frequently than the monthly process we've been using up till this point.
1828

1929
We are excited to see how this new process will work in practice, and we welcome feedback from the community on how we can further improve it.
