aclpub2 supports the generation of Proceedings and Booklets for *CL Conferences (ACL, NAACL, EMNLP, ... ) and related Workshops. This README has been created to provide the instructions to follow to generate proceedings/booklets in aclpub2 format.
The provided Python tool to generate the proceedings takes as input a set of files containing all information on the event (in the .yml format) and generates a .tex file containing the conference details, sponsors, prefaces, organizing and program committees, as well as the concatenation of all the watermarked accepted papers and the author index. This .tex file is then compiled to generate the PDF file of the proceedings.
- OpenReview. This guide is for you: it explains how to use the provided tool to generate the proceedings (and the handbook) automatically from OpenReview.
- EasyChair (or any other reviewing platform). This guide is for you: it explains how to generate the proceedings starting from manually edited files.
- SoftConf. You can either follow the instructions in this guide to generate the proceedings starting from manually edited files (the suggested option for small and medium-sized events) or follow the ACLPUB instructions. If you choose to use aclpub2 with information uploaded to SoftConf, we have recently added scripts that export the information about the papers and the program committee from that platform in a format compatible with aclpub2. The process is not fully automated, but it can simplify your work. Please take a look at the folder softconf.
The scripts to generate the proceedings accept as input a set of .yml files and directories. A YML file is a text document that contains data formatted using YAML (YAML Ain't Markup Language), a human-readable data format used for data serialization. You can open a YML file in any text editor (or source code editor).
Examples and usage of YAML syntax can be found here.
The following .yml files should be provided to the generation scripts. Files 1, 2, 3, 4 and 6 should be manually edited with information concerning your conference/workshop, while files 5 and 7 can be automatically exported from OpenReview (or manually edited if you are not using OpenReview).
1. conference_details.yml
2. sponsors.yml (optional)
3. prefaces.yml
4. organizing_committee.yml
5. program_committee.yml
6. invited_talks.yml (optional)
7. papers.yml
We strongly suggest taking a look at this link, where you can find examples of all the above files initialized for a past conference.
In addition, for the handbook, a file program.yml should be created (see the Handbook generation instructions).
The generated proceedings should be sent to the publication chairs as a .zip or .tgz file containing a folder named with the conference/workshop acronym.
The build process creates a directory called output. This directory should contain all of the files that the publication chairs need, but it is always a good idea to confirm that this directory contains all of the files described below.
If you are interested in an example of an output folder, just run the software on the test case, as discussed here.
In a nutshell, this folder should contain:
- A PDF file named proceedings.pdf containing the whole conference/workshop proceedings (i.e., the introduction and all the watermarked PDFs of the camera-ready papers).
- A folder named watermarked_pdf containing all the PDFs of the watermarked camera-ready papers.
  - Important: this folder MUST contain the special file named 0.pdf, which only contains the initial part of the proceedings (from the cover to the table of contents). The software adds it automatically, but please check that it is there; otherwise the proceedings cannot be added to the ACL Anthology.
- A folder named attachments containing all files attached to the individual papers during their submission (e.g., the code attached to a paper). Notice that each attachment must be correctly referred to in the papers.yml file with a path relative to the base folder named attachments. This folder can be omitted only if no paper has an attachment.
- A folder named inputs containing all the input files used to generate the proceedings. In particular, this folder must contain the input yml and tex files used. You can also add the non-watermarked PDFs in the subfolder inputs/papers. Please avoid adding the attachments of the individual papers (e.g., the code or software) here. They must be collected in the attachments folder described above. This folder is automatically built by the software and copied into the output folder, but please remember to check it.
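The checklist above can be verified with a few lines of Python before anything is sent. This is only a minimal sketch: the function name `missing_output_items` and the `has_attachments` switch are hypothetical helpers and not part of aclpub2; the item names are the ones listed above.

```python
from pathlib import Path


def missing_output_items(output_dir, has_attachments=True):
    """Return the expected items that are absent from the output folder."""
    root = Path(output_dir)
    expected = [
        "proceedings.pdf",        # the whole proceedings
        "watermarked_pdf",        # folder with the watermarked papers
        "watermarked_pdf/0.pdf",  # front matter, needed by the ACL Anthology
        "inputs",                 # input yml and tex files used for the build
    ]
    if has_attachments:
        # The attachments folder may be omitted only if no paper has one.
        expected.append("attachments")
    return [item for item in expected if not (root / item).exists()]
```

Calling `missing_output_items("output")` after a build returns an empty list when every item is in place, and the names of the missing items otherwise.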
Upload the resulting file (ACRONYM_data.tgz) to a file server or cloud storage (e.g., Google Drive) and email the link to it to the ACL publication chairs, who will assemble them for delivery to the Anthology. Please do not send the file as an email attachment.
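One possible way to build that archive is Python's standard tarfile module. The sketch below is an illustration, not a script shipped with aclpub2: `package_output` is a hypothetical helper, and it assumes that the top-level folder inside the archive should carry the conference/workshop acronym, as described earlier in this guide.

```python
import tarfile
from pathlib import Path


def package_output(output_dir, acronym, dest_dir="."):
    """Create ACRONYM_data.tgz with a single top-level folder named ACRONYM."""
    archive_path = Path(dest_dir) / f"{acronym}_data.tgz"
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname renames the output folder to the acronym inside the archive.
        archive.add(output_dir, arcname=acronym)
    return archive_path
```

For example, `package_output("output", "ACL")` writes ACL_data.tgz in the current directory. Keep `dest_dir` outside the output folder so the archive does not try to include itself.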
REALLY IMPORTANT: Before generating the final proceedings, please carefully check the input PDFs of the camera-ready papers with the ACLPUBCHECK tool, a Python tool that automatically detects author formatting errors, margin violations, and many other common formatting errors in papers that use the LaTeX sty file associated with ACL venues. The tool and instructions to use it can be found here. We strongly suggest sharing this tool with the authors before they submit their final camera-ready versions, in order to reduce the effort of checking possibly hundreds of papers.
Below you can find instructions (and examples) on how you should edit the .yml files with information on your conference/workshop.
This file should contain the key information about the conference, such as its name and abbreviation. It is used to build the cover of the proceedings, the watermarks, and other items.
Note that the ISBN of your conference/workshop will be provided by ACL.
book_title: name of the book; it should be in the form "Proceedings of ..." and it will be used in the bib file to name the event and to watermark the individual papers
event_name: name of the Conference or Workshop; it will be used in the frontmatter of the proceedings
cover_subtitle: the subtitle used in the cover of the proceedings; it can be in the form "Proceedings of the Conference, Vol. 1 (Long Papers)" or "Proceedings of the Workshop"
anthology_venue_id: conference/workshop abbreviation or acronym, e.g. EMNLP
start_date: Conference start date YYYY-MM-dd
end_date: Conference end date YYYY-MM-dd
isbn: ISBN number of the proceedings (assigned by the ACL)
location: location of the conference
editors: list of the editors of the volume, in the form
  - first_name: name of the editor (e.g., John)
    middle_name: middle name of the editor (e.g., D.)
    last_name: surname of the editor (e.g., Walker)
publisher: publisher of the proceedings, generally "Association for Computational Linguistics"
volume_name: a tag used by the ACL Anthology to characterize the new volume in a group of proceedings. For a volume of the main conference, it should be a tag from the list long|short|srw|demo|findings. For other volumes, such as workshops, it should be set to 1
watermark_book_title: [optional] If you do not want to use the text in the book_title as a watermark, you can specify an alternative form here. It is particularly useful when the book_title is too long: in this case you can copy that text in this field and use the line break symbol \\ and, if the text is enclosed between " ", use \\\\
Notice: avoid using LaTeX escape codes and simply use the characters in UTF8, e.g., Rilić instead of Rili\'{c}.
Here are some examples, first for a conference:
book_title: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Long Papers)
event_name: The 60th Annual Meeting of the Association for Computational Linguistics
cover_subtitle: Proceedings of the Conference (Long Papers)
anthology_venue_id: ACL
start_date: 2022-05-22
end_date: 2022-05-27
isbn: XXX-X-XXXXXX-XX-X (you should replace this with the real ISBN)
location: Dublin, Ireland
editors:
  - first_name: Smaranda
    last_name: Muresan
  - first_name: Preslav
    last_name: Nakov
  - first_name: Aline
    last_name: Villavicencio
publisher: Association for Computational Linguistics
volume_name: long
watermark_book_title: "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics\\\\Volume 1: Long Papers"
and for a workshop:
book_title: Proceedings of the 2nd Workshop on Human Evaluation of NLP Systems (HumEval 2021)
event_name: The 2nd Workshop on Human Evaluation of NLP Systems
cover_subtitle: Proceedings of the Workshop
anthology_venue_id: HumEval
start_date: 2022-05-27
end_date: 2022-05-27
isbn: XXX-X-XXXXXX-XX-X (you should replace this with the real ISBN)
location: Dublin, Ireland
editors:
  - first_name: Anya
    last_name: Belz
  - first_name: Maja
    last_name: Popović
  - first_name: Ehud
    last_name: Reiter
  - first_name: Anastasia
    last_name: Shimorina
publisher: Association for Computational Linguistics
volume_name: 1
watermark_book_title: Proceedings of the 2nd Workshop on Human Evaluation of NLP Systems (HumEval 2021)
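The field rules described above can be checked mechanically once conference_details.yml has been loaded into a dictionary with any YAML parser. The following is a minimal sketch, not part of aclpub2: the function name and the set of fields treated as required (everything except the optional watermark_book_title) are assumptions made for illustration.

```python
from datetime import date

# Assumption: every field except watermark_book_title is treated as required.
REQUIRED_FIELDS = [
    "book_title", "event_name", "cover_subtitle", "anthology_venue_id",
    "start_date", "end_date", "isbn", "location", "editors",
    "publisher", "volume_name",
]
# Main-conference tags from the list above, plus "1" for other volumes.
VOLUME_NAMES = {"long", "short", "srw", "demo", "findings", "1"}


def check_conference_details(details):
    """Return a list of human-readable problems found in the details dict."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if not details.get(name)]
    for name in ("start_date", "end_date"):
        value = details.get(name)
        if value is None:
            continue
        try:
            # str() also covers parsers that already return date objects.
            date.fromisoformat(str(value))
        except ValueError:
            problems.append(f"{name} is not in YYYY-MM-dd form: {value}")
    title = details.get("book_title")
    if title and not str(title).startswith("Proceedings of"):
        problems.append("book_title should be in the form 'Proceedings of ...'")
    volume = details.get("volume_name")
    if volume is not None and str(volume) not in VOLUME_NAMES:
        problems.append(f"unexpected volume_name: {volume}")
    return problems
```

An empty list means that none of these checks failed; it does not guarantee that the build itself will succeed.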
This file should list the sponsors (if any). A directory containing the related logos (named sponsor_logos/) should be created in the same directory as the .yml files.
- tier: Name of the tier, e.g. Diamond Level or In Collaboration With
  logos:
    - Path to a logo file relative to the sponsor_logos/ directory, e.g. facebook.png
This file should list the prefaces that will be included in the proceedings. A directory containing the .tex files that provide the text of the prefaces (named prefaces/) should be created in the same directory as the .yml files.
- title: Title of the preface, e.g. "Preface by the General Chair"
  file: Name of the file relative to the prefaces/ directory containing the preface text, e.g. general_chair.tex
The contents of the .tex files should not include the usual headers and footers found within LaTeX files.
Instead, they should only contain the contents between the \begin{document} and \end{document} directives.
Frequently, this will simply be plaintext, with a few formulas, figures, or tables.
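If a preface arrives as a complete LaTeX document, the part to keep can be extracted with plain string operations. The helper below is a hypothetical sketch and not part of aclpub2; it assumes the file contains at most one \begin{document} and \end{document} pair and returns the text unchanged (apart from stripping surrounding whitespace) when the pair is absent.

```python
def latex_body(source):
    """Return only the text between \\begin{document} and \\end{document}."""
    begin, end = r"\begin{document}", r"\end{document}"
    start = source.find(begin)
    stop = source.rfind(end)
    if start == -1 or stop == -1 or stop < start:
        # Already a fragment: nothing to cut away.
        return source.strip()
    return source[start + len(begin):stop].strip()
```

The result can then be saved in the prefaces/ directory under the file name referenced in prefaces.yml.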
This file should list the members of the organizing committee. You should edit this file manually.
- role: Name of role, e.g. General Chair
  members:
    - first_name: Committee member first name
      middle_name: Committee member middle names
      last_name: Committee member last name
      institution: Committee member's institution name as it should appear, e.g. University of California Berkeley, USA
This file should list the members of the program committee. You can edit this file manually, or export it from OpenReview (see How to export yml files from OpenReview).
- role: Name of role, e.g. General Chair
  members:
    - first_name: Committee member first name
      middle_name: Committee member middle names
      last_name: Committee member last name
      institution: Committee member's institution name as it should appear, e.g. University of California Berkeley, USA
- role: Reviewers
  type: name_block # By adding the name_block type in the role, names will be output in alphabetized blocks.
  entries:
    - Committee Member Name
This optional file should list the invited talks and associated abstracts and bios. A directory containing the .tex files that provide the text of the abstracts and the bios (named invited_talks/) should be created in the same directory as the .yml files.
As with the prefaces, the contents of the .tex files should not include the usual headers and footers found within LaTeX files, and only what is usually found between the \begin{document} and \end{document} directives.
- speaker_name: "Speaker name as it should appear, e.g., Jane Doe"
  institution: "Speaker's institution name as it should appear, e.g., University of California Berkeley, USA"
  title: "The title of the talk."
  abstract_file: "Path to the abstract's LaTeX file relative to the invited_talks/ directory, e.g., invited_talks/jane_doe_abstract.tex"
  bio_file: "Path to the bio's LaTeX file relative to the invited_talks/ directory e.g., invited_talks/jane_doe_bio.tex"
  photo: "Path to the speaker's photo, relative to the invited_talks/ directory e.g., invited_talks/jane_doe_photo.jpg"
  date: "Day of the invited talk, e.g., Mon, March 18, 2024"
  time: "Time of the invited talk, e.g., 09:00 -- 10:00"
  location: "Location of the invited talk, e.g., Room A"
  custom_prefix: "Custom title for the page, e.g., Distinguished Lecture. This field allows customizing the default title of the page. If not provided, 'Keynote' is used."
This file should list the accepted papers, along with a directory (named papers/) containing the associated PDFs.
Each of the listed papers must have a unique ID so that it can be referred to by ID within the program.yml file later on. You can edit this file manually, or export it from OpenReview (see How to export yml files from OpenReview).
- id: Unique ID for the paper.
  authors: # List of authors, structure detailed below.
    - first_name: First name e.g. Jane
      middle_name: (opt) Middle name e.g. Emily
      last_name: Last name e.g. Doe
      preferred_name: (opt) Preferred name, if not the same as first_name.
      institution: Name of the author's institution.
      email: Author's email.
      openreview: (opt) Author's OpenReview username.
      google_scholar: (opt) Author's Google Scholar ID.
      orcid: (opt) Author's ORCID ID.
      dblp: (opt) Author's DBLP ID.
      semantic_scholar: (opt) Author's Semantic Scholar ID.
  attributes:
    # Key-value pairs used to manage other aspects of
    # the publication process. Below are examples of possible
    # attributes. These attributes are not shown in the proceedings,
    # but they are really useful in other steps, e.g., in the
    # definition of the program.
    paper_type: long | short
    presentation_type: oral | poster
    submitted_area: Semantics | Machine Learning | ...
  file: File name relative to the papers/ directory, e.g. 1.pdf
  attachments:
    # A list of additional files associated with the paper.
    # Both the type and the file must be specified.
    - type: dataset | note | poster | presentation | software | attachment
      file: Local file path, e.g. 5.zip
  title: Title of the paper.
  abstract: Abstract of the paper, usually a LaTeX fragment.
  archival: Whether or not the paper is archival. Default is True, set to false to
    exclude a paper from the proceedings.
Please notice that the field file in the attachments group cannot contain external URLs: only files added to the attachments folder can be referred to, using their relative path.
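The constraints on papers.yml described above (unique paper IDs, a known attachment type, local attachment paths rather than URLs) lend themselves to a quick automated check. This is a minimal sketch under assumptions: `check_papers` is a hypothetical helper, not part of aclpub2, and it expects the file to have been loaded already into a list of dictionaries by a YAML parser.

```python
from urllib.parse import urlparse

# Attachment types listed in the papers.yml description above.
ATTACHMENT_TYPES = {"dataset", "note", "poster", "presentation",
                    "software", "attachment"}


def check_papers(papers):
    """Return a list of problems found in the parsed papers.yml entries."""
    problems = []
    seen_ids = set()
    for paper in papers:
        paper_id = paper.get("id")
        if paper_id in seen_ids:
            problems.append(f"duplicate paper id: {paper_id}")
        seen_ids.add(paper_id)
        for attachment in paper.get("attachments") or []:
            if attachment.get("type") not in ATTACHMENT_TYPES:
                problems.append(f"paper {paper_id}: unknown attachment type")
            path = str(attachment.get("file") or "")
            if not path:
                problems.append(f"paper {paper_id}: attachment without file")
            elif urlparse(path).scheme in ("http", "https", "ftp"):
                # Only local files in the attachments folder are allowed.
                problems.append(f"paper {paper_id}: external URL {path}")
    return problems
```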
When running your workshop on OpenReview, it is possible to use their API to automatically extract the papers.yml and program_committee.yml files. For this purpose, the folder openreview provides two Python 3 scripts to facilitate your work:
- or2papers.py: it creates the papers.yml file by extracting the papers whose "Decision" is marked as "accepted";
- or2program_committee.py: it creates the program_committee.yml file by retrieving the list of Senior Area Chairs registered in the workshop space on OpenReview and the list of reviewers.
These scripts are designed to be used by the workshop's Program Chairs, because the queries to OpenReview require their access permissions. To use these scripts, you will need the username (the e-mail address used to log in to OpenReview), the password (the password associated with the user's account), and the workshop_ID (the OpenReview identifier linked to the workshop).
Workshop ID: you can find the workshop's identifier by following one of the two approaches below:
- The workshop ID is identified as "venue ID" on the setup website.
- The workshop ID is present in the workshop's URL. It is the id field. For example, the ID of the ACL conference (https://openreview.net/group?id=aclweb.org/ACL/2022/Conference) is "aclweb.org/ACL/2022/Conference". Note that & is a separator in the URL; anything after it is not part of the workshop ID.
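The second approach can be illustrated with Python's standard urllib.parse module, which splits the query string at & for you. The function name `workshop_id_from_url` is a hypothetical helper, not part of the aclpub2 scripts.

```python
from urllib.parse import parse_qs, urlparse


def workshop_id_from_url(url):
    """Extract the value of the id field from an OpenReview group URL."""
    query = parse_qs(urlparse(url).query)
    values = query.get("id")
    if not values:
        raise ValueError(f"no id field found in URL: {url}")
    return values[0]


if __name__ == "__main__":
    url = "https://openreview.net/group?id=aclweb.org/ACL/2022/Conference"
    print(workshop_id_from_url(url))
```

Any additional parameter that follows an & in the URL is ignored, so the returned value is exactly the workshop ID to pass to the scripts.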
Those scripts require Python3 and OpenReview API installed on your machine. For installing OpenReview API, please go to https://openreview-py.readthedocs.io/en/latest/how_to_setup.html
The scripts based on OpenReview API retrieve all information directly from OpenReview. In other words, all SACs, reviewers and authors must have their OpenReview profiles updated (mainly name and affiliation).
This script finds the intersection of all blind submissions and the submissions whose decision is set to accepted. The information on those papers is stored in the paper.yml file, and the files are downloaded to the "papers" and "attachments" folders. The download includes the PDF and any additional attachments provided during the submission. Note that papers are randomly sorted, so different runs of or2papers.py will return the papers in a different order.
To run or2papers.py, type:
python or2papers.py USER PASSWORD WORKSHOP_ID
For example:
python or2papers.py [email protected] 123456 aclweb.org/ACL/2022/Conference
This script searches all Senior_Area_Chairs and Program_Chairs under your conference and saves their information in the program_committee.yml file.
To run or2program_committee.py, type:
python or2program_committee.py USER PASSWORD WORKSHOP_ID
For example:
python or2program_committee.py [email protected] 123456 aclweb.org/ACL/2022/Conference
- Workshops that accept ARR commitments should be aware that the or2program_committee.py script only extracts data of submitted/committed papers.
- During the script execution, you may see a message such as "ERROR: or_id not found". It means that the script could not retrieve the profile's information from OpenReview. Therefore, you must manually insert the data in paper.yml or program_committee.yml. You can identify the problematic OpenReview IDs and their papers in paper.log and program_committee.log.
Now that you know the expected structure of the proceedings and how to edit/export the required .yml input files, you are ready to test the tool that automatically generates the proceedings. First of all, follow the Setup procedures.
Then, as a training example, we have made available in the examples/sigdial directory all the files you need to correctly generate the proceedings.
Could you compile the sigdial proceedings? 🎊
Excellent, you are now ready to run the generation scripts on the files you have just edited/exported for your conference/workshop.
python -m pip install -r requirements.txt
Java is required to use the pax latex library, which is responsible for extracting and reinserting PDF links. Visit the Java website for instructions on how to install.
sudo apt-get install texlive-latex-base texlive-latex-recommended texlive-latex-extra texlive-fonts-recommended texlive-fonts-extra texlive-bibtex-extra texlive-lang-all
Install mactex. One way to do this is to install Homebrew first and then run:
brew install mactex
Ensure that PYTHONPATH includes the current directory (.), for example: export PYTHONPATH=.:$PYTHONPATH
Run the CLI on the SIGDIAL example directory:
./bin/generate examples/sigdial --proceedings
The generated results, along with intermediate files and links, can then be found in the output directory, inside the directory in which you ran the command.
As said before, the generation scripts accept as input the path to a directory containing a set of .yml files and directories.
The expected input directory structure and the CLI are detailed below.
# Generates the proceedings.
./bin/generate examples/sigdial --proceedings
# Generates the handbook.
./bin/generate examples/sigdial --handbook
# Generates both.
./bin/generate examples/sigdial --proceedings --handbook
# Generates both and overwrites the existing contents of the build directory.
./bin/generate examples/sigdial --proceedings --handbook --overwrite
Users may wish to make modifications to the output .tex files.
Though we recommend first copying the .tex files to a new working directory, the --overwrite flag helps ensure that local modifications are not accidentally erased.
The above describes a reasonable default usage of this package, but the behavior can easily be extended or modified by adjusting the contents of the aclpub2/ directory.
The main files to keep in mind are aclpub2/templates/proceedings.tex, which contains the core Jinja template, and aclpub2/generate.py, which is responsible for rendering the template.
The input templates use the T1 font encoding. If you need a different encoding (e.g., for Vietnamese), you have to modify aclpub2/templates/proceedings.tex by changing the statement \usepackage[T1]{fontenc} and specifying a different encoding, e.g., \usepackage[T5]{fontenc}.
This project makes extensive use of Jinja to produce readable LaTeX templates. Before contributing or forking, it is generally helpful to familiarize yourself with the Jinja library. Documentation can be found here.
Additional configuration for Jinja can be found in the aclpub2/templates.py file.
The purpose of this file is to set up the Jinja environment with LaTeX-like block delimiters so that the proceedings.tex file can be syntax highlighted and otherwise interacted with in a fashion that is more natural for LaTeX users.
In addition, it is also responsible for configuring some convenience functions that allow us to create some LaTeX structures in the final output .tex file that are easier to write in native Python than in either the Jinja base syntax or LaTeX alone.
** Work in progress **
This file describes the conference program. It is organized in blocks (sessions), each with a title, a start time, and an end time, followed by a list of paper IDs. Instead of defining presentations, sessions may define subsessions, which have the same structure as the top-level session.
- title: Title of the conference session, e.g. Opening Remarks
  start_time: Start time of the session as an ISO datestring.
  end_time: End time of the session as an ISO datestring.
  location: Location that the session is taking place in, e.g. Main Hall or Online
  chair: (opt) Name of the chair of the session, e.g. Jane Doe.
  url: (opt) URL to join or view the session, if applicable.
  papers:
    - id: Paper ID
      start_time: Optional start time of the paper slot as an ISO datestring.
      end_time: Optional end time of the paper slot as an ISO datestring.
# Or, if this is a session that is broken into subsessions:
- title: Title of the conference session, e.g. Opening Remarks
  start_time: Start time of the session as an ISO datestring.
  end_time: End time of the session as an ISO datestring.
  subsessions:
    - title: Title of the conference session, e.g. Opening Remarks
      start_time: Start time of the session as an ISO datestring.
      end_time: End time of the session as an ISO datestring.
      chair: (opt) Name of the chair of the session, e.g. Jane Doe.
      location: Location that the session is taking place in.
      papers:
        - id: Paper ID
          start_time: Optional start time of the paper slot as an ISO datestring.
          end_time: Optional end time of the paper slot as an ISO datestring.
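Because the program refers to papers by ID and uses ISO datestrings, two simple consistency checks are possible before generating the handbook: every referenced paper ID should exist in papers.yml, and no session should end before it starts. The sketch below is a hypothetical helper, not part of aclpub2; it assumes program.yml has already been parsed into a list of dictionaries and that the time values are ISO strings or datetime objects.

```python
from datetime import datetime


def _as_datetime(value):
    """Accept either a datetime object or an ISO datestring."""
    if isinstance(value, datetime):
        return value
    return datetime.fromisoformat(str(value))


def check_program(sessions, known_paper_ids):
    """Return a list of problems found in the parsed program.yml sessions."""
    problems = []
    for session in sessions:
        title = session.get("title", "<untitled>")
        start, end = session.get("start_time"), session.get("end_time")
        if start is not None and end is not None:
            if _as_datetime(end) < _as_datetime(start):
                problems.append(f"{title}: end_time is before start_time")
        for slot in session.get("papers") or []:
            if slot.get("id") not in known_paper_ids:
                problems.append(f"{title}: unknown paper id {slot.get('id')}")
        # Subsessions have the same structure as top-level sessions.
        problems.extend(
            check_program(session.get("subsessions") or [], known_paper_ids)
        )
    return problems
```

Here `known_paper_ids` would be the set of id values collected from papers.yml.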