Added missing ACL 2020 attachments (acl-org#969)
mjpost authored Aug 24, 2020
1 parent 7801023 commit ac5c8b9
Showing 9 changed files with 61 additions and 0 deletions.
48 changes: 48 additions & 0 deletions data/xml/2020.acl.xml

Large diffs are not rendered by default.
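Every addition in this commit follows the same pattern: an <attachment> element carrying a type and a hash, whose text names a file after the paper's Anthology ID. As a minimal sketch of how a consumer might read these entries (assuming only the element and attribute names visible in this diff, not any Anthology tooling):

import xml.etree.ElementTree as ET

# List the attachments recorded for each paper in one volume file.
tree = ET.parse("data/xml/2020.bea.xml")
for paper in tree.getroot().iter("paper"):
    for att in paper.findall("attachment"):
        print(paper.get("id"), att.get("type"), att.text)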

2 changes: 2 additions & 0 deletions data/xml/2020.bea.xml
@@ -87,6 +87,7 @@
<abstract>In this paper we employ a novel approach to advancing our understanding of the development of writing in English and German children across school grades using classification tasks. The data used come from two recently compiled corpora: The English data come from the GiC corpus (983 school children in second-, sixth-, ninth- and eleventh-grade) and the German data are from the FD-LEX corpus (930 school children in fifth- and ninth-grade). The key to this paper is the combined use of what we refer to as ‘complexity contours’, i.e., series of measurements that capture the progression of linguistic complexity within a text, and Recurrent Neural Network (RNN) classifiers that adequately capture the sequential information in those contours. Our experiments demonstrate that RNN classifiers trained on complexity contours achieve higher classification accuracy than those trained on text-average complexity scores. In a second step, we determine the relative importance of the features from four distinct categories through a Sensitivity-Based Pruning approach.</abstract>
<url hash="b544b2f5">2020.bea-1.6</url>
<doi>10.18653/v1/2020.bea-1.6</doi>
<attachment type="Dataset" hash="6275ac98">2020.bea-1.6.Dataset.pdf</attachment>
</paper>
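The contour-plus-RNN setup in the abstract above lends itself to a compact sketch. The sizes below (four complexity features, twenty windows per text, four grade labels) are illustrative assumptions, not the authors' configuration:

import torch
import torch.nn as nn

class ContourClassifier(nn.Module):
    """GRU over per-window complexity measurements -> grade-level logits."""
    def __init__(self, n_features=4, hidden=64, n_grades=4):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_grades)

    def forward(self, contours):          # contours: (batch, windows, n_features)
        _, h = self.rnn(contours)         # final hidden state: (1, batch, hidden)
        return self.out(h.squeeze(0))     # grade logits: (batch, n_grades)

logits = ContourClassifier()(torch.randn(8, 20, 4))  # 8 texts, 20 windows each

This is what gives the RNN its edge over text-average scores: the whole sequence of measurements, not its mean, reaches the classifier.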
<paper id="7">
<title>Annotation and Classification of Evidence and Reasoning Revisions in Argumentative Writing</title>
@@ -205,6 +206,7 @@
<abstract>Complex Word Identification (CWI) is a task for the identification of words that are challenging for second-language learners to read. Even though the use of neural classifiers is now common in CWI, the interpretation of their parameters remains difficult. This paper analyzes neural CWI classifiers and shows that some of their parameters can be interpreted as vocabulary size. We present a novel formalization of the vocabulary size measurement methods practiced in applied linguistics as a kind of neural classifier. We also contribute to building a novel dataset, collected via crowdsourcing, for validating vocabulary testing and readability.</abstract>
<url hash="f7390621">2020.bea-1.17</url>
<doi>10.18653/v1/2020.bea-1.17</doi>
<attachment type="Dataset" hash="de00ecff">2020.bea-1.17.Dataset.zip</attachment>
</paper>
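The vocabulary-size reading of classifier parameters can be made concrete with a toy calculation. Assuming, purely for illustration, a logistic model of word knowledge over log frequency rank, P(known) = sigmoid(w * log(rank) + b), the rank at which the probability crosses 0.5 behaves like a vocabulary size:

import numpy as np

w, b = -1.2, 10.5               # illustrative fitted parameters, not from the paper
vocab_size = np.exp(-b / w)     # rank where w * log(rank) + b == 0
print(f"estimated vocabulary size: {vocab_size:.0f} words")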
<paper id="18">
<title>Automated Scoring of Clinical Expressive Language Evaluation Tasks</title>
1 change: 1 addition & 0 deletions data/xml/2020.bionlp.xml
@@ -130,6 +130,7 @@
<abstract>Text classification tasks which aim at harvesting and/or organizing information from electronic health records are pivotal to support clinical and translational research. However, these present specific challenges compared to other classification tasks, notably due to the particular nature of the medical lexicon and language used in clinical records. Recent advances in embedding methods have shown promising results for several clinical tasks, yet there is no exhaustive comparison of such approaches with other commonly used word representations and classification models. In this work, we analyse the impact of various word representations, text pre-processing and classification algorithms on the performance of four different text classification tasks. The results show that traditional approaches, when tailored to the specific language and structure of the text inherent to the classification task, can achieve or exceed the performance of more recent ones based on contextual embeddings such as BERT.</abstract>
<url hash="a505a102">2020.bionlp-1.9</url>
<doi>10.18653/v1/2020.bionlp-1.9</doi>
<attachment type="Dataset" hash="47149249">2020.bionlp-1.9.Dataset.pdf</attachment>
</paper>
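A "traditional approach, tailored to the text" in the sense of this abstract can be as simple as a TF-IDF pipeline with a linear classifier; the toy notes and labels below are placeholders, and clinical-specific preprocessing would plug into the vectorizer:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

notes = ["patient denies chest pain", "no acute distress noted"]  # toy records
labels = [1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(notes, labels)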
<paper id="10">
<title>Noise Pollution in Hospital Readmission Prediction: Long Document Classification with Reinforcement Learning</title>
1 change: 1 addition & 0 deletions data/xml/2020.figlang.xml
@@ -261,6 +261,7 @@
<url hash="10dcef0a">2020.figlang-1.23</url>
<attachment type="Software" hash="3084e139">2020.figlang-1.23.Software.zip</attachment>
<doi>10.18653/v1/2020.figlang-1.23</doi>
<attachment type="Dataset" hash="5d983303">2020.figlang-1.23.Dataset.pdf</attachment>
</paper>
<paper id="24">
<title><fixed-case>O</fixed-case>xymorons: a preliminary corpus investigation</title>
1 change: 1 addition & 0 deletions data/xml/2020.iwpt.xml
@@ -96,6 +96,7 @@
<abstract>Semiring parsing is an elegant framework for describing parsers by using semiring weighted logic programs. In this paper we present a generalization of this concept: latent-variable semiring parsing. With our framework, any semiring weighted logic program can be latentified by transforming weights from scalar values of a semiring to rank-n arrays, or tensors, of semiring values, allowing the modelling of latent-variable models within the semiring parsing framework. A semiring is too strong a notion when dealing with tensors, and we have to resort to a weaker structure: a partial semiring. We prove that this generalization preserves all the desired properties of the original semiring framework while strictly increasing its expressiveness.</abstract>
<url hash="36981a2b">2020.iwpt-1.8</url>
<doi>10.18653/v1/2020.iwpt-1.8</doi>
<attachment type="Dataset" hash="322b1060">2020.iwpt-1.8.Dataset.pdf</attachment>
</paper>
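The "latentification" the abstract describes, scalar weights becoming tensors, can be sketched with numpy: elementwise addition plays the additive role and tensor contraction the multiplicative one. The structure is only a partial semiring because the product is defined solely for shape-compatible operands; the shapes below are illustrative assumptions:

import numpy as np

def oplus(a, b):
    return a + b                        # defined only for equal shapes

def otimes(a, b):
    return np.tensordot(a, b, axes=1)   # contract last axis of a with first of b

rule_ab = np.random.rand(3, 4)          # latent dimensions of one rule weight
rule_bc = np.random.rand(4, 5)
combined = otimes(rule_ab, rule_bc)     # shape (3, 5): latent indices compose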
<paper id="9">
<title>Advances in Using Grammars with Latent Annotations for Discontinuous Parsing</title>
2 changes: 2 additions & 0 deletions data/xml/2020.ngt.xml
@@ -35,6 +35,7 @@
<abstract>We describe the findings of the Fourth Workshop on Neural Generation and Translation, held in concert with the annual conference of the Association for Computational Linguistics (ACL 2020). First, we summarize the research trends of papers presented in the proceedings. Second, we describe the results of the three shared tasks: 1) efficient neural machine translation (NMT), where participants were tasked with creating NMT systems that are both accurate and efficient; 2) document-level generation and translation (DGT), where participants were tasked with developing systems that generate summaries from structured data, potentially with assistance from text in another language; and 3) the STAPLE task, the creation of as many plausible translations of a given input text as possible. This last shared task was organised by Duolingo.</abstract>
<url hash="c1b45c54">2020.ngt-1.1</url>
<doi>10.18653/v1/2020.ngt-1.1</doi>
<attachment type="Dataset" hash="bf7e633a">2020.ngt-1.1.Dataset.txt</attachment>
</paper>
<paper id="2">
<title>Learning to Generate Multiple Style Transfer Outputs for an Input Sentence</title>
@@ -296,6 +297,7 @@
<abstract>We participated in all tracks of the Workshop on Neural Generation and Translation 2020 Efficiency Shared Task: single-core CPU, multi-core CPU, and GPU. At the model level, we use teacher-student training with a variety of student sizes, tie embeddings and sometimes layers, use the Simpler Simple Recurrent Unit, and introduce head pruning. On GPUs, we used 16-bit floating-point tensor cores. On CPUs, we customized 8-bit quantization and multiple processes with affinity for the multi-core setting. To reduce model size, we experimented with 4-bit log quantization but use floats at runtime. In the shared task, most of our submissions were Pareto optimal with respect to the trade-off between time and quality.</abstract>
<url hash="b267cea9">2020.ngt-1.26</url>
<doi>10.18653/v1/2020.ngt-1.26</doi>
<attachment type="Dataset" hash="ffd898b7">2020.ngt-1.26.Dataset.txt</attachment>
</paper>
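Of the techniques listed, 8-bit quantization is the easiest to sketch. This is a generic per-tensor scheme, not the team's customized kernels: weights are stored as int8 with one float scale and dequantized on the fly:

import numpy as np

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
w_approx = q.astype(np.float32) * scale   # runtime dequantization
print("max abs error:", np.abs(w - w_approx).max())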
<paper id="27">
<title>Improving Document-Level Neural Machine Translation with Domain Adaptation</title>
3 changes: 3 additions & 0 deletions data/xml/2020.nlp4convai.xml
@@ -74,6 +74,7 @@
<abstract>Building conversational systems in new domains and with added functionality requires resource-efficient models that work under low-data regimes (i.e., in few-shot setups). Motivated by these requirements, we introduce intent detection methods backed by pretrained dual sentence encoders such as USE and ConveRT. We demonstrate the usefulness and wide applicability of the proposed intent detectors, showing that: 1) they outperform intent detectors based on fine-tuning the full BERT-Large model or using BERT as a fixed black-box encoder on three diverse intent detection data sets; 2) the gains are especially pronounced in few-shot setups (i.e., with only 10 or 30 annotated examples per intent); 3) our intent detectors can be trained in a matter of minutes on a single CPU; and 4) they are stable across different hyperparameter settings. In the hope of facilitating and democratizing research focused on intent detection, we release our code, as well as a new challenging single-domain intent detection dataset comprising 13,083 annotated examples over 77 intents.</abstract>
<url hash="c3636fef">2020.nlp4convai-1.5</url>
<doi>10.18653/v1/2020.nlp4convai-1.5</doi>
<attachment type="Dataset" hash="0666d8ca">2020.nlp4convai-1.5.Dataset.zip</attachment>
</paper>
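The recipe behind points 3) and 4) of the abstract is that the dual encoder stays frozen, so only a light classifier is trained on top. A sketch, with random vectors standing in for USE/ConveRT embeddings:

import numpy as np
from sklearn.linear_model import LogisticRegression

X_train = np.random.randn(30, 512)      # stand-in for precomputed sentence embeddings
y_train = np.repeat(np.arange(3), 10)   # 3 intents x 10 examples: a few-shot setup

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

With embeddings precomputed once, this training step runs in seconds on a CPU, which matches the training-time claim above.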
<paper id="6">
<title>Accelerating Natural Language Understanding in Task-Oriented Dialog</title>
@@ -104,6 +105,7 @@
<abstract>Speech-based virtual assistants, such as Amazon Alexa, Google Assistant, and Apple Siri, typically convert users’ audio signals to text data through automatic speech recognition (ASR) and feed the text to downstream dialog models for natural language understanding and response generation. The ASR output is error-prone; however, the downstream dialog models are often trained on error-free text data, making them sensitive to ASR errors during inference time. To bridge the gap and make dialog models more robust to ASR errors, we leverage an ASR error simulator to inject noise into the error-free text data, and subsequently train the dialog models with the augmented data. Compared to other approaches for handling ASR errors, such as using ASR lattices or end-to-end methods, our data augmentation approach does not require any modification to the ASR or downstream dialog models; our approach also does not introduce any additional latency during inference time. We perform extensive experiments on benchmark data and show that our approach improves the performance of downstream dialog models in the presence of ASR errors, and that it is particularly effective in low-resource situations where model size is constrained or training data is scarce.</abstract>
<url hash="e4c00489">2020.nlp4convai-1.8</url>
<doi>10.18653/v1/2020.nlp4convai-1.8</doi>
<attachment type="Dataset" hash="aa8868d4">2020.nlp4convai-1.8.Dataset.zip</attachment>
</paper>
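A minimal sketch of the augmentation idea: inject word-level noise into clean training text so the dialog model sees ASR-like input. A real simulator would sample errors from ASR confusion statistics; the uniform deletions and duplications below are stand-ins:

import random

def simulate_asr_noise(text, p=0.1, seed=0):
    rng = random.Random(seed)
    out = []
    for word in text.split():
        r = rng.random()
        if r < p:                 # drop the word, as if misrecognized away
            continue
        out.append(word)
        if r > 1 - p:             # duplicate it, a crude insertion error
            out.append(word)
    return " ".join(out)

print(simulate_asr_noise("set an alarm for seven thirty tomorrow"))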
<paper id="9">
<title>Automating Template Creation for Ranking-Based Dialogue Models</title>
@@ -117,6 +119,7 @@
<url hash="e2fc267c">2020.nlp4convai-1.9</url>
<attachment type="Software" hash="d3cfb0f6">2020.nlp4convai-1.9.Software.txt</attachment>
<doi>10.18653/v1/2020.nlp4convai-1.9</doi>
<attachment type="Software" hash="b5003fb1">2020.nlp4convai-1.9.Software.zip</attachment>
</paper>
<paper id="10">
<title>From Machine Reading Comprehension to Dialogue State Tracking: Bridging the Gap</title>
2 changes: 2 additions & 0 deletions data/xml/2020.nuse.xml
@@ -88,6 +88,7 @@
<url hash="c659338e">2020.nuse-1.6</url>
<attachment type="Software" hash="0118ef2f">2020.nuse-1.6.Software.zip</attachment>
<doi>10.18653/v1/2020.nuse-1.6</doi>
<attachment type="Dataset" hash="fbb502be">2020.nuse-1.6.Dataset.pdf</attachment>
</paper>
<paper id="7">
<title>Script Induction as Association Rule Mining</title>
@@ -106,6 +107,7 @@
<abstract>In this paper we introduce the problem of extracting events from dialogue. Previous work on event extraction focused on newswire; however, we are interested in extracting events from spoken dialogue. To ground this study, we annotated dialogue transcripts from fourteen episodes of the podcast This American Life. This corpus contains 1,038 utterances, made up of 16,962 tokens, of which 3,664 represent events. Annotator agreement for this corpus has a Cohen’s Kappa of 0.83. We have open-sourced this corpus for the NLP community. With this corpus in hand, we trained support vector machines (SVMs) to classify these phenomena with 0.68 F1 when using episode-fold cross-validation. This is nearly 100% higher F1 than the baseline classifier. The SVM models achieved performance of over 0.75 F1 on some testing folds. We report the results for SVM classifiers trained with four different types of features (verb classes, part-of-speech tags, named entities, and semantic role labels) and different machine learning protocols (under-sampling and trigram context). This work is grounded in narratology and computational models of narrative. It is useful for extracting events, plot, and story content from spoken dialogue.</abstract>
<url hash="07fbff24">2020.nuse-1.8</url>
<doi>10.18653/v1/2020.nuse-1.8</doi>
<attachment type="Dataset" hash="fed4910f">2020.nuse-1.8.Dataset.zip</attachment>
</paper>
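The episode-fold protocol mentioned above maps directly onto grouped cross-validation: utterances from one episode never straddle train and test. A sketch with placeholder features standing in for the paper's verb-class, POS, NER, and SRL features:

import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.svm import SVC

X = np.random.randn(140, 20)             # placeholder utterance features
y = np.random.randint(0, 2, 140)         # event vs. non-event labels
episodes = np.repeat(np.arange(14), 10)  # 14 episodes used as CV groups

scores = cross_val_score(SVC(), X, y, groups=episodes,
                         cv=GroupKFold(n_splits=14), scoring="f1")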
<paper id="9">
<title>Annotating and quantifying narrative time disruptions in modernist and hypertext fiction</title>
1 change: 1 addition & 0 deletions data/xml/2020.socialnlp.xml
@@ -66,6 +66,7 @@
<abstract>We investigate whether pre-trained bidirectional transformers with sentiment and emotion information improve stance detection in long discussions of contemporary issues. As part of this work, we create a novel stance detection dataset covering 419 different controversial issues and their related pros and cons, collected by procon.org in a nonpartisan format. Experimental results show that a shallow recurrent neural network with sentiment or emotion information can, with 20x fewer parameters, reach results competitive with fine-tuned BERT. We also use a simple approach that explains which input phrases contribute to stance detection.</abstract>
<url hash="a130dcb9">2020.socialnlp-1.5</url>
<doi>10.18653/v1/2020.socialnlp-1.5</doi>
<attachment type="Dataset" hash="0f5692cc">2020.socialnlp-1.5.Dataset.zip</attachment>
</paper>
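One way to fold sentiment information into a small stance model, as the abstract describes for its recurrent baseline, is to feed sentiment scores in as input features. The sketch below swaps the recurrent part for a linear model to stay self-contained, with NLTK's VADER as one concrete source of scores:

import numpy as np
from nltk.sentiment import SentimentIntensityAnalyzer
from sklearn.linear_model import LogisticRegression

sia = SentimentIntensityAnalyzer()   # may first require nltk.download("vader_lexicon")

texts = ["This policy clearly helps families.", "This policy is a disaster."]
stances = [1, 0]                     # pro / con

X = np.array([[s["neg"], s["neu"], s["pos"], s["compound"]]
              for s in (sia.polarity_scores(t) for t in texts)])
clf = LogisticRegression().fit(X, stances)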
<paper id="6">
<title>Challenges in Emotion Style Transfer: An Exploration with a Lexical Substitution Pipeline</title>
