Showing 14 changed files with 68 additions and 24 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
We propose a framework for computer-assisted text editing. It applies to translation post-editing and to paraphrasing. Our proposal relies on very simple interactions: a human editor modifies a sentence by marking tokens they would like the system to change. Our model then generates a new sentence which reformulates the initial sentence by avoiding marked words. The approach builds upon neural sequence-to-sequence modeling and introduces a neural network which takes as input a sentence along with change markers. Our model is trained on translation bitext by simulating post-edits. We demonstrate the advantage of our approach for translation post-editing through simulated post-edits. We also evaluate our model for paraphrasing through a user study. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
Human evaluations of machine translation | ||
are extensive but expensive. Human evaluations | ||
can take months to finish and involve | ||
human labor that cannot be reused. | ||
We propose a method of automatic machine | ||
translation evaluation that is quick, | ||
inexpensive, and language-independent, | ||
that correlates highly with human evaluation, | ||
and that has little marginal cost per | ||
run. We present this method as an automated | ||
understudy to skilled human judges | ||
which substitutes for them when there is | ||
need for quick or frequent evaluations (so we call our method the \underline{b}i\underline{l}ingual \underline{e}valuation \underline{u}nderstudy, \textsc{Bleu}). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
We describe new algorithms for training | ||
tagging models, as an alternative | ||
to maximum-entropy models or conditional | ||
random fields (CRFs). The algorithms | ||
rely on Viterbi decoding of | ||
training examples, combined with simple | ||
additive updates. We describe theory | ||
justifying the algorithms through | ||
a modification of the proof of convergence | ||
of the perceptron algorithm for | ||
classification problems. We give experimental | ||
results on part-of-speech tagging | ||
and base noun phrase chunking, in | ||
both cases showing improvements over | ||
results for a maximum-entropy tagger. |
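
The training loop summarized in the abstract above (Viterbi decoding of each training example followed by simple additive updates) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the feature set (emission and tag-bigram indicators), the `<s>` start symbol, the function names, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def features(words, seq):
    """Count emission and tag-bigram features (an assumed, minimal feature set)."""
    feats = defaultdict(int)
    prev = "<s>"  # hypothetical start-of-sentence tag
    for word, tag in zip(words, seq):
        feats[("emit", tag, word)] += 1
        feats[("trans", prev, tag)] += 1
        prev = tag
    return feats


def viterbi(words, tags, w):
    """Return the highest-scoring tag sequence under the weights w."""
    # best[i][t] = (score of the best path ending in tag t at position i, backpointer)
    best = [{t: (w[("trans", "<s>", t)] + w[("emit", t, words[0])], None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p][0] + w[("trans", p, t)])
            score = best[i - 1][prev][0] + w[("trans", prev, t)] + w[("emit", t, words[i])]
            column[t] = (score, prev)
        best.append(column)
    # Follow backpointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


def train(data, tags, epochs=5):
    """Perceptron training: decode, then add gold features and subtract predicted ones."""
    w = defaultdict(float)
    for _ in range(epochs):
        for words, gold in data:
            pred = viterbi(words, tags, w)
            if pred != gold:
                for f, c in features(words, gold).items():
                    w[f] += c
                for f, c in features(words, pred).items():
                    w[f] -= c
    return w


# Toy data, invented for illustration only.
TAGS = ["D", "N", "V"]
DATA = [
    (["the", "dog", "barks"], ["D", "N", "V"]),
    (["a", "cat", "sleeps"], ["D", "N", "V"]),
]
weights = train(DATA, TAGS)
```

Weights change only when the decoded sequence differs from the gold one, so training stops moving once every training example is decoded correctly.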
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
We consider the problem of classifying documents | ||
not by topic, but by overall sentiment, | ||
e.g., determining whether a review | ||
is positive or negative. Using movie reviews | ||
as data, we find that standard machine | ||
learning techniques definitively outperform | ||
human-produced baselines. However, | ||
the three machine learning methods | ||
we employed (Naive Bayes, maximum entropy | ||
classification, and support vector machines) | ||
do not perform as well on sentiment | ||
classification as on traditional topic-based | ||
categorization. We conclude by examining | ||
factors that make the sentiment classification | ||
problem more challenging. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,7 +18,7 @@ | |
\index{Su, Pei-Hao} | ||
|
||
{\bfseries Nikola Mrk\v{s}i\'{c}} ([email protected], http://mi.eng.cam.ac.uk/~nm480) is a co-founder and CEO of PolyAI, a London-based startup looking to use the latest developments in NLP to create a general machine learning platform for deploying spoken dialogue systems. He holds a PhD from the Dialogue Systems group, University of Cambridge, where he worked under the supervision of Professor Steve Young. His research is focused on belief tracking in human-machine dialogue, specifically in moving towards building open-domain, cross-lingual language understanding models that are fully data-driven. He is also interested in deep learning, semantics, Bayesian nonparametrics, unsupervised and semi-supervised learning. He previously gave a tutorial on word vector space specialisation at EACL 2017, and will teach a course on the same topic at ESSLLI 2018. He also gave invited talks at the REWORK AI Personal Assistant summit and the Chatbot Summit. | ||
% \index{Mrk\v{s}i\'{c}, Nicola} | ||
\index{Mrk\v{s}i\'{c}, Nikola} | ||
|
||
{\bfseries I\~{n}igo Casanueva} ([email protected], http://mi.eng.cam.ac.uk/~ic340/) is a Machine Learning engineer at PolyAI, a London-based startup looking to use the latest developments in NLP to create | ||
a general machine learning platform for deploying spoken dialogue systems. He received his PhD from the University of Sheffield and later worked as a Research Assistant in the Dialogue Systems group, University of Cambridge. His main research interest is increasing the scalability of machine-learning-based dialogue management, looking for methods to make deep learning and/or reinforcement learning applicable to real-world dialogue management tasks. He has published several papers on the topic, two of which were nominated for a best paper award. | ||
|