From 7a0dc8246f096ca9ac87ff8c6bc35c8e3a543a6a Mon Sep 17 00:00:00 2001 From: Avneesh Singh Saluja Date: Fri, 23 May 2014 23:32:27 -0700 Subject: [PATCH] latest version of paper; sections 2 and 3 almost complete --- EMNLP2014/acl2014.tex | 431 --------- EMNLP2014/bibliography.bib | 1650 ++++++++++++++++++++++++++++++++++ EMNLP2014/emnlp2014.tex | 535 +++++++++++ EMNLP2014/spectral_scfgs.tex | 431 --------- 4 files changed, 2185 insertions(+), 862 deletions(-) delete mode 100644 EMNLP2014/acl2014.tex create mode 100644 EMNLP2014/bibliography.bib create mode 100644 EMNLP2014/emnlp2014.tex delete mode 100644 EMNLP2014/spectral_scfgs.tex diff --git a/EMNLP2014/acl2014.tex b/EMNLP2014/acl2014.tex deleted file mode 100644 index 5862361..0000000 --- a/EMNLP2014/acl2014.tex +++ /dev/null @@ -1,431 +0,0 @@ -% -% File acl2014.tex -% -% Contact: koller@ling.uni-potsdam.de, yusuke@nii.ac.jp -%% -%% Based on the style files for ACL-2013, which were, in turn, -%% Based on the style files for ACL-2012, which were, in turn, -%% based on the style files for ACL-2011, which were, in turn, -%% based on the style files for ACL-2010, which were, in turn, -%% based on the style files for ACL-IJCNLP-2009, which were, in turn, -%% based on the style files for EACL-2009 and IJCNLP-2008... - -%% Based on the style files for EACL 2006 by -%%e.agirre@ehu.es or Sergi.Balari@uab.es -%% and that of ACL 08 by Joakim Nivre and Noah Smith - -\documentclass[11pt]{article} -\usepackage{acl2014} -\usepackage{times} -\usepackage{url} -\usepackage{latexsym} - -%\setlength\titlebox{5cm} - -% You can expand the titlebox if you need extra space -% to show all the authors. Please do not make the titlebox -% smaller than 5cm (the original size); we will check this -% in the camera-ready version and ask you to change it back. 
- - -\title{Instructions for ACL-2014 Proceedings} - -\author{First Author \\ - Affiliation / Address line 1 \\ - Affiliation / Address line 2 \\ - Affiliation / Address line 3 \\ - {\tt email@domain} \\\And - Second Author \\ - Affiliation / Address line 1 \\ - Affiliation / Address line 2 \\ - Affiliation / Address line 3 \\ - {\tt email@domain} \\} - -\date{} - -\begin{document} -\maketitle -\begin{abstract} - This document contains the instructions for preparing a camera-ready - manuscript for the proceedings of ACL-2014. The document itself - conforms to its own specifications, and is therefore an example of - what your manuscript should look like. These instructions should be - used for both papers submitted for review and for final versions of - accepted papers. Authors are asked to conform to all the directions - reported in this document. -\end{abstract} - -\section{Credits} - -This document has been adapted from the instructions for earlier ACL -proceedings, including those for ACL-2012 by Maggie Li and Michael -White, those from ACL-2010 by Jing-Shing Chang and Philipp Koehn, -those for ACL-2008 by Johanna D. Moore, Simone Teufel, James Allan, -and Sadaoki Furui, those for ACL-2005 by Hwee Tou Ng and Kemal -Oflazer, those for ACL-2002 by Eugene Charniak and Dekang Lin, and -earlier ACL and EACL formats. Those versions were written by several -people, including John Chen, Henry S. Thompson and Donald -Walker. Additional elements were taken from the formatting -instructions of the {\em International Joint Conference on Artificial - Intelligence}. - -\section{Introduction} - -The following instructions are directed to authors of papers submitted -to ACL-2014 or accepted for publication in its proceedings. All -authors are required to adhere to these specifications. Authors are -required to provide a Portable Document Format (PDF) version of their -papers. 
\textbf{The proceedings are designed for printing on A4 - paper.} - -Authors from countries in which access to word-processing systems is -limited should contact the publication chairs, Alexander Koller -(\texttt{koller@ling.uni-potsdam.de}) and Yusuke Miyao -(\texttt{yusuke@nii.ac.jp}), as soon as possible. - -We will make more detailed instructions available at -\url{http://sites.google.com/site/acl2014publication}. Please check -this website regularly. - - -\section{General Instructions} - -Manuscripts must be in two-column format. Exceptions to the -two-column format include the title, authors' names and complete -addresses, which must be centered at the top of the first page, and -any full-width figures or tables (see the guidelines in -Subsection~\ref{ssec:first}). {\bf Type single-spaced.} Start all -pages directly under the top margin. See the guidelines later -regarding formatting the first page. The manuscript should be -printed single-sided and its length -should not exceed the maximum page limit described in Section~\ref{sec:length}. -Do not number the pages. - - -\subsection{Electronically-available resources} - -We strongly prefer that you prepare your PDF files using \LaTeX\ with -the official ACL 2014 style file (acl2014.sty) and bibliography style -(acl.bst). These files are available at -\url{http://www.cs.jhu.edu/ACL2014/}. You will also find the document -you are currently reading (acl2014.pdf) and its \LaTeX\ source code -(acl2014.tex) on this website. - -You can alternatively use Microsoft Word to produce your PDF file. In -this case, we strongly recommend the use of the Word template file -(acl2014.dot) on the ACL 2014 website. If you have an option, we -recommend that you use the \LaTeX2e version. If you will be - using the Microsoft Word template, we suggest that you anonymize - your source file so that the pdf produced does not retain your - identity. 
This can be done by removing any personal information -from your source document properties. - - - -\subsection{Format of Electronic Manuscript} -\label{sect:pdf} - -For the production of the electronic manuscript you must use Adobe's -Portable Document Format (PDF). PDF files are usually produced from -\LaTeX\ using the \textit{pdflatex} command. If your version of -\LaTeX\ produces Postscript files, you can convert these into PDF -using \textit{ps2pdf} or \textit{dvipdf}. On Windows, you can also use -Adobe Distiller to generate PDF. - -Please make sure that your PDF file includes all the necessary fonts -(especially tree diagrams, symbols, and fonts with Asian -characters). When you print or create the PDF file, there is usually -an option in your printer setup to include none, all or just -non-standard fonts. Please make sure that you select the option of -including ALL the fonts. \textbf{Before sending it, test your PDF by - printing it from a computer different from the one where it was - created.} Moreover, some word processors may generate very large PDF -files, where each page is rendered as an image. Such images may -reproduce poorly. In this case, try alternative ways to obtain the -PDF. One way on some systems is to install a driver for a postscript -printer, send your document to the printer specifying ``Output to a -file'', then convert the file to PDF. - -It is of utmost importance to specify the \textbf{A4 format} (21 cm -x 29.7 cm) when formatting the paper. When working with -{\tt dvips}, for instance, one should specify {\tt -t a4}. - -Print-outs of the PDF file on A4 paper should be identical to the -hardcopy version. If you cannot meet the above requirements about the -production of your electronic submission, please contact the -publication chairs as soon as possible. - - -\subsection{Layout} -\label{ssec:layout} - -Format manuscripts two columns to a page, in the manner these -instructions are formatted. 
The exact dimensions for a page on A4 -paper are: - -\begin{itemize} -\item Left and right margins: 2.5 cm -\item Top margin: 2.5 cm -\item Bottom margin: 2.5 cm -\item Column width: 7.7 cm -\item Column height: 24.7 cm -\item Gap between columns: 0.6 cm -\end{itemize} - -\noindent Papers should not be submitted on any other paper size. - If you cannot meet the above requirements about the production of your electronic submission, please contact the publication chairs above as soon as possible. - - -\subsection{Fonts} - -For reasons of uniformity, Adobe's {\bf Times Roman} font should be -used. In \LaTeX2e{} this is accomplished by putting - -\begin{quote} -\begin{verbatim} -\usepackage{times} -\usepackage{latexsym} -\end{verbatim} -\end{quote} -in the preamble. If Times Roman is unavailable, use {\bf Computer - Modern Roman} (\LaTeX2e{}'s default). Note that the latter is about - 10\% less dense than Adobe's Times Roman font. - - -\begin{table}[h] -\begin{center} -\begin{tabular}{|l|rl|} -\hline \bf Type of Text & \bf Font Size & \bf Style \\ \hline -paper title & 15 pt & bold \\ -author names & 12 pt & bold \\ -author affiliation & 12 pt & \\ -the word ``Abstract'' & 12 pt & bold \\ -section titles & 12 pt & bold \\ -document text & 11 pt &\\ -captions & 11 pt & \\ -abstract text & 10 pt & \\ -bibliography & 10 pt & \\ -footnotes & 9 pt & \\ -\hline -\end{tabular} -\end{center} -\caption{\label{font-table} Font guide. } -\end{table} - -\subsection{The First Page} -\label{ssec:first} - -Center the title, author's name(s) and affiliation(s) across both -columns. Do not use footnotes for affiliations. Do not include the -paper ID number assigned during the submission process. Use the -two-column format only when you begin the abstract. - -{\bf Title}: Place the title centered at the top of the first page, in -a 15-point bold font. 
(For a complete guide to font sizes and styles, -see Table~\ref{font-table}) Long titles should be typed on two lines -without a blank line intervening. Approximately, put the title at 2.5 -cm from the top of the page, followed by a blank line, then the -author's names(s), and the affiliation on the following line. Do not -use only initials for given names (middle initials are allowed). Do -not format surnames in all capitals (e.g., use ``Schlangen'' not -``SCHLANGEN''). Do not format title and section headings in all -capitals as well except for proper names (such as ``BLEU'') that are -conventionally in all capitals. The affiliation should contain the -author's complete address, and if possible, an electronic mail -address. Start the body of the first page 7.5 cm from the top of the -page. - -The title, author names and addresses should be completely identical -to those entered to the electronical paper submission website in order -to maintain the consistency of author information among all -publications of the conference. If they are different, the publication -chairs may resolve the difference without consulting with you; so it -is in your own interest to double-check that the information is -consistent. - -{\bf Abstract}: Type the abstract at the beginning of the first -column. The width of the abstract text should be smaller than the -width of the columns for the text in the body of the paper by about -0.6 cm on each side. Center the word {\bf Abstract} in a 12 point bold -font above the body of the abstract. The abstract should be a concise -summary of the general thesis and conclusions of the paper. It should -be no longer than 200 words. The abstract text should be in 10 point font. - -{\bf Text}: Begin typing the main body of the text immediately after -the abstract, observing the two-column format as shown in -the present document. Do not include page numbers. - -{\bf Indent} when starting a new paragraph. 
Use 11 points for text and -subsection headings, 12 points for section headings and 15 points for -the title. - -\subsection{Sections} - -{\bf Headings}: Type and label section and subsection headings in the -style shown on the present document. Use numbered sections (Arabic -numerals) in order to facilitate cross references. Number subsections -with the section number and the subsection number separated by a dot, -in Arabic numerals. Do not number subsubsections. - -{\bf Citations}: Citations within the text appear in parentheses -as~\cite{Gusfield:97} or, if the author's name appears in the text -itself, as Gusfield~\shortcite{Gusfield:97}. Append lowercase letters -to the year in cases of ambiguity. Treat double authors as -in~\cite{Aho:72}, but write as in~\cite{Chandra:81} when more than two -authors are involved. Collapse multiple citations as -in~\cite{Gusfield:97,Aho:72}. Also refrain from using full citations -as sentence constituents. We suggest that instead of -\begin{quote} - ``\cite{Gusfield:97} showed that ...'' -\end{quote} -you use -\begin{quote} -``Gusfield \shortcite{Gusfield:97} showed that ...'' -\end{quote} - -If you are using the provided \LaTeX{} and Bib\TeX{} style files, you -can use the command \verb|\newcite| to get ``author (year)'' citations. - -As reviewing will be double-blind, the submitted version of the papers -should not include the authors' names and affiliations. Furthermore, -self-references that reveal the author's identity, e.g., -\begin{quote} -``We previously showed \cite{Gusfield:97} ...'' -\end{quote} -should be avoided. Instead, use citations such as -\begin{quote} -``Gusfield \shortcite{Gusfield:97} -previously showed ... '' -\end{quote} - -\textbf{Please do not use anonymous citations} and do not include -acknowledgements when submitting your papers. Papers that do not -conform to these requirements may be rejected without review. 
- -\textbf{References}: Gather the full set of references together under -the heading {\bf References}; place the section before any Appendices, -unless they contain references. Arrange the references alphabetically -by first author, rather than by order of occurrence in the text. -Provide as complete a citation as possible, using a consistent format, -such as the one for {\em Computational Linguistics\/} or the one in the -{\em Publication Manual of the American -Psychological Association\/}~\cite{APA:83}. Use of full names for -authors rather than initials is preferred. A list of abbreviations -for common computer science journals can be found in the ACM -{\em Computing Reviews\/}~\cite{ACM:83}. - -The \LaTeX{} and Bib\TeX{} style files provided roughly fit the -American Psychological Association format, allowing regular citations, -short citations and multiple citations as described above. - -{\bf Appendices}: Appendices, if any, directly follow the text and the -references (but see above). Letter them in sequence and provide an -informative title: {\bf Appendix A. Title of Appendix}. - -\subsection{Footnotes} - -{\bf Footnotes}: Put footnotes at the bottom of the page and use 9 -points text. They may be numbered or referred to by asterisks or other -symbols.\footnote{This is how a footnote should appear.} Footnotes -should be separated from the text by a line.\footnote{Note the line -separating the footnotes from the text.} - -\subsection{Graphics} - -{\bf Illustrations}: Place figures, tables, and photographs in the -paper near where they are first discussed, rather than at the end, if -possible. Wide illustrations may run across both columns. Color -illustrations are discouraged, unless you have verified that -they will be understandable when printed in black ink. - -{\bf Captions}: Provide a caption for every illustration; number each one -sequentially in the form: ``Figure 1. Caption of the Figure.'' ``Table 1. 
-Caption of the Table.'' Type the captions of the figures and -tables below the body, using 11 point text. - - -\section{XML conversion and supported \LaTeX\ packages} - -ACL 2014 innovates over earlier years in that we will attempt to -automatically convert your \LaTeX\ source files to machine-readable -XML with semantic markup. This will facilitate future research that -uses the ACL proceedings themselves as a corpus. - -We encourage you to submit a ZIP file of your \LaTeX\ sources along -with the camera-ready version of your paper. We will then convert them -to XML automatically, using the LaTeXML tool -(\url{http://dlmf.nist.gov/LaTeXML}). LaTeXML has \emph{bindings} for -a number of \LaTeX\ packages, including the ACL 2014 stylefile. These -bindings allow LaTeXML to render the commands from these packages -correctly in XML. For best results, we encourage you to use the -packages that are officially supported by LaTeXML, listed at -\url{http://dlmf.nist.gov/LaTeXML/manual/included.bindings} - - - - - -\section{Translation of non-English Terms} - -It is also advised to supplement non-English characters and terms -with appropriate transliterations and/or translations -since not all readers understand all such characters and terms. -Inline transliteration or translation can be represented in -the order of: original-form transliteration ``translation''. - -\section{Length of Submission} -\label{sec:length} - -Long papers may consist of up to 8 pages of content, plus two extra -pages for references. Short papers may consist of up to 4 pages of -content, plus two extra pages for references. Papers that do not -conform to the specified length and formatting requirements may be -rejected without review. - - - -\section*{Acknowledgments} - -The acknowledgments should go immediately before the references. Do -not number the acknowledgments section. Do not include this section -when submitting your paper for review. 
- -% include your own bib file like this: -%\bibliographystyle{acl} -%\bibliography{acl2014} - -\begin{thebibliography}{} - -\bibitem[\protect\citename{Aho and Ullman}1972]{Aho:72} -Alfred~V. Aho and Jeffrey~D. Ullman. -\newblock 1972. -\newblock {\em The Theory of Parsing, Translation and Compiling}, volume~1. -\newblock Prentice-{Hall}, Englewood Cliffs, NJ. - -\bibitem[\protect\citename{{American Psychological Association}}1983]{APA:83} -{American Psychological Association}. -\newblock 1983. -\newblock {\em Publications Manual}. -\newblock American Psychological Association, Washington, DC. - -\bibitem[\protect\citename{{Association for Computing Machinery}}1983]{ACM:83} -{Association for Computing Machinery}. -\newblock 1983. -\newblock {\em Computing Reviews}, 24(11):503--512. - -\bibitem[\protect\citename{Chandra \bgroup et al.\egroup }1981]{Chandra:81} -Ashok~K. Chandra, Dexter~C. Kozen, and Larry~J. Stockmeyer. -\newblock 1981. -\newblock Alternation. -\newblock {\em Journal of the Association for Computing Machinery}, - 28(1):114--133. - -\bibitem[\protect\citename{Gusfield}1997]{Gusfield:97} -Dan Gusfield. -\newblock 1997. -\newblock {\em Algorithms on Strings, Trees and Sequences}. -\newblock Cambridge University Press, Cambridge, UK. - -\end{thebibliography} - -\end{document} diff --git a/EMNLP2014/bibliography.bib b/EMNLP2014/bibliography.bib new file mode 100644 index 0000000..a7269cf --- /dev/null +++ b/EMNLP2014/bibliography.bib @@ -0,0 +1,1650 @@ +%%%%%%%%%%%%%%%%%%%%%% +%General "classic" papers in Stats and NLP +%%%%%%%%%%%%%%%%%%%%%% +@book{Hellinger1909, + title={Neue Begr{\"u}ndung der Theorie quadratischer Formen von unendlichvielen Ver{\"a}nderlichen}, + author={Hellinger, E.}, + year={1909}, + publisher={Reimer} +} + +@article{Fisher1925, +author = {Fisher,R. 
A.}, +title = {{Theory of Statistical Estimation}}, +journal = {Mathematical Proceedings of the Cambridge Philosophical Society}, +volume = {22}, +issue = {05}, +issn = {1469-8064}, +pages = {700--725}, +numpages = {26}, +year = {1925} +} + +@article{Dice1945, + author = {Dice, L. R.}, + journal = {Ecology}, + number = {3}, + pages = {297--302}, + title = {{Measures of the Amount of Ecologic Association Between Species}}, + volume = {26}, + year = {1945} +} + +@article{Rao1945, +author = {Rao, C. Radhakrishna}, +title = {{Information and the Accuracy Attainable in the Estimation of Statistical Parameters}}, +journal = {Bulletin of the Calcutta Mathematical Society}, +volume = {37}, +year = {1945}, +number = {3}, +pages = {81--91}, +} + +@incollection{Zipf1949, + address = {Cambridge, MA}, + author = {Zipf, George}, + publisher = { Addison-Wesley}, + title = { Human Behaviour and the Principle of Least-Effort}, + year = { 1949} +} + +@ARTICLE{Dempster1977, + author = {A. P. Dempster and N. M. Laird and D. B. 
Rubin}, + title = {{Maximum likelihood from incomplete data via the EM algorithm}}, + journal = {Journal of the Royal Statistical Society, Series B}, + year = {1977}, + volume = {39}, + number = {1}, + pages = {1--38} +} + +@inproceedings{Baker1979, + author = "Baker, J.K.", + title = "Trainable grammars for speech recognition", + year = "1979", + booktitle = "Speech communication papers presented at the 97th Meeting of the Acoustical Society", + pages = "547--550", + keywords = "NLP", +} + +@book{Chentsov1982, + title={Statistical Decision Rules and Optimal Inference}, + author={Chentsov, N.N.}, + isbn={9780821813478}, + lccn={81015039}, + series={Translations of mathematical monographs}, + year={1982}, + publisher={American Mathematical Society} +} + +@inproceedings{Hwang1992, + author = {Hwang, Mei-Yuh and Huang, Xuedong}, + title = {Subphonetic modeling for speech recognition}, + booktitle = {Proceedings of the workshop on Speech and Natural Language}, + series = {HLT '91}, + year = {1992}, + isbn = {1-55860-272-0}, + location = {Harriman, New York}, + pages = {174--179}, + numpages = {6}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@article{Marcus1993, + author = {Marcus, Mitchell P. and Marcinkiewicz, Mary Ann and Santorini, Beatrice}, + title = {{Building a large annotated corpus of English: the Penn Treebank}}, + journal = {Computational Linguistics}, + issue_date = {June 1993}, + volume = {19}, + number = {2}, + month = jun, + year = {1993}, + issn = {0891-2017}, + pages = {313--330}, + numpages = {18}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@book{Golub1996, + author = {Golub, Gene H. 
and Van Loan, Charles F.}, + title = {Matrix Computations (3rd Ed.)}, + year = {1996}, + isbn = {0-8018-5414-8}, + publisher = {Johns Hopkins University Press}, + address = {Baltimore, MD, USA}, +} + + +@inproceedings{Chappelier1998, + author = {Chappelier, Jean-Cédric and Rajman, Martin}, + booktitle = {TAPD}, + pages = {133--137}, + title = {A Generalized CYK Algorithm for Parsing Stochastic CFG}, + year = 1998 +} + +@article{Chen1999, + author = {Stanley F. Chen and + Joshua Goodman}, + title = {An empirical study of smoothing techniques for language + modeling}, + journal = {Computer Speech {\&} Language}, + volume = {13}, + number = {4}, + year = {1999}, + pages = {359--393}, +} + +@inproceedings{Klein2001, + author = {Dan Klein and + Christopher D. Manning}, + title = {Parsing and Hypergraphs}, + booktitle = {Proceedings of the Seventh International Workshop on Parsing + Technologies (IWPT-2001), 17-19 October 2001, Beijing, China}, + year = {2001}, +} + +@article{Johnson2002, + author = {Johnson, Mark}, + title = {{The DOP estimation method is biased and inconsistent}}, + journal = {Computational Linguistics}, + issue_date = {March 2002}, + volume = {28}, + number = {1}, + month = mar, + year = {2002}, + issn = {0891-2017}, + pages = {71--76}, + numpages = {6}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@book{Amari2007, + title={Methods of Information Geometry}, + author={Amari, S. and Nagaoka, H. 
and Harada, D.}, + isbn={9780821843024}, + lccn={00059362}, + series={Translations of mathematical monographs}, + year={2007}, + publisher={American Mathematical Society} +} + +%%%%%%%%%%%%%%%%%% +%General MT papers +%%%%%%%%%%%%%%%%%% +@article{Brown1990, +author = {Brown, Peter F. and Cocke, John and Pietra, Stephen A. Della and Pietra, Vincent J. Della and Jelinek, Frederick and Lafferty, John D. and Mercer, Robert L. and Roossin, Paul S.}, +journal = {Computational Linguistics}, +keywords = {Statistical Machine Translation}, +mendeley-tags = {Statistical Machine Translation}, +number = {2}, +pages = {256--264}, +publisher = {MIT Press}, +title = {{A Statistical Approach to Machine Translation}}, +volume = {16}, +year = {1990} +} + +@article{Brown1993, + author = {Brown, Peter F. and Pietra, Vincent J. Della and Pietra, Stephen A. Della and Mercer, Robert L.}, + title = {The mathematics of statistical machine translation: parameter estimation}, + journal = {Computational Linguistics}, + issue_date = {June 1993}, + volume = {19}, + number = {2}, + month = jun, + year = {1993}, + pages = {263--311}, + numpages = {49}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@inproceedings{Vogel1996, + author = {Vogel, Stephan and Ney, Hermann and Tillmann, Christoph}, + title = {{HMM-based word alignment in statistical translation}}, + booktitle = {Proceedings of the 16th conference on Computational linguistics - Volume 2}, + series = {COLING '96}, + year = {1996}, + location = {Copenhagen, Denmark}, + pages = {836--841}, + numpages = {6}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@article{Wu1997, + author = {Wu, Dekai}, + title = {Stochastic inversion transduction grammars and bilingual parsing of parallel corpora}, + journal = {Computational Linguistics}, + issue_date = {September 1997}, + volume = {23}, + number = {3}, + month = sep, + year = {1997}, + issn = {0891-2017}, + pages = {377--403}, + 
numpages = {27}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@inproceedings{Marcu2002, + author = {Marcu, Daniel and Wong, William}, + title = {A phrase-based, joint probability model for statistical machine translation}, + booktitle = {Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10}, + series = {EMNLP '02}, + year = {2002}, + pages = {133--139}, + numpages = {7}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Papineni2002, + author = {Papineni, Kishore and Roukos, Salim and Ward, Todd and Zhu, Wei-Jing}, + title = {{BLEU: a method for automatic evaluation of machine translation}}, + booktitle = {Proceedings of the 40th Annual Meeting on Association for Computational Linguistics}, + series = {ACL '02}, + year = {2002}, + location = {Philadelphia, Pennsylvania}, + pages = {311--318}, + numpages = {8}, + acmid = {1073135}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Koehn2003, + author = {Koehn, Philipp and Och, Franz Josef and Marcu, Daniel}, + title = {Statistical phrase-based translation}, + booktitle = {Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1}, + series = {NAACL '03}, + year = {2003}, + location = {Edmonton, Canada}, + pages = {48--54}, + numpages = {7}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Och2003, +author = {Och, Franz Josef}, +booktitle = {Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics}, +keywords = {Statistical Machine Translation}, +mendeley-tags = {Statistical Machine Translation}, +month = {July}, +pages = {160--167}, +title = {{Minimum Error Rate Training in Statistical Machine Translation}}, +year = {2003} +} + 
+@article{Och2004, + author = {Och, Franz Josef and Ney, Hermann}, + title = {{The Alignment Template Approach to Statistical Machine Translation}}, + journal = {Computational Linguistics}, + issue_date = {December 2004}, + volume = {30}, + number = {4}, + month = dec, + year = {2004}, + pages = {417--449}, + numpages = {33}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@inproceedings{Shen2004, + author = {Shen, Libin and Sarkar, Anoop and Och, Franz Josef}, + title = {Discriminative Reranking for Machine Translation}, + booktitle = {HLT-NAACL 2004: Main Proceedings}, + editor = {Susan Dumais and Daniel Marcu and Salim Roukos}, + year = 2004, + month = {May 2 - May 7}, + address = {Boston, Massachusetts, USA}, + publisher = {Association for Computational Linguistics}, + pages = {177--184} +} + +@inproceedings{Galley2004, + author = {Galley, Michel and Hopkins, Mark and Knight, Kevin and Marcu, Daniel}, + title = {What's in a translation rule?}, + booktitle = {HLT-NAACL 2004: Main Proceedings}, + editor = {Susan Dumais and Daniel Marcu and Salim Roukos}, + year = 2004, + month = {May 2 - May 7}, + address = {Boston, Massachusetts, USA}, + publisher = {Association for Computational Linguistics}, + pages = {273--280}, +} + +@inproceedings{Chiang2005, + author = {Chiang, David}, + title = {A hierarchical phrase-based model for statistical machine translation}, + booktitle = {Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics}, + series = {ACL '05}, + year = {2005}, + location = {Ann Arbor, Michigan}, + pages = {263--270}, + numpages = {8}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Matsuzaki2005, + author = {Matsuzaki, Takuya and Miyao, Yusuke and Tsujii, Jun'ichi}, + title = {Probabilistic CFG with Latent Annotations}, + booktitle = {Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics}, + series = {ACL '05}, + year 
= {2005}, + location = {Ann Arbor, Michigan}, + pages = {75--82}, + numpages = {8}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Petrov2006, + author = {Petrov, Slav and Barrett, Leon and Thibaux, Romain and Klein, Dan}, + title = {Learning accurate, compact, and interpretable tree annotation}, + booktitle = {Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics}, + series = {ACL-44}, + year = {2006}, + location = {Sydney, Australia}, + pages = {433--440}, + numpages = {8}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Galley2006, + author = {Galley, Michel and Graehl, Jonathan and Knight, Kevin and Marcu, Daniel and DeNeefe, Steve and Wang, Wei and Thayer, Ignacio}, + title = {Scalable inference and training of context-rich syntactic translation models}, + booktitle = {Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics}, + series = {ACL-44}, + year = {2006}, + location = {Sydney, Australia}, + pages = {961--968}, + numpages = {8}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Zollmann2006, + author = {Zollmann, Andreas and Venugopal, Ashish}, + title = {Syntax augmented machine translation via chart parsing}, + booktitle = {Proceedings of the Workshop on Statistical Machine Translation}, + series = {StatMT '06}, + year = {2006}, + location = {New York City, New York}, + pages = {138--141}, + numpages = {4}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Huang2006, + author = {Liang Huang and Kevin Knight and Aravind Joshi}, + title = {Statistical Syntax-Directed Translation with Extended 
Domain of Locality}, + booktitle = {Proceedings of AMTA}, + month = {August}, + year = {2006} +} + +@InProceedings{McClosky2006, + author = {McClosky, David and Charniak, Eugene and Johnson, Mark}, + title = {Effective Self-Training for Parsing}, + booktitle = {Proceedings of the Human Language Technology Conference of the NAACL, Main Conference}, + month = {June}, + year = {2006}, + address = {New York City, USA}, + publisher = {Association for Computational Linguistics}, + pages = {152--159}, +} + +@article{Kumar2006, + author = {Shankar Kumar and + Yonggang Deng and + William Byrne}, + title = {A weighted finite state transducer translation template + model for statistical machine translation}, + journal = {Natural Language Engineering}, + volume = {12}, + number = {1}, + year = {2006}, + pages = {35-75}, +} + +@article{Chiang2007, + author = {Chiang, David}, + title = {Hierarchical Phrase-Based Translation}, + journal = {Computational Linguistics}, + issue_date = {June 2007}, + volume = {33}, + number = {2}, + month = jun, + year = {2007}, + pages = {201--228}, + numpages = {28}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@inproceedings{Zettlemoyer2007, + author = {Zettlemoyer, Luke S. and Moore, Robert C.}, + title = {Selective phrase pair extraction for improved statistical machine translation}, + booktitle = {Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers}, + series = {NAACL-Short '07}, + year = {2007}, + location = {Rochester, New York}, + pages = {209--212}, + numpages = {4}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@article{Graehl2008, + author = {Graehl, Jonathan and Knight, Kevin and May, Jonathan}, + title = {Training Tree Transducers}, + journal = {Comput. 
Linguist.}, + issue_date = {September 2008}, + volume = {34}, + number = {3}, + month = sep, + year = {2008}, + issn = {0891-2017}, + pages = {391--427}, + numpages = {37}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@INPROCEEDINGS{Paul2009, + author = {Michael Paul}, + title = {Overview of the IWSLT 2009 Evaluation Campaign}, + booktitle = {Proceedings of IWSLT 2009}, + location = {Tokyo, Japan}, + year = {2009}, +} + +@inproceedings{Kumar2009, + author = {Kumar, Shankar and Macherey, Wolfgang and Dyer, Chris and Och, Franz}, + title = {Efficient Minimum Error Rate Training and Minimum Bayes-Risk Decoding for Translation Hypergraphs and Lattices}, + booktitle = {Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1}, + series = {ACL '09}, + year = {2009}, + location = {Suntec, Singapore}, + pages = {163--171}, + numpages = {9}, +} + +@inproceedings{Dyer2010, + author={Chris Dyer and Adam Lopez + and Juri Ganitkevitch and Jonathan Weese and Ferhan Ture + and Phil Blunsom and Hendra Setiawan and Vladimir Eidelman and Philip Resnik}, + title={cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models}, + booktitle = {Proceedings of ACL}, + year={2010}, +} + +@InProceedings{Bojar2013, + author = {Bojar, Ond\v{r}ej and Buck, Christian and Callison-Burch, Chris and Federmann, Christian and Haddow, Barry and Koehn, Philipp and Monz, Christof and Post, Matt and Soricut, Radu and Specia, Lucia}, + title = {Findings of the 2013 {Workshop on Statistical Machine Translation}}, + booktitle = {Proceedings of the Eighth Workshop on Statistical Machine Translation}, + month = {August}, + year = {2013}, + address = {Sofia, Bulgaria}, + publisher = {Association for Computational Linguistics}, + pages = {1--44}, +} + +%%%%%%%%%%%%%%%%%%%%%% +%"Monolingual" MT papers +%%%%%%%%%%%%%%%%%%%%% + 
+@InProceedings{CallisonBurch2006, + author = {Callison-Burch, Chris and Koehn, Philipp and Osborne, Miles}, + title = {Improved Statistical Machine Translation Using Paraphrases}, + booktitle = {Proceedings of the Human Language Technology Conference of the NAACL, Main Conference}, + month = {June}, + year = {2006}, + address = {New York City, USA}, + publisher = {Association for Computational Linguistics}, + pages = {17--24}, +} + +@InProceedings{Haghighi2008, + author = {Haghighi, Aria and Liang, Percy and Berg-Kirkpatrick, Taylor and Klein, Dan}, + title = {Learning Bilingual Lexicons from Monolingual Corpora}, + booktitle = {Proceedings of ACL-08: HLT}, + month = {June}, + year = {2008}, + address = {Columbus, Ohio}, + publisher = {Association for Computational Linguistics}, + pages = {771--779}, +} + +@InProceedings{Ravi2011, + author = {Ravi, Sujith and Knight, Kevin}, + title = {Deciphering Foreign Language}, + booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies}, + month = {June}, + year = {2011}, + address = {Portland, Oregon, USA}, + publisher = {Association for Computational Linguistics}, + pages = {12--21}, +} + +%%%%%%%%%%%%%%%%%%%%%% +%non heuristic phrase extraction papers +%%%%%%%%%%%%%%%%%%%%%% + +@inproceedings{DeNero2006, + author = {DeNero, John and Gillick, Dan and Zhang, James and Klein, Dan}, + title = {Why generative phrase models underperform surface heuristics}, + booktitle = {Proceedings of the Workshop on Statistical Machine Translation}, + series = {StatMT '06}, + year = {2006}, + location = {New York City, New York}, + pages = {31--38}, + numpages = {8}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{DeNero2008, + author = {DeNero, John and Bouchard-C\^{o}t{\'e}, Alexandre and Klein, Dan}, + title = {{Sampling alignment structure under a Bayesian translation model}}, + booktitle = 
{Proceedings of the Conference on Empirical Methods in Natural Language Processing}, + series = {EMNLP '08}, + year = {2008}, + location = {Honolulu, Hawaii}, + pages = {314--323}, + numpages = {10}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Blunsom2008, + author = {Blunsom, Phil and Cohn, Trevor and Osborne, Miles}, + title = {{Bayesian Synchronous Grammar Induction}}, + booktitle = {Proceedings of the Twenty-Second Annual Conference on Neural Information Processing Systems}, + series = {NIPS 2008}, + year = {2008}, + location = {Vancouver, British Columbia}, +} + +@inproceedings{Zhang2008, + author = {Zhang, Hao and Gildea, Daniel and Chiang, David}, + title = {Extracting Synchronous Grammar Rules from Word-level Alignments in Linear Time}, + booktitle = {Proceedings of the 22Nd International Conference on Computational Linguistics - Volume 1}, + series = {COLING '08}, + year = {2008}, + location = {Manchester, United Kingdom}, + pages = {1081--1088}, + numpages = {8}, + publisher = {Association for Computational Linguistics}, +} + +@inproceedings{Blunsom2009, + author = {Blunsom, Phil and Cohn, Trevor and Dyer, Chris and Osborne, Miles}, + title = {A Gibbs sampler for phrasal synchronous grammar induction}, + booktitle = {Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2}, + series = {ACL '09}, + year = {2009}, + location = {Suntec, Singapore}, + pages = {782--790}, + numpages = {9}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Cohn2009, + author = {Cohn, Trevor and Blunsom, Phil}, + title = {{A Bayesian model of syntax-directed tree to string grammar induction}}, + booktitle = {Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1}, + 
series = {EMNLP '09}, + year = {2009}, + isbn = {978-1-932432-59-6}, + location = {Singapore}, + pages = {352--361}, + numpages = {10}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Levenberg2012, + author = {Levenberg, Abby and Dyer, Chris and Blunsom, Phil}, + title = {{A Bayesian model for learning SCFGs with discontiguous rules}}, + booktitle = {Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning}, + series = {EMNLP-CoNLL '12}, + year = {2012}, + location = {Jeju Island, Korea}, + pages = {223--232}, + numpages = {10}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@InProceedings{Mylonakis2008, +author = {Markos Mylonakis and Khalil Sima'an}, +title = {{Phrase Translation Probabilities with {ITG} Priors and Smoothing as Learning Objective}}, +booktitle = {Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing}, +pages = {630--639}, +month = {October}, +year = {2008}, +address = {Honolulu, USA}, +publisher = {Association for Computational Linguistics} +} + +@InProceedings{Mylonakis2010, +author = {Markos Mylonakis and Khalil Sima'an}, +title = {{Learning Probabilistic Synchronous {CFGs} for Phrase-based Translation}}, +booktitle = {Fourteenth Conference on Computational Natural Language Learning}, +pages = {117--125}, +month = {July}, +year = {2010}, +address = {Uppsala, Sweden}, +publisher = {Association for Computational Linguistics} +} + +@InProceedings{Mylonakis2011, + author = {Mylonakis, Markos and Sima'an, Khalil}, + title = {{Learning Hierarchical Translation Structure with Linguistic Annotations}}, + booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies}, + month = {June}, + year = {2011}, + address = {Portland, Oregon, USA}, + publisher = 
{Association for Computational Linguistics}, + pages = {642--652}, +} + +@inproceedings{Huang2010, + author = {Huang, Zhongqiang and \v{C}mejrek, Martin and Zhou, Bowen}, + title = {{Soft syntactic constraints for hierarchical phrase-based translation using latent syntactic distributions}}, + booktitle = {Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing}, + series = {EMNLP '10}, + year = {2010}, + location = {Cambridge, Massachusetts}, + pages = {138--147}, + numpages = {10}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +%%%%%%%%%%%%%%%%%%% +%Morpho-MT +%%%%%%%%%%%%%%%%%%% +@InProceedings{Toutanova2008, + author = {Toutanova, Kristina and Suzuki, Hisami and Ruopp, Achim}, + title = {Applying Morphology Generation Models to Machine Translation}, + booktitle = {Proceedings of ACL-08: HLT}, + month = {June}, + year = {2008}, + address = {Columbus, Ohio}, + publisher = {Association for Computational Linguistics}, + pages = {514--522}, +} + +@inproceedings{Chahuneau2013, + author = {Victor Chahuneau and Eva Schlinger and Noah A. Smith and Chris Dyer}, + title = {Translating into Morphologically Rich Languages with Synthetic Phrases}, + booktitle = {Proc. 
of EMNLP}, + year = {2013} +} + +@InProceedings{Tsvetkov2013, + author = {Tsvetkov, Yulia and Dyer, Chris and Levin, Lori and Bhatia, Archna}, + title = {Generating {English} Determiners in Phrase-Based Translation with Synthetic Translation Options}, + booktitle = {Proceedings of the Eighth Workshop on Statistical Machine Translation}, + month = {August}, + year = {2013}, + address = {Sofia, Bulgaria}, + publisher = {Association for Computational Linguistics}, + pages = {271--280}, +} + +%%%%%%%%%%%%%%%%%%% +%Discriminative Training in MT +%%%%%%%%%%%%%%%%%%% + +@inproceedings{Liang2006, + author = {Liang, Percy and Bouchard-C\^{o}t{\'e}, Alexandre and Klein, Dan and Taskar, Ben}, + title = {An end-to-end discriminative approach to machine translation}, + booktitle = {Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics}, + series = {ACL-44}, + year = {2006}, + location = {Sydney, Australia}, + pages = {761--768}, + numpages = {8}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@INPROCEEDINGS{Watanabe2007, + author = {Taro Watanabe and Jun Suzuki and Hajime Tsukada and Hideki Isozaki}, + title = {Online large-margin training for statistical machine translation}, + booktitle = {In Proc. 
of EMNLP}, + year = {2007} +} + +@inproceedings{Chiang2009, + author = {Chiang, David and Knight, Kevin and Wang, Wei}, + title = {11,001 new features for statistical machine translation}, + booktitle = {Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics}, + series = {NAACL '09}, + year = {2009}, + isbn = {978-1-932432-41-1}, + location = {Boulder, Colorado}, + pages = {218--226}, + numpages = {9}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@article{Chiang2012, + author = {Chiang, David}, + title = {{Hope and Fear for Discriminative Training of Statistical Translation Models}}, + journal = {J. Mach. Learn. Res.}, + year = {2012}, + issn = {1532-4435}, + pages = {1159--1187}, + numpages = {29}, + publisher = {JMLR.org}, +} + +@InProceedings{Saluja2012, + author = {Saluja, Avneesh and Lane, Ian and Zhang, Ying}, + title = {Machine Translation with Binary Feedback: a Large-Margin Approach}, + booktitle = {The Tenth Biennial Conference of the Association for Machine Translation in the Americas}, + month = {October}, + year = {2012}, + address = {San Diego, California}, +} + +@InProceedings{Flanigan2013, + author = {Flanigan, Jeffrey and Dyer, Chris and Carbonell, Jaime}, + title = {Large-Scale Discriminative Training for Statistical Machine Translation Using Held-Out Line Search}, + booktitle = {Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies}, + month = {June}, + year = {2013}, + address = {Atlanta, Georgia}, + publisher = {Association for Computational Linguistics}, + pages = {248--258}, +} + + +%%%%%%%%%%%%%%%%%%%% +%Mining for parallel/comparable corpora +%%%%%%%%%%%%%%%%%%% + +@article{Resnik2003, + author = {Resnik, Philip and Smith, Noah A.}, + title = {The Web as a parallel corpus}, + journal = {Computational 
Linguistics}, + issue_date = {September 2003}, + volume = {29}, + number = {3}, + month = sep, + year = {2003}, + issn = {0891-2017}, + pages = {349--380}, + numpages = {32}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@inproceedings{Zhang2005, + author = {Ying Zhang and Fei Huang and Stephan Vogel}, + title = {Mining translations of OOV terms from the web through cross-lingual query expansion}, + booktitle = {SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval}, + year = {2005}, + pages = {669--670}, + location = {Salvador, Brazil}, + publisher = {ACM Press}, + address = {New York, NY, USA}, +} + +@inproceedings{Snover2008, + author = {Snover, Matthew and Dorr, Bonnie and Schwartz, Richard}, + title = {Language and translation model adaptation using comparable corpora}, + booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing}, + series = {EMNLP '08}, + year = {2008}, + location = {Honolulu, Hawaii}, + pages = {857--866}, + numpages = {10}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +%%%%%%%%%%%%%%%%%%% +%Bilingual Lexicon Induction +%%%%%%%%%%%%%%%%%%% + +@inproceedings{Rapp1995, + author = {Rapp, Reinhard}, + title = {Identifying Word Translations in Non-Parallel Texts}, + booktitle = {Proceedings of the 33rd Annual Meeting of the Association for Computational + Linguistics}, + series = {ACL '95}, + location = {Cambridge, MA}, + year = {1995}, +} + +@inproceedings{Fung1998, + author = {Fung, Pascale and Yee, Lo Yuen}, + title = {An IR approach for translating new words from nonparallel, comparable texts}, + booktitle = {Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 1}, + series = {ACL '98}, + year = {1998}, + location = {Montreal, Quebec, Canada}, + pages 
= {414--420}, + numpages = {7}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Rapp1999, + author = {Rapp, Reinhard}, + title = {Automatic identification of word translations from unrelated English and German corpora}, + booktitle = {Proceedings of the 37th annual meeting of the Association for Computational Linguistics}, + series = {ACL '99}, + year = {1999}, + location = {College Park, Maryland}, + pages = {519--526}, + numpages = {8}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@INPROCEEDINGS{Koehn2002, + author = {Philipp Koehn and Kevin Knight}, + title = {Learning a Translation Lexicon from Monolingual Corpora}, + booktitle = {Proceedings of the ACL Workshop on Unsupervised Lexical Acquisition}, + year = {2002}, + pages = {9--16} +} + +@inproceedings{Tamura2012, + author = {Akihiro Tamura and + Taro Watanabe and + Eiichiro Sumita}, + title = {Bilingual Lexicon Extraction from Comparable Corpora Using + Label Propagation}, + booktitle = {Proceedings of the 2012 Joint Conference on Empirical Methods + in Natural Language Processing and Computational Natural + Language Learning}, + series = {EMNLP-CoNLL '12}, + year = {2012}, + pages = {24--36}, +} + +@InProceedings{Irvine2013a, + author = {Irvine, Ann and Callison-Burch, Chris}, + title = {Supervised Bilingual Lexicon Induction with Multiple Monolingual Signals}, + booktitle = {Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies}, + month = {June}, + year = {2013}, + address = {Atlanta, Georgia}, + publisher = {Association for Computational Linguistics}, + pages = {518--523}, +} + +@inProceedings{Irvine2013b, +author = {Irvine, Ann and Callison-Burch, Chris}, +title = {Combining Bilingual and Comparable Corpora for Low Resource Machine Translation}, +booktitle = {Proceedings of the ACL Workshop on 
Statistical Machine Translation (WMT)}, +year = {2013}, +} + +%%%%%%%%%%%%%%%%%%%% +%General Machine Learning +%%%%%%%%%%%%%%%%%%%% +@article{Camastra2003, + author = {Francesco Camastra}, + title = {Data dimensionality estimation methods: a survey}, + journal = {Pattern Recognition}, + volume = {36}, + number = {12}, + year = {2003}, + pages = {2945--2954}, +} + +@article{Hardoon2004, + author = {Hardoon, David R. and Szedmak, Sandor R. and Shawe-Taylor, John R.}, + title = {Canonical Correlation Analysis: An Overview with Application to Learning Methods}, + journal = {Neural Comput.}, + issue_date = {December 2004}, + volume = {16}, + number = {12}, + month = dec, + year = {2004}, + issn = {0899-7667}, + pages = {2639--2664}, + numpages = {26}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@phdthesis{Lebanon2005, + author = {Lebanon, Guy}, + title = {Riemannian geometry and statistical machine learning}, + year = {2005}, + isbn = {0-496-93472-4}, + note = {AAI3159986}, + publisher = {Carnegie Mellon University}, + school = {Carnegie Mellon University}, + address = {Pittsburgh, PA, USA}, +} + +@book{Bishop2006, + author = {Bishop, Christopher M.}, + title = {{Pattern Recognition and Machine Learning (Information Science and Statistics)}}, + year = {2006}, + isbn = {0387310738}, + publisher = {Springer-Verlag New York, Inc.}, + address = {Secaucus, NJ, USA}, +} + +@inproceedings{Andrew2007, + author = {Andrew, Galen and Gao, Jianfeng}, + title = {Scalable training of L1-regularized log-linear models}, + booktitle = {Proceedings of the 24th international conference on Machine learning}, + series = {ICML '07}, + year = {2007}, + isbn = {978-1-59593-793-3}, + location = {Corvallis, Oregon}, + pages = {33--40}, + numpages = {8}, + publisher = {ACM}, + address = {New York, NY, USA}, +} + +@inproceedings{Duchi2008, + author = {Duchi, John and Shalev-Shwartz, Shai and Singer, Yoram and Chandra, Tushar}, + title = {Efficient projections onto the l1-ball 
for learning in high dimensions}, + booktitle = {Proceedings of the 25th international conference on Machine learning}, + series = {ICML '08}, + year = {2008}, + isbn = {978-1-60558-205-4}, + location = {Helsinki, Finland}, + pages = {272--279}, + numpages = {8}, + publisher = {ACM}, + address = {New York, NY, USA}, +} + +@article{Ganchev2010, + author = {Ganchev, Kuzman and Gra\c{c}a, Jo\~{a}o and Gillenwater, Jennifer and Taskar, Ben}, + title = {{Posterior Regularization for Structured Latent Variable Models}}, + journal = {J. Mach. Learn. Res.}, + volume = {99}, + month = {August}, + year = {2010}, + issn = {1532-4435}, + pages = {2001--2049}, + numpages = {49}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@article{Dekel2010, + author = {Ofer Dekel and + Ohad Shamir}, + title = {Multiclass-Multilabel Classification with More Classes than + Examples}, + journal = {Journal of Machine Learning Research - Proceedings Track}, + volume = {9}, + year = {2010}, + pages = {137-144}, +} + +@inproceedings{Berg-Kirkpatrick2010, + author = {Berg-Kirkpatrick, Taylor and Bouchard-C\^{o}t{\'e}, Alexandre and DeNero, John and Klein, Dan}, + title = {Painless Unsupervised Learning with Features}, + booktitle = {Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics}, + series = {HLT '10}, + year = {2010}, + isbn = {1-932432-65-5}, + location = {Los Angeles, California}, + pages = {582--590}, + numpages = {9}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +%%%%%%%%%%%%%%%%%%%%%%% +%Sparsity papers +%%%%%%%%%%%%%%%%%%%%%%% +@article{Natarajan1995, + author = {Natarajan, B. K.}, + title = {Sparse Approximate Solutions to Linear Systems}, + journal = {SIAM J. 
Comput.}, + issue_date = {April 1995}, + volume = {24}, + number = {2}, + month = apr, + year = {1995}, + issn = {0097-5397}, + pages = {227--234}, + numpages = {8}, + publisher = {Society for Industrial and Applied Mathematics}, + address = {Philadelphia, PA, USA}, + keywords = {linear systems, sparse solutions}, +} + +@ARTICLE{Tibshirani1996, + author = {Robert Tibshirani}, + title = {Regression Shrinkage and Selection Via the Lasso}, + journal = {Journal of the Royal Statistical Society, Series B}, + year = {1996}, + volume = {58}, + pages = {267--288} +} + +@article{Chen2001, + author = {Chen, Scott Shaobing and Donoho, David L. and Saunders, Michael A.}, + title = {Atomic Decomposition by Basis Pursuit}, + journal = {SIAM Rev.}, + issue_date = {2001}, + volume = {43}, + number = {1}, + month = jan, + year = {2001}, + issn = {0036-1445}, + pages = {129--159}, + numpages = {31}, + publisher = {Society for Industrial and Applied Mathematics}, + address = {Philadelphia, PA, USA}, + keywords = {\$\ell^1\$ norm optimization, MATLAB code, cosine packets, denoising, interior-point methods for linear programming, matching pursuit, multiscale edges, overcomplete signal representation, time-frequency analysis, time-scale analysis, total variation denoising, wavelet packets, wavelets}, +} + +@article{Candes2005, + author = {Candes, E. J. and Tao, T.}, + title = {Decoding by linear programming}, + journal = {IEEE Trans. Inf. 
Theor.}, + issue_date = {December 2005}, + volume = {51}, + number = {12}, + month = dec, + year = {2005}, + issn = {0018-9448}, + pages = {4203--4215}, + numpages = {13}, + acmid = {2271950}, + publisher = {IEEE Press}, + address = {Piscataway, NJ, USA}, +} + +@inproceedings{Garg2009, + author = {Garg, Rahul and Khandekar, Rohit}, + title = {Gradient descent with sparsification: an iterative algorithm for sparse recovery with restricted isometry property}, + booktitle = {Proceedings of the 26th Annual International Conference on Machine Learning}, + series = {ICML '09}, + year = {2009}, + isbn = {978-1-60558-516-1}, + location = {Montreal, Quebec, Canada}, + pages = {337--344}, + numpages = {8}, + publisher = {ACM}, + address = {New York, NY, USA}, +} + +@inproceedings{Pilanci2012, + Author = {Mert Pilanci and Laurent {El Ghaoui} and Venkat Chandrasekaran}, + Title = {Recovery of Sparse Probability Measures via Convex Programming}, + Booktitle= {Proc. Advances in Neural Information Processing Systems ({NIPS})}, + Year = {2012}, + Month = Dec +} + +@inproceedings{Kyrillidis2013, + Publisher = {JMLR Workshop and Conference Proceedings}, + Title = {Sparse projections onto the simplex}, + Booktitle = {Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, + Author = {Anastasios Kyrillidis and Stephen Becker and Volkan Cevher and Christoph Koch}, + Month = may, + Volume = {28}, + Editor = {Sanjoy Dasgupta and David Mcallester}, + Year = {2013}, + Pages = {235-243}, + } + +%%%%%%%%%%%%%%%%%%%%%% +%Spectral Learning papers +%%%%%%%%%%%%%%%%%%%%% +@article{Jaeger2000, + author = {Jaeger, Herbert}, + title = {{Observable Operator Models for Discrete Stochastic Time Series}}, + journal = {Neural Comput.}, + issue_date = {June 2000}, + volume = {12}, + number = {6}, + month = jun, + year = {2000}, + issn = {0899-7667}, + pages = {1371--1398}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@inproceedings{Hsu2009, + author = 
{Daniel Hsu and + Sham M. Kakade and + Tong Zhang}, + title = {{A Spectral Algorithm for Learning Hidden Markov Models}}, + booktitle = {COLT}, + year = {2009}, +} + +@inproceedings{Boots2011, + Author = "Byron Boots and Sajid Siddiqi and Geoffrey Gordon ", + Booktitle = "Proceedings of the 25th National Conference on Artificial Intelligence (AAAI-2011)", + Title = "An Online Spectral Learning Algorithm for Partially Observable Nonlinear Dynamical Systems ", + Year = "2011" +} + +@inproceedings{Balle2011, + author = {Balle, Borja and Quattoni, Ariadna and Carreras, Xavier}, + title = {A spectral learning algorithm for finite state transducers}, + booktitle = {Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part I}, + series = {ECML PKDD'11}, + year = {2011}, + location = {Athens, Greece}, + pages = {156--171}, + numpages = {16}, + publisher = {Springer-Verlag}, + address = {Berlin, Heidelberg}, +} + +@inproceedings{Parikh2011, + author = {Ankur P. Parikh and + Le Song and + Eric P. Xing}, + title = {{A Spectral Algorithm for Latent Tree Graphical Models}}, + booktitle = {Proceedings of the 28th International Conference on Machine + Learning (ICML)}, + year = {2011}, + pages = {1065-1072}, +} + +@inproceedings{Anandkumar2011, + title ={{Spectral Methods for Learning Multivariate Latent Tree Structure}}, + author={Animashree Anandkumar and Kamalika Chaudhuri and Daniel J. Hsu and Sham M. Kakade and Le Song and Tong Zhang}, + booktitle = {Advances in Neural Information Processing Systems 24}, + editor = {J. Shawe-Taylor and R.S. Zemel and P. Bartlett and F.C.N. Pereira and K.Q. Weinberger}, + pages = {2025--2033}, + year = {2011} +} + +@inproceedings{Dhillon2011, + title = {{Multi-View Learning of Word Embeddings via CCA}}, + author = {Paramveer S. 
Dhillon and Dean Foster and Lyle Ungar}, + booktitle = {Advances in Neural Information Processing Systems (NIPS)}, + volume={24}, + year = {2011} +} + +@inproceedings{Dhillon2012, + author = {Paramveer S. Dhillon and Jordan Rodu and Michael Collins and Dean P. Foster and Lyle H. Ungar}, + title = {{Spectral Dependency Parsing with Latent Variables}}, + booktitle = {Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning}, + series = {EMNLP-CoNLL'12}, + year = {2012}, + location = {Jeju, Korea} + } + +@inproceedings{Anandkumar2012, +title ={{A Spectral Algorithm for Latent Dirichlet Allocation}}, +author={Animashree Anandkumar and Dean Foster and Daniel Hsu and Sham Kakade and Yi-Kai Liu}, +booktitle = {Advances in Neural Information Processing Systems 25}, +editor = {P. Bartlett and F.C.N. Pereira and C.J.C. Burges and L. Bottou and K.Q. Weinberger}, +pages = {926--934}, +year = {2012}, +} + +@inproceedings{Cohen2012a, + author = "S. B. Cohen and K. Stratos and M. Collins and D. P. Foster and L. Ungar", + title = "Spectral Learning of Latent-Variable {PCFGs}", + booktitle = "Proceedings of ACL", + year = "2012" +} + +@inproceedings{Cohen2012b, + author = "S. B. Cohen and M. 
Collins", + title = "Tensor Decomposition for Fast Latent-Variable {PCFG} Parsing", + booktitle = "Proceedings of NIPS", + year = "2012" +} + +@InProceedings{Balle2012, + author = {Borja Balle and Ariadna Quattoni and Xavier Carreras}, + title = {{Local Loss Optimization in Operator Models: A New Insight into Spectral Learning}}, + booktitle = {Proceedings of the 29th International Conference on Machine Learning (ICML-12)}, + series = {ICML '12}, + year = {2012}, + editor = {John Langford and Joelle Pineau}, + location = {Edinburgh, Scotland, GB}, + month = {July}, + publisher = {Omnipress}, + address = {New York, NY, USA}, + pages= {1879--1886}, +} + +@incollection{Hsu2012, +title ={{Identifiability and Unmixing of Latent Parse Trees}}, +author={Daniel Hsu and Sham Kakade and Percy Liang}, +booktitle = {Advances in Neural Information Processing Systems 25}, +editor = {P. Bartlett and F.C.N. Pereira and C.J.C. Burges and L. Bottou and K.Q. Weinberger}, +pages = {1520--1528}, +year = {2012}, +} + +@inproceedings{Cohen2013, + author = "S. B. Cohen and K. Stratos and M. Collins and D. P. Foster and L. Ungar", + title = "Experiments with Spectral Learning of Latent-Variable {PCFGs}", + booktitle = "Proceedings of {NAACL}", + year = "2013" +} + +%%%%%%%%%%%%%%%%%%%% +%Graph-based SSL +%%%%%%%%%%%%%%%%%%%% +@INPROCEEDINGS{Szummer2001, + author = {Martin Szummer and Tommi Jaakkola}, + title = {Partially labeled classification with Markov random walks}, + booktitle = {Advances in Neural Information Processing Systems}, + year = {2001}, + pages = {945--952}, + publisher = {MIT Press} +} + +@TECHREPORT{Zhu2002, + author = {Xiaojin Zhu and Zoubin Ghahramani}, + title = {Learning from Labeled and Unlabeled Data with Label Propagation}, + institution = {Carnegie Mellon University}, + year = {2002} +} + +@inproceedings{Zhu2003, + author = {Xiaojin Zhu and + Zoubin Ghahramani and + John D. 
Lafferty}, + title = {Semi-Supervised Learning Using Gaussian Fields and Harmonic + Functions}, + booktitle = {Proceedings of the Twentieth International Conference on Machine Learning}, + series = {ICML '03}, + year = {2003}, + pages = {912--919}, +} + +@incollection{Zhou2004, + author = "Dengyong Zhou and Olivier Bousquet and Thomas Navin Lal and Jason Weston and Bernhard {Sch\"{o}lkopf}", + title = "Learning with Local and Global Consistency", + booktitle = "Advances in Neural Information Processing Systems 16", + editor = "Sebastian Thrun and Lawrence Saul and Bernhard {Sch\"{o}lkopf}", + publisher = "MIT Press", + address = "Cambridge, MA", + year = "2004", +} + +@phdthesis{Zhu2005, + author = {Zhu, Xiaojin}, + title = {Semi-supervised learning with graphs}, + year = {2005}, + isbn = {0-542-19059-1}, + note = {AAI3179046}, + publisher = {Carnegie Mellon University}, + school = {Carnegie Mellon University}, + address = {Pittsburgh, PA, USA}, +} + +@article{Belkin2006, +author = {Mikhail Belkin and Partha Niyogi and Vikas Sindhwani}, +title = {Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples}, +journal = {Journal of Machine Learning Research}, +volume = {7}, +year = {2006}, +pages = {2399--2434}, +} + +@INCOLLECTION{Bengio2006, + author = {Bengio, Yoshua and Delalleau, Olivier and Le Roux, Nicolas}, + editor = {Chapelle, Olivier and {Sch{\"{o}}lkopf}, Bernhard and Zien, Alexander}, + title = {Label Propagation and Quadratic Criterion}, + booktitle = {Semi-Supervised Learning}, + year = {2006}, + pages = {193--216}, + publisher = {{MIT} Press}, +} + +@article{Yan2007, + author = {Yan, Shuicheng and Xu, Dong and Zhang, Benyu and Zhang, Hong-Jiang and Yang, Qiang and Lin, Stephen}, + title = {Graph Embedding and Extensions: A General Framework for Dimensionality Reduction}, + journal = {IEEE Trans. Pattern Anal. Mach. 
Intell.}, + issue_date = {January 2007}, + volume = {29}, + number = {1}, + month = jan, + year = {2007}, + issn = {0162-8828}, + pages = {40--51}, + numpages = {12}, + publisher = {IEEE Computer Society}, + address = {Washington, DC, USA}, +} + + +@inproceedings{Talukdar2009, + author = {Talukdar, Partha Pratim and Crammer, Koby}, + title = {New Regularized Algorithms for Transductive Learning}, + booktitle = {Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part II}, + series = {ECML PKDD '09}, + year = {2009}, + isbn = {978-3-642-04173-0}, + location = {Bled, Slovenia}, + pages = {442--457}, + numpages = {16}, +} + +@incollection{Subramanya2009, + title = {Entropic Graph Regularization in Non-Parametric Semi-Supervised Classification}, + author = {Amarnag Subramanya and Jeff Bilmes}, + booktitle = {Advances in Neural Information Processing Systems 22}, + editor = {Y. Bengio and D. Schuurmans and J. Lafferty and C. K. I. Williams and A. Culotta}, + pages = {1803--1811}, + year = {2009} +} + +@InProceedings{Dhillon2010, + author = {Paramveer S. Dhillon and Partha Pratim Talukdar and Koby Crammer}, + title = {Learning Better Data Representation using Inference-Driven Metric Learning (IDML)}, + booktitle = {Proceedings of the ACL 2010 Conference}, + month = {July }, + year = {2010}, + address = {Uppsala, Sweden}, + publisher = {Association for Computational Linguistics} +} + +@article{Subramanya2011, + author = {Subramanya, Amarnag and Bilmes, Jeff}, + title = {Semi-Supervised Learning with Measure Propagation}, + journal = {J. Mach. Learn. 
Res.}, + issue_date = {2/1/2011}, + volume = {12}, + month = nov, + year = {2011}, + issn = {1532-4435}, + pages = {3311--3370}, + numpages = {60}, + publisher = {JMLR.org}, +} + +%%%%%%%%%%%%%%%%%%%% +%Graph-based SSL & NLP +%%%%%%%%%%%%%%%%%%%% +@inproceedings{Rao2008, + author = {Rao, Delip and Yarowsky, David and Callison-Burch, Chris}, + title = {Affinity Measures Based on the Graph Laplacian}, + booktitle = {Proceedings of the 3rd Textgraphs Workshop on Graph-Based Algorithms for Natural Language Processing}, + series = {TextGraphs-3}, + year = {2008}, + location = {Manchester, United Kingdom}, + pages = {41--48}, + numpages = {8}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@InProceedings{Alexandrescu2009, + author = {Alexandrescu, Andrei and Kirchhoff, Katrin}, + title = {Graph-based Learning for Statistical Machine Translation}, + booktitle = {Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics}, + series = {NAACL-HLT '09}, + month = {June}, + year = {2009}, + location = {Boulder, Colorado}, + publisher = {Association for Computational Linguistics}, + pages = {119--127}, +} + +@inproceedings{Subramanya2010, + author = {Subramanya, Amarnag and Petrov, Slav and Pereira, Fernando}, + title = {Efficient graph-based semi-supervised learning of structured tagging models}, + booktitle = {Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing}, + series = {EMNLP '10}, + year = {2010}, + location = {Cambridge, Massachusetts}, + pages = {167--176}, + numpages = {10}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@InProceedings{Das2011, + author = {Das, Dipanjan and Petrov, Slav}, + title = {Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections}, + booktitle = {Proc. 
of ACL}, + year = {2011} +} + +@inproceedings{Liu2012, + author = {Liu, Shujie and Li, Chi-Ho and Li, Mu and Zhou, Ming}, + title = {Learning translation consensus with structured label propagation}, + booktitle = {Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1}, + series = {ACL '12}, + year = {2012}, + location = {Jeju Island, Korea}, + pages = {302--310}, + numpages = {9}, + acmid = {2390567}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@InProceedings{Klementiev2012, + author = {Klementiev, Alexandre and Irvine, Ann and Callison-Burch, Chris and Yarowsky, David}, + title = {Toward Statistical Machine Translation without Parallel Corpora}, + booktitle = {Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics}, + month = {April}, + year = {2012}, + address = {Avignon, France}, + publisher = {Association for Computational Linguistics}, + pages = {130--140}, +} + +@inproceedings{Das2012, +Author = {Das, Dipanjan and Smith, Noah A.}, +Booktitle = {Proc. 
of NAACL-HLT}, +Title = {Graph-Based Lexicon Expansion with Sparsity-Inducing Penalties}, +Year = {2012}} + +@inproceedings{Razmara2013, + author = {Razmara, Majid and Siahbani, Maryam and Haffari, Gholamreza and Sarkar, Anoop}, + title = {Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation}, + booktitle = {Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics}, + series = {ACL-51}, + year = {2013}, + location = {Sofia, Bulgaria}, + numpages = {8}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Saluja2014, +Author = {Avneesh Saluja and Kristina Toutanova and Chris Quirk and Hany Hassan}, +Title = {Graph-based Semi-Supervised Learning of Translation Models from Monolingual Data}, +Year = {2014}, +booktitle = {Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics}, +series = {ACL-52}, + location = {Baltimore, MD}, + numpages = {9}, + publisher = {Association for Computational Linguistics}, +} + + +%%%%%%%%%%%%%%%%%%%% +%Distributed Representations and Neural Networks +%%%%%%%%%%%%%%%%%%% + +@inproceedings{Vincent2008, + author = {Vincent, Pascal and Larochelle, Hugo and Bengio, Yoshua and Manzagol, Pierre-Antoine}, + title = {Extracting and Composing Robust Features with Denoising Autoencoders}, + booktitle = {Proceedings of the 25th International Conference on Machine Learning}, + series = {ICML '08}, + year = {2008}, + isbn = {978-1-60558-205-4}, + location = {Helsinki, Finland}, + pages = {1096--1103}, + numpages = {8}, + publisher = {ACM}, + address = {New York, NY, USA}, +} + + +@inproceedings{Turian2010, + author = {Turian, Joseph and Ratinov, Lev and Bengio, Yoshua}, + title = {Word Representations: A Simple and General Method for Semi-supervised Learning}, + booktitle = {Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics}, + series = {ACL '10}, + 
year = {2010},
+ location = {Uppsala, Sweden},
+ pages = {384--394},
+ numpages = {11},
+ acmid = {1858721},
+ publisher = {Association for Computational Linguistics},
+ address = {Stroudsburg, PA, USA},
+}
+
+@INPROCEEDINGS{Mikolov2010,
+  author = {Tomáš Mikolov and Martin Karafiát and Lukáš Burget and Jan Černocký and Sanjeev Khudanpur},
+  title = {Recurrent neural network based language model},
+  pages = {1045--1048},
+  booktitle = {Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010)},
+  journal = {Proceedings of Interspeech},
+  volume = {2010},
+  number = {9},
+  year = {2010},
+  publisher = {International Speech Communication Association},
+}
+
+@article{Turney2010,
+ author = {Turney, Peter D. and Pantel, Patrick},
+ title = {From Frequency to Meaning: Vector Space Models of Semantics},
+ journal = {J. Artif. Int. Res.},
+ issue_date = {January 2010},
+ volume = {37},
+ number = {1},
+ month = jan,
+ year = {2010},
+ issn = {1076-9757},
+ pages = {141--188},
+ numpages = {48},
+ publisher = {AI Access Foundation},
+ address = {USA},
+}
+
+@phdthesis{Mikolov2012,
+  author = {Mikolov, Tomas},
+  title = {Statistical Language Models based on Neural Networks},
+  year = {2012},
+  publisher = {Brno University of Technology},
+  school = {Brno University of Technology},
+  address = {Brno, Czech Republic},
+}
+
+@inproceedings{Huang2012,
+author = {Eric H. Huang and Richard Socher and Christopher D. Manning and Andrew Y.
Ng}, +title = {{Improving Word Representations via Global Context and Multiple Word Prototypes}}, +booktitle = {Annual Meeting of the Association for Computational Linguistics (ACL)}, +year = 2012 +} + +@InProceedings{Mikolov2013a, + author = {Mikolov, Tomas and Yih, Wen-tau and Zweig, Geoffrey}, + title = {Linguistic Regularities in Continuous Space Word Representations}, + booktitle = {Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies}, + month = {June}, + year = {2013}, + address = {Atlanta, Georgia}, + publisher = {Association for Computational Linguistics}, + pages = {746--751}, +} + +@misc{Mikolov2013b, +Author = {Tomas Mikolov and Ilya Sutskever and Kai Chen and Greg Corrado and Jeffrey Dean}, +Title = {Distributed Representations of Words and Phrases and their Compositionality}, +Year = {2013}, +Eprint = {arXiv:1310.4546}, +} + + +%%%%%%%%%%%%%%%%%%%% +%Compositional Semantics +%%%%%%%%%%%%%%%%%%%% +@phdthesis{Sahlgren2006, + author = {Sahlgren, M.}, + title = {The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces}, + year = {2006}, + publisher = {Stockholm University}, + school = {Department of Linguistics, Stockholm University}, +} + +@article{Mitchell2010, + author = {Jeff Mitchell and Mirella Lapata}, + title = {Composition in Distributional Models of Semantics}, + journal = {Cognitive Science}, + year = {2010}, + volume = {34}, + number = {8}, + pages = {1388--1439} + } + + @inproceedings{Socher2012, + author = {Richard Socher and Brody Huval and Christopher D. Manning and Andrew Y. 
Ng}, + title = {{Semantic Compositionality Through Recursive Matrix-Vector Spaces}}, + booktitle = {Proceedings of the 2012 Conference on Empirical Methods in Natural Language Processing (EMNLP)}, + year = 2012 + } + + +@InProceedings{Tsubaki2013, + author = {Tsubaki, Masashi and Duh, Kevin and Shimbo, Masashi and Matsumoto, Yuji}, + title = {Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks}, + booktitle = {Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing}, + month = {October}, + year = {2013}, + address = {Seattle, Washington, USA}, + publisher = {Association for Computational Linguistics}, + pages = {130--140}, + url = {http://www.aclweb.org/anthology/D13-1014} +} + + + + + + + + + + + + + + + + diff --git a/EMNLP2014/emnlp2014.tex b/EMNLP2014/emnlp2014.tex new file mode 100644 index 0000000..8c7472d --- /dev/null +++ b/EMNLP2014/emnlp2014.tex @@ -0,0 +1,535 @@ +% +% File acl2014.tex +% +% Contact: koller@ling.uni-potsdam.de, yusuke@nii.ac.jp +%% +%% Based on the style files for ACL-2013, which were, in turn, +%% Based on the style files for ACL-2012, which were, in turn, +%% based on the style files for ACL-2011, which were, in turn, +%% based on the style files for ACL-2010, which were, in turn, +%% based on the style files for ACL-IJCNLP-2009, which were, in turn, +%% based on the style files for EACL-2009 and IJCNLP-2008... 
+ +%% Based on the style files for EACL 2006 by +%%e.agirre@ehu.es or Sergi.Balari@uab.es +%% and that of ACL 08 by Joakim Nivre and Noah Smith + +\documentclass[11pt]{article} +\usepackage{acl2014} +\usepackage{times} +\usepackage{url} +\usepackage{multirow} +\usepackage{latexsym} +\usepackage{amsmath} +\usepackage{amssymb} +\usepackage{algorithm} +\usepackage{graphicx} +\usepackage[font=small,labelfont=bf]{caption} +\usepackage{subcaption} +\usepackage{enumitem} +\usepackage{bm} +\usepackage{multirow} + +\DeclareMathOperator*{\argmax}{arg\,max} +\newcommand{\ts}{\textsuperscript} +\newcommand{\rione}{r^{(i)}} +\newcommand{\ritwo}{r^{(i,2)}} +\newcommand{\rithree}{r^{(i,3)}} +\newcommand{\xione}{t^{(i,1)}} +\newcommand{\xitwo}{t^{(i,2)}} +\newcommand{\xithree}{t^{(i,3)}} +\newcommand{\aione}{a_i} +\newcommand{\aitwo}{a^{(i,2)}} +\newcommand{\aithree}{a^{(i,3)}} +\newcommand{\yione}{y^{(i,1)}} +\newcommand{\yitwo}{y^{(i,2)}} +\newcommand{\yithree}{y^{(i,3)}} +\newcommand{\phii}{\phi^{(i)}} +\newcommand{\bi}{z^{(i)}} +\newcommand{\oi}{o^{(i)}} +\newcommand{\p}{{\cal P}} +\newcommand{\internal}{{\cal I}} +\newcommand{\n}{{\cal N}} +\newcommand{\rules}{{\cal R}} +\newcommand{\srule}{{X \rightarrow b, c}} +\newcommand{\pa}{\mathrm{pa}} +\newcommand{\lc}{\mathrm{lc}} +\newcommand{\rc}{\mathrm{rc}} +\newcommand{\diag}{\mathrm{diag}} +\newcommand{\tleft}{\beta} +\newcommand{\tright}{\gamma} +\newcommand{\tree}{\tau} +\newcommand{\e}[1]{\hat{#1}} +\newcommand{\commentout}[1]{} +\newcommand{\shorten}[1]{} +\newcommand{\tcommentout}[1]{#1} +\newcommand{\bS}{{\bf S}} +\newcommand{\bX}{{\bf X}} +\newfont{\msym}{msbm10} +\newcommand{\reals}{\mbox{\msym R}} +\newcommand{\qed}{{\setlength{\fboxsep}{0pt} +\framebox[7pt]{\rule{0pt}{7pt}}}} +\newcommand{\balpha}{\bm{\alpha}} +\newcommand{\bbeta}{\bm{\beta}} + +% You can expand the titlebox if you need extra space +% to show all the authors. 
Please do not make the titlebox
+% smaller than 5cm (the original size); we will check this
+% in the camera-ready version and ask you to change it back.
+%\setlength\titlebox{5cm} %for expanding the title box
+\title{Latent Synchronous CFGs for Hierarchical Phrase-based Translation}
+
+%\author{First Author \\
+%  Affiliation / Address line 1 \\
+%  Affiliation / Address line 2 \\
+%  Affiliation / Address line 3 \\
+%  {\tt email@domain} \\\And
+%  Second Author \\
+%  Affiliation / Address line 1 \\
+%  Affiliation / Address line 2 \\
+%  Affiliation / Address line 3 \\
+%  {\tt email@domain} \\}
+
+\date{}
+
+\begin{document}
+\maketitle
+\begin{abstract}
+  Abstract goes here.
+\end{abstract}
+
+\section{Introduction}
+Introduction goes here.
+%Statistical approaches to machine translation (MT) have achieved state-of-the-art results in many typologically diverse language pairs \cite{Bojar2013} by learning translation rules over longer multiword units or phrases (e.g., French $\rightarrow$ English: \emph{un chien Andalou} $\rightarrow$ \emph{an Andalusian dog}), instead of lexical or word units (\emph{chien} $\rightarrow$ \emph{dog}).
+%Unfortunately, phrase-based translation contains its own set of issues.
+%A prominent one is the significant increase in model size due to phrasal units, which makes parameter estimation during training a challenge and significantly slows down decoding during test time.
+%The phrasal extraction heuristics that extract phrase pairs consistent with word-level alignments are often to blame, since there is a tendency to extract longer phrasal translation units that are mainly applicable in restricted settings, e.g., phrase pairs like the German-English pair `\emph{der Amerikanische Pr{\"a}sident $\rightarrow$ convention allows the American president}'.
+%However, it has been found that such translation units actually perform better than their minimal counterparts \cite{Galley2006}, primarily because they are more in-line with the kinds of independence assumptions we make with context-free grammar formalisms: with larger rules, right-hand side productions can be generated in a relatively context-independent manner. + +%In this work, we propose to model additional context via a latent variable model that is featurized over inside and outside sub-trees of a synchronous grammar. +%Using a low-rank representation of the feature cross-product space (informally, the space that intuitively captures interactions of feature functions defined over inside and outside sub-trees), we can associate an additional set of parameters for each rule, representing the distribution over latent states. +%Unlike the expectation maximization (EM) algorithm, an iterative procedure based on maximum likelihood estimation that often gets stuck in local optima, our approach utilizes a spectrally-motivated moments-based method to estimate parameters of the latent variable model, which offers a more scalable way to estimate the millions of parameters in our model. +%During decoding, these states are marginalized yielding a context-dependent likelihood for each rule, which can then be incorporated as an additional feature in the standard MT pipeline. + +\section{Latent Variable Models for Refinement} +The core idea behind our proposed approach is an implicit refinement of translation rules in a synchronous context-free grammar (SCFG), using a latent variable model. +We first introduce the latent SCFG formalism and discuss how we acquire training examples of synchronous parse trees from word alignments, followed by a summary of the decoding algorithm for marginalizing over latent states, as it provides a natural way to introduce the data structures and representations used for the latent parameters. 
+The decoder is based on simple tensor-vector products that sum over the latent states. +Two methods to estimate the parameters will be discussed in \S\ref{sec:estimation}. + +\subsection{Latent SCFGs} +\label{sec:formalism} +We extend the definition of L-PCFGs \cite{Matsuzaki2005,Petrov2006} to synchronous grammars as used in machine translation \cite{Galley2004,Chiang2005}. +In this work, the aim is to refine the one-category grammar introduced by \newcite{Chiang2005} for hierarchical phrase-based translation (HPBT) in an effort to incorporate additional translational context via refined non-terminal (NT) categories instead of longer translation rules. +Thus, the following discussion is restricted to these kinds of grammars, although the method is equally applicable in other scenarios, e.g., the extended tree-to-string transducer ({\bf xRs}) formalism \cite{Huang2006,Graehl2008} commonly used in syntax-directed translation. +An important point to keep in mind in comparison to L-PCFGs is that the right-hand side (RHS) non-terminals of synchronous rules are aligned pairs across the source and target languages. + +A latent SCFG (L-SCFG) is a 6-tuple $(\mathcal{N}, m, n_s, n_t, \pi, t)$ where: +\begin{itemize} + \item $\mathcal{N}$ is a set of NT symbols in the grammar. + In our case, the set consists of only two symbols, \bX~and the goal symbol \bS. + \item $[m]$ is the set of possible hidden states associated with NTs. + Aligned pairs of NTs across the source and target languages share the same hidden state. + In line with previous work, we assume that the states associated with NTs on the RHS are \emph{not} conditionally independent given the latent state of the left-hand side (LHS). + \item $[n]_s$ is the set of source side words, i.e., the source-side terminal vocabulary. + \item $[n]_t$ is the set of target side words, i.e., the target-side vocabulary. 
+
+  \item $[n]_t$ is the set of target side words, i.e., the target-side vocabulary.
+  \item For $a = \bX, b \in [n]_s \cup \mathcal{N} \setminus \{\bS\}, c \in [n]_t
+  \cup \mathcal{N} \setminus \{\bS\}, h_1, h_2, h_3 \in [m]$, we have the following context-free rules, based on the number of NT symbols \bX~in the RHS of the rule:
+  \begin{itemize}
+   \item Two NTs: \\
+   $a(h_1) \rightarrow \langle b, c, \sim \rangle$, where $\sim$ is a one-to-one correspondence between the NT symbols of $b$ and $c$, $h_2$ is associated with one of the aligned NT pairs, and $h_3$ is associated with the other pair.
+   The rule has an associated parameter $t(a \rightarrow b,c, h_2, h_3 | a, h_1)$.
+   \item One NT: \\
+   $a(h_1) \rightarrow \langle b, c, \sim \rangle$, with associated parameter $t(a \rightarrow b, c, h_2 | a, h_1)$
+   \item No NTs: \\
+   $a(h_1) \rightarrow \langle b, c \rangle$, with associated parameter $t(a \rightarrow b,c | a, h_1)$
+  \end{itemize}
+  \item For $a=\bS$, $h \in [m]$, $\pi(\bS, h)$ is a parameter specifying the probability of $\bS(h)$ being at the root of the tree.
+\end{itemize}
+A skeletal tree (s-tree) for a sentence is a sequence of rules $r_1, \dots, r_N$ where each $r_i$ is of the form of one of the context-free rules above.
+A full tree consists of an s-tree $r_1, \dots, r_N$ together with values $h_1, \dots, h_N$.
+In HPBT, where only rules with at most two NTs in the RHS are used, the set of rules obtained from the training corpus $\rules$ can be further divided into three non-overlapping sets $\rules_0, \rules_1, \rules_2 \subset \rules$, containing the pre-terminal, unary, and binary rules respectively.
+
+\subsection{Minimal Grammar Extraction}
+\label{sec:mingrammar}
+In order to learn the parameters $t$, we need a set of synchronous s-trees, which can be acquired from word alignments.
+%For each rule $r_i$ in each s-tree, we can either compute partial counts in the expectation step of the EM algorithm, or extract second-order moments of features on which we compute an SVD.
+During the extraction phase, if we consider {\bf composed} rules, namely rules that can be formed out of smaller rules in the grammar, then there are multiple synchronous trees consistent with the alignments for a given sentence pair, and thus the total number of applicable rules can be combinatorially larger than if we just consider the set of {\bf minimal} rules, i.e., rules that cannot be formed from other rules.
+
+To extract a set of minimal rules for each word-aligned sentence pair, we utilize the linear-time extraction algorithm of \newcite{Zhang2008}.
+Since the algorithm extracts one minimal tree for each sentence pair, derivation forests do not have to be considered, making parameter estimation more tractable.\footnote{For our \textsc{DE-EN} corpus (\S\ref{sec:data}), a grammar extracted using the traditional heuristics was more than 80 times larger than the minimal grammar.}
+Furthermore, by using minimal rules as a starting point instead of the traditional heuristically-extracted rules \cite{Chiang2005} or arbitrary compositions of minimal rules \cite{Galley2006}, we are also able to explore the transition from minimal rules to composed ones in a principled manner by encoding contextual information through the latent states.
+Thus, a beneficial side effect of our refinement process is the creation of more context-specific rules without increasing the overall size of the grammar.
+
+
+\subsection{Decoding}
+\label{sec:decoding}
+\begin{figure}[h!]
+  \begin{footnotesize}
+    \framebox{\parbox{\columnwidth}{
+    {\bf Inputs:} Sentence $f_1 \ldots f_N$, L-SCFG $(\mathcal{N}, m, n_s, n_t, \pi, t)$, parameters $C^r \in \reals^{(m \times m \times m)}$, $\reals^{(m \times m)}$, or $\reals^{(1 \times m)}$ for all $r \in \rules$, $C^\bS \in \reals^{(m \times 1)}$, hypergraph $\mathcal{H}$.
+
+    {\bf Data structures:}
+
+    For each node $q \in \mathcal{H}$:
+    \begin{itemize}[noitemsep]
+      \item $\balpha(q) \in \reals^{1 \times m}$ is a row vector of inside terms.
+ \item $\bbeta(q) \in \reals^{m \times 1}$ is a column vector of outside terms. + \item For each incoming edge $e \in {\bf B}(q)$ to node $q$, $\mu(e)$ is a marginal probability for edge (rule) $e$. + \end{itemize} + + {\bf Algorithm:} + + (Inside Computation) + %(Inside base case) $\forall i \in [N], \;\; \alpha^{X, i, i} = \sum_{r \in \bX \rightarrow f_i} C^r$ + + For nodes $q$ in topological order in $\mathcal{H}$, + \begin{itemize}[label={},nolistsep] + \item $\balpha(q) = \bm{0}$ + \item For each incoming edge $e \in {\bf B}(q)$, + \item \begin{itemize}[label={}] + \item tail = {\bf t}(e), rule = {\bf r}(e) + \item if $|$tail$| = 0$, then $\balpha(q) = \balpha(q) + C^{\textrm{rule}}$ + \item else if $|$tail$| = 1$, then $\balpha(q) = \balpha(q) + C^{\textrm{rule}} \times_1 \balpha(\textrm{tail}_0)$ + \item else if $|$tail$| = 2$, then $\balpha(q) = \balpha(q) + C^{\textrm{rule}} \times_2 \balpha(\textrm{tail}_1) \times_1 \balpha(\textrm{tail}_0)$ + \end{itemize} + \end{itemize} + + + (Outside Computation) + + For $q \in \mathcal{H}$, + \begin{itemize}[label={},nolistsep] + \item $\bbeta(q) = \bm{0}$ + \end{itemize} + $\bbeta(\textrm{goal}) = C^\bS$ + + For $q$ in reverse topological order in $\mathcal{H}$, + \begin{itemize}[label={},nolistsep] + \item For each incoming edge $e \in {\bf B}(q)$, + \item \begin{itemize}[label={}] + \item tail = {\bf t}(e), rule = {\bf r}(e) + \item if $|$tail$| = 1$, then $\bbeta(\textrm{tail}_0) = \bbeta(q) \times_0 C^{\textrm{rule}}$ + \item else if $|$tail$| = 2$, then, + \begin{itemize}[label={}] + \item $\bbeta(\textrm{tail}_0) = \bbeta(q) \times_0 C^{\textrm{rule}} \times_2 \balpha(\textrm{tail}_1)$ + \item $\bbeta(\textrm{tail}_1) = \bbeta(q) \times_0 C^{\textrm{rule}} \times_1 \balpha(\textrm{tail}_0)$ + \end{itemize} + + \end{itemize} + \end{itemize} + + + \hbox{(Marginals)} + Sentence probability $g = \balpha(\textrm{goal}) \times \bbeta(\textrm{goal})$ + For edge $e \in \mathcal{H}$, + 
\begin{itemize}[label={},nolistsep]
+      \item head = {\bf h}(e), tail = {\bf t}(e), rule = {\bf r}(e)
+      \item if $|$tail$| = 0$, then $\mu(e) = (\bbeta(\textrm{head}) \times_0 C^{\textrm{rule}}) / g$
+      \item else if $|$tail$| = 2$, then $\mu(e) = (\bbeta(\textrm{head}) \times_0 C^{\textrm{rule}} \times_2 \balpha(\textrm{tail}_1) \times_1 \balpha(\textrm{tail}_0)) / g$
+      \item else if $|$tail$| = 1$, then $\mu(e) = (\bbeta(\textrm{head}) \times_0 C^{\textrm{rule}} \times_1 \balpha(\textrm{tail}_0)) / g$
+    \end{itemize}
+}}
+\end{footnotesize}
+\caption{The tensor form of the hypergraph inside-outside algorithm, for calculation of rule marginals $\mu(e)$.
+A slight simplification in the marginal computation yields NT marginals for spans $\mu(\bX, i, j)$.
+{\bf B}(q) returns the incoming hyperedges for node $q$, and {\bf h}(e), {\bf t}(e), {\bf r}(e) return the head node, tail nodes, and rule for hyperedge $e$.}
+\vspace{-1cm}
+\label{fig:hg_io_spec}
+\end{figure}
+For a parameter $t$ of rule $r$, the latent state $h_1$ attached to the LHS NT of $r$ is associated with the outside tree for the sub-tree rooted at the LHS, and the states attached to the RHS NTs are associated with the inside trees of those NTs.
+Since we do not assume conditional independence of these states, we need to consider all possible interactions, which can be compactly represented as a 3\ts{rd}-order tensor in the case of a binary rule, a matrix (i.e., a 2\ts{nd}-order tensor) for unary rules, and a vector for pre-terminal (lexical) rules.
+Preferences for certain outside-inside tree combinations are reflected in the values contained in these tensor structures.
+In this manner, we intend to capture interactions between non-local context, as represented by the outside tree, and local context, through the inside trees.
+We refer to these tensor structures collectively as $C^r$ for rules $r \in \rules$, which encompass the parameters $t$.
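The tensor-vector products used in the inside computation of Figure \ref{fig:hg_io_spec} reduce to small mode-$k$ contractions. The following numpy sketch illustrates the inside combination step for a binary rule; the dimension $m$ and all parameter values are purely illustrative stand-ins, not taken from the actual system:

```python
import numpy as np

m = 4  # latent-state dimension (illustrative)
rng = np.random.default_rng(0)

# Hypothetical parameters for one binary rule r in R_2: an m x m x m tensor
# whose modes correspond to the latent states (h_1, h_2, h_3).
C_rule = rng.random((m, m, m))

def contract(tensor, mode, vec):
    """Mode-k tensor-vector product: sum out axis `mode` against `vec`."""
    return np.tensordot(tensor, vec, axes=([mode], [0]))

# Inside vectors of the two tail nodes of a hyperedge.
alpha_t0 = rng.random(m)
alpha_t1 = rng.random(m)

# Inside update for |tail| = 2: C^rule x_2 alpha(tail_1) x_1 alpha(tail_0),
# leaving a length-m vector indexed by the head node's latent state h_1.
alpha_head = contract(contract(C_rule, 2, alpha_t1), 1, alpha_t0)
assert alpha_head.shape == (m,)
```

Each contraction removes one mode, so a 3\ts{rd}-order tensor multiplied by two inside vectors leaves the length-$m$ inside vector of the head node, exactly as in the figure.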
+
+For $r \in \rules_0: C^r \in \reals^{1 \times m}$; similarly for $r \in \rules_1: C^r \in \reals^{m \times m}$ and $r \in \rules_2: C^r \in \reals^{m \times m \times m}$.
+We also maintain a vector $C^\bS \in \reals^{m \times 1}$ corresponding to the parameters $\pi(\bS, h)$ for the goal node (root).
+These parameters participate in tensor-vector operations: a 3\ts{rd}-order tensor $C^r$ for $r \in \rules_2$ can be multiplied along each of its three modes ($\times_0, \times_1, \times_2$), and if multiplied by an $m \times 1$ vector, will produce an $m \times m$ matrix.\footnote{This operation is sometimes called a contraction.}
+Note that matrix multiplication can be represented by $\times_1$ when multiplying on the right and $\times_0$ when multiplying on the left of the matrix.
+
+The decoder computes probabilities for each rule in the parse forest of a source sentence by marginalizing over the latent states, which in practice corresponds to simple tensor-vector products, and is not dependent on the manner in which the parameters were estimated.
+Figure \ref{fig:hg_io_spec} presents the tensor version of the inside-outside algorithm for decoding L-SCFGs.
+The algorithm takes as input the parse forest of the source sentence represented as a hypergraph \cite{Klein2001}, which is computed using a bottom-up parser with Earley-style rules (citation), similar to the CKY+ algorithm used in \newcite{Chiang2007}.
+Then, the algorithm computes inside and outside probabilities over the hypergraph using the tensor representations, and converts these probabilities to marginal rule probabilities.
+It is similar to the version presented in \newcite{Cohen2012a}, but adapted to hypergraph parse forests.
+
+The algorithm maintains its $\mathcal{O}(n^3|G|)$ complexity, where $n$ is the length of the input sentence and $|G|$ is the size of the grammar; since we do not add any rules, $|G|$ is unchanged.
+But of course, there is no free lunch: the additional computation is shifted to the marginalization over latent states via the algorithm in Figure \ref{fig:hg_io_spec}.
+However, the bulk of the computation in this case is in the form of a series of tensor-vector products of relatively small size (each dimension is of length $m$), which can be computed very quickly and in parallel.
+
+\section{Parameter Estimation for L-SCFGs}
+\label{sec:estimation}
+We explore two methods for estimating the parameters $C^r$ of the model: a likelihood-maximization approach based on EM \cite{Dempster1977}, and a spectral approach based on the method of moments \cite{Hsu2009}, where we identify a subspace using a singular value decomposition (SVD) \cite{Golub1996} of the cross-product feature space between inside and outside trees and estimate parameters in this subspace.
+
+Figure \ref{fig:estimation-algos} presents a side-by-side comparison of the two algorithms, which we discuss in this section.
+In the spectral approach, we base our parameter estimates on low-rank representations of moments of features, while EM explicitly maximizes a likelihood criterion.
+The two algorithms are structurally similar, but in lieu of the sparse feature functions used in the spectral case, EM uses partial counts estimated with the current set of parameters.
+EM is susceptible to local optima, while the spectral approach comes with guarantees on obtaining the global optimum.
+Lastly, computing the SVD and estimating parameters in the low-rank space is a one-shot operation, as opposed to the iterative procedure of EM.
+
+\begin{figure*}[t!]
+
+  \centering
+  \fbox{
+  \begin{footnotesize}
+  \begin{subfigure}{0.85\columnwidth}
+  \vspace{-1cm}
+  {\bf Inputs:}
+
+ Training examples $(\rione, \xione, \xitwo, \xithree, \oi, b^{(i)})$ for $i \in \{1 \ldots M\}$, where $\rione$ is a context-free rule; $\xione$, $\xitwo$, and $\xithree$ are inside trees; $\oi$ is an outside tree; and $b^{(i)} = 1$ if the rule is at the root of the tree, $0$ otherwise.
+A function $\phi$ that maps inside trees $t$ to feature-vectors $\phi(t) \in \reals^d$. A function $\psi$ that maps outside trees $o$ to feature-vectors $\psi(o) \in \reals^{d'}$.
+
+  {\bf Algorithm:}
+  %If $\rione$ is of the form $\srule$, define $b_i$ to be the non-terminal for the left-child of $\rione$, and $c_i$ to be the non-terminal for the right-child.
+
+  (Step 0: Singular Value Decomposition)
+  \begin{itemize}
+    \item Compute the SVD of Eq.~\ref{eq:outerproduct} to calculate matrices $\e{U} \in \reals^{(d \times m)}$ and $\e{V} \in \reals^{(d' \times m)}$.
+  \end{itemize}
+
+  (Step 1: Projection)
+  \begin{align*}
+    Y(t) &= \e{U}^T \phi(t)\\
+    Z(o) &= \Sigma^{-1} \e{V}^T \psi(o)
+  \end{align*}
+
+  (Step 2: Calculate Correlations)
+  \begin{align*}
+  \e{E}^r &= \begin{cases}
+  \frac{\sum_{o \in Q^r} Z(o)}{|Q^r|} & \textrm{if }r \in \rules_0 \\
+  \frac{\sum_{\left(o, t\right) \in Q^r} Z(o) \otimes Y(t)}{|Q^r|} & \textrm{if }r \in \rules_1 \\
+  \frac{\sum_{\left(o, t^2, t^3\right) \in Q^r} Z(o) \otimes Y(t^2) \otimes Y(t^3)}{|Q^r|} & \textrm{if }r \in \rules_2
+  \end{cases}
+  \end{align*}
+  $Q^r$ is the set of outside-inside tree triples for binary rules, outside-inside tree pairs for unary rules, and outside trees for pre-terminals.
+
+  (Step 3: Compute Final Parameters)
+  \begin{itemize}
+    \item For all $r \in \rules$,
+    \begin{itemize}[label={}]
+      \item $\e{C}^r = \frac{\textrm{count}(r)}{M} \times \e{E}^r$
+    \end{itemize}
+    \item For all $i \in \{1, \dots, M\}$ such that $b^{(i)}$ is 1,
+    \begin{itemize}[label={}]
+      \item $\e{C}^\bS = \e{C}^\bS + \frac{Y(\xione)}{|Q^\bS|}$
+    \end{itemize}
+  \end{itemize}
+  $Q^\bS$ is the set of trees at the root.
+  \caption{\small The spectral learning algorithm for estimating parameters of an L-SCFG.}
+  \label{fig:splearn}
+  \end{subfigure}
+  %&
+  \begin{subfigure}{1.05\columnwidth}
+  {\bf Inputs:}
+
+ Training examples $(\rione, \xione, \xitwo, \xithree, \oi, b^{(i)})$ for $i \in \{1 \ldots M\}$, where $\rione$ is a context-free rule; $\xione$, $\xitwo$, and $\xithree$ are inside trees; $\oi$ is an outside tree; $b^{(i)} = 1$ if the rule is at the root of the tree, $0$ otherwise; and MAX\_ITERATIONS.
+%A function $\phi$ that maps inside trees $t$ to feature-vectors $\phi(t) \in \reals^d$. A function $\psi$ that maps outside trees $o$ to feature-vectors $\psi(o) \in \reals^{d'}$.
+
+  {\bf Algorithm:}
+  %If $\rione$ is of the form $\srule$, define $b_i$ to be the non-terminal for the left-child of $\rione$, and $c_i$ to be the non-terminal for the right-child.
+
+  (Step 0: Parameter Initialization)
+
+  For rule $r \in \rules$,
+  \begin{itemize}[noitemsep]
+    \item if $r \in \rules_0$: initialize $\e{C}^r \in \reals^{1 \times m}$
+    \item if $r \in \rules_1$: initialize $\e{C}^r \in \reals^{m \times m}$
+    \item if $r \in \rules_2$: initialize $\e{C}^r \in \reals^{m \times m \times m}$
+  \end{itemize}
+
+  Initialize $\e{C}^\bS \in \reals^{m \times 1}$
+
+  $\e{C}_0^r = \e{C}^r, \e{C}_0^\bS = \e{C}^\bS$
+
+  For iteration $t=1, \dots, \textrm{MAX\_ITERATIONS}$,
+  \begin{itemize}
+    \item Expectation Step:
+    \begin{itemize}[label={}]
+      \item (Estimate $Y$ and $Z$)
+
+      Compute partial counts and total tree probabilities $g$ for all $t$ and $o$ using Fig.~\ref{fig:hg_io_spec} and parameters $\e{C}_{t-1}^r, \e{C}_{t-1}^\bS$.
+      \item (Calculate Correlations)
+      \begin{align*}
+      \e{E}^r &= \begin{cases}
+      \sum\limits_{\left(o, g\right) \in Q^r} \frac{Z(o)}{g} &\textrm{if }r \in \rules_0 \\
+      \sum\limits_{\left(o, t, g\right) \in Q^r} \frac{Z(o) \otimes Y(t)}{g} &\textrm{if }r \in \rules_1 \\
+      \sum\limits_{\left(o,t^2,t^3,g\right) \in Q^r} \frac{Z(o) \otimes Y(t^2) \otimes Y(t^3)}{g} &\textrm{if }r \in \rules_2
+      \end{cases}
+      \end{align*}
+      \item (Update Parameters)
+      \begin{itemize}[label={}]
+        \item For all $r \in \rules$, $\e{C}^r_t = \e{C}^r_{t-1} \odot \e{E}^r$
+        \item For all $i \in \{1, \dots, M\}$ such that $b^{(i)}$ is 1, $\e{C}^\bS_t = \e{C}^\bS_t + (\e{C}^\bS_{t-1} \odot Y(\xione)) / g$
+      \end{itemize}
+      $Q^\bS$ is the set of trees at the root.
+ \end{itemize}
+ \item Maximization Step
+ \begin{itemize}[label={},nolistsep]%[nolistsep]
+ \item if $r \in \rules_0$: $\forall h_1: \e{C}^r(h_1) = \frac{\e{C}^r(h_1)}{\sum_{h_1}\e{C}^r(h_1)}$
+ \item if $r \in \rules_1$: $\forall h_1, h_2: \e{C}^r(h_1, h_2) = \frac{\e{C}^r(h_1, h_2)}{\sum_{h_2}\e{C}^r(h_1, h_2)}$
+ \item if $r \in \rules_2$: $\forall h_1, h_2, h_3: \e{C}^r(h_1, h_2, h_3) = \frac{\e{C}^r(h_1, h_2, h_3)}{\sum_{h_2, h_3}\e{C}^r(h_1, h_2, h_3)}$
+ \end{itemize}
+ \end{itemize}
+ \caption{\small The EM-based algorithm for estimating parameters of an L-SCFG.}
+ \label{fig:emlearn}
+ \end{subfigure}
+ \end{footnotesize}}
+ \caption{The two parameter estimation algorithms proposed for L-SCFGs.}
+ \label{fig:estimation-algos}
+\end{figure*}
+
+\subsection{Spectral Moments-based Estimation}
+\label{sec:spectral}
+We generalize the parameter estimation algorithm presented in \newcite{Cohen2013} to the synchronous or bilingual case.
+The central idea of the spectral parameter estimation algorithm is to learn an $m$-dimensional representation of inside and outside trees by describing these trees in terms of features and then applying a projection step (SVD), the intuition being that the lower-dimensional space captures syntactic and semantic regularities among rules that are only implicit in the sparse feature space.
+%The spectral method relies on computing the empirical covariances between two feature spaces, represented by their respective feature functions that map tree fragments to feature vectors.
+Every NT in an s-tree has an associated inside and outside tree: the inside tree contains the entire sub-tree at and below the NT, and the outside tree is everything in the s-tree except the inside tree.
+The inside feature function $\phi$ maps the domain of inside tree fragments to a $d$-dimensional Euclidean space, and the outside feature function $\psi$ maps the domain of outside tree fragments to a $d'$-dimensional space.
+The specific features we used are discussed in \S\ref{sec:features}.
+
+Let $\mathcal{O}$ be the set of all inside-outside tree tuples in our training corpus, whose size equals the number of rule tokens $M$, and let $\phi(t) \in \reals^{d \times 1}$, $\psi(o) \in \reals^{d' \times 1}$ be the inside and outside feature vectors.
+By computing the outer product $\otimes$ between the inside and outside feature vectors for each pair and aggregating, we obtain the empirical inside-outside feature covariance matrix:
+\begin{align}
+ \hat{\Omega} = \frac{1}{|\mathcal{O}|} \sum_{(o,t) \in \mathcal{O}} \phi(t) \left(\psi(o)\right)^T
+ \label{eq:outerproduct}
+\end{align}
+If $m$ is the desired latent space dimension, we compute a rank-$m$ truncated SVD of the empirical covariance matrix $\hat{\Omega} \approx U \Sigma V^T$, where $U \in \mathbb{R}^{d \times m}$ and $V \in \mathbb{R}^{d' \times m}$ contain the top $m$ left and right singular vectors, and $\Sigma \in \mathbb{R}^{m \times m}$ is a diagonal matrix containing the $m$ largest singular values along its diagonal.
+
+Figure \ref{fig:splearn} provides the remaining steps in the algorithm.
+In step 1, for each inside and outside tree, we project its high-dimensional feature representation to the $m$-dimensional latent space.
+Using these lower-dimensional representations, in step 2 we compute, for each rule type $r$, the covariance between the inside tree vectors and the outside tree vector using the \emph{tensor product}, a generalization of the outer product that computes covariances among more than two random vectors.
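As a concrete illustration of the covariance computation (Eq.~\ref{eq:outerproduct}) and the truncated SVD, the following numpy sketch uses random sparse binary vectors in place of the real feature functions; the step-1 projections $Y(t) = U^T \phi(t)$ and $Z(o) = \Sigma^{-1} V^T \psi(o)$ follow a common convention from the spectral L-PCFG literature and are an assumption here, not a quotation of the paper's definitions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_prime, m, M = 50, 40, 4, 200  # inside dim, outside dim, latent dim, rule tokens

# Random sparse binary vectors standing in for phi(t) and psi(o), one row per token.
phi = (rng.random((M, d)) < 0.1).astype(float)
psi = (rng.random((M, d_prime)) < 0.1).astype(float)

# Empirical inside-outside covariance: average of outer products phi(t) psi(o)^T.
omega_hat = phi.T @ psi / M                       # d x d'

# Rank-m truncated SVD: keep only the m largest singular values and vectors.
U, S, Vt = np.linalg.svd(omega_hat, full_matrices=False)
U, S, Vt = U[:, :m], S[:m], Vt[:m, :]

# Step-1 projections to the m-dimensional latent space (one common convention):
# Y(t) = U^T phi(t), Z(o) = Sigma^{-1} V^T psi(o).
Y = phi @ U                                       # M x m inside representations
Z = (psi @ Vt.T) / S                              # M x m outside representations
```

The projected rows of `Y` and `Z` play the role of the $m$-dimensional inside and outside representations used in the subsequent correlation step.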
+For binary rules, with two child inside vectors and one outside vector, the result $\e{E}^r$ is a 3-mode tensor; for unary rules, it is a matrix, and for pre-terminal rules with no right-hand-side non-terminals, a vector.
+The final parameter estimate is then the associated tensor/matrix/vector, scaled by the maximum likelihood estimate of the rule $r$, as in step 3.
+
+The corresponding theoretical guarantees from \newcite{Cohen2012a} carry over straightforwardly to the synchronous case.
+$\hat{\Omega}$ is an empirical estimate of the true covariance matrix $\Omega$, and if $\Omega$ has rank $m$, then the marginals computed with the spectrally estimated parameters converge to the true marginals.
+The sample complexity of this convergence is inversely proportional to the $m^{\textrm{th}}$ largest singular value of $\Omega$.
+
+\subsection{EM-based Estimation}
+\label{sec:em}
+A likelihood maximization approach can also be used to learn the parameters of an L-SCFG.
+Parameters are initialized by sampling each parameter value $\e{C}^r(h_1, h_2, h_3)$ from the interval $[0,1]$ uniformly at random.\footnote{In our experiments, we also tried the initialization scheme described in \newcite{Matsuzaki2005}, but found that it provided little benefit.}
+We first decode the training corpus using the current set of parameters to compute the inside and outside probability vectors associated with the NTs of every rule in each s-tree, constrained to the tree structure of the training example.
+These probabilities can be computed using the decoding algorithm in Figure \ref{fig:hg_io_spec} (where $\balpha$ and $\bbeta$ correspond to the inside and outside probabilities respectively), except that the parse forest consists of a single tree.
+Each of these vectors represents partial counts over latent states.
+We can then define functions $Y$ and $Z$ (analogous to the spectral case) that map inside and outside tree instances to $m$-dimensional vectors containing these partial counts.
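For concreteness, steps 2 and 3 of the spectral algorithm (Fig.~\ref{fig:splearn}) for a single binary rule reduce to a three-way outer product followed by scaling with the rule's maximum likelihood estimate. The numpy sketch below uses random stand-ins for the projected vectors and toy counts; averaging $E^r$ over the rule's occurrences is our assumption about the aggregation convention, not a quotation of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
m, M, count_r = 4, 500, 12  # latent dim, total rule tokens, occurrences of rule r

# Random stand-ins for the m-dimensional projections at each occurrence of r:
# Z(o) for the outside tree, Y(t2) and Y(t3) for the two child inside trees.
Z = rng.random((count_r, m))
Y2 = rng.random((count_r, m))
Y3 = rng.random((count_r, m))

# Step 2: correlation for a binary rule -- an m x m x m (3-mode) tensor,
# averaged here over the rule's occurrences.
E_r = np.einsum('ia,ib,ic->abc', Z, Y2, Y3) / count_r

# Step 3: scale by the maximum likelihood estimate count(r)/M of the rule.
C_r = (count_r / M) * E_r
```

For unary and pre-terminal rules the same `einsum` pattern degenerates to a matrix and a vector, respectively, by dropping the unused child factors.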
+In the spectral case, $Y$ and $Z$ are estimated just once, whereas with EM they must be re-estimated at each iteration.
+
+The expectation step thus consists of computing the partial counts of inside and outside trees $t$ and $o$, i.e., recovering the functions $Y$ and $Z$, and updating the parameters $\e{C}^r$ by computing correlations, which involves summing over partial counts (across all occurrences of a rule in the corpus).
+Each partial count's contribution is divided by a normalization factor $g$, the total probability of the tree of which $t$ or $o$ is a part.
+Note that unlike the spectral case, there is a specific normalization factor for each inside-outside tuple.
+Lastly, the correlations are scaled by the existing parameter estimates.
+To obtain the next set of parameters, in the maximization step we normalize $\e{C}^r$ for $r \in \rules$ such that for every $h_1$: $\sum_{h_2,h_3} \e{C}^r(h_1, h_2, h_3) = 1$ for $r \in \rules_2$, $\sum_{h_2} \e{C}^r(h_1, h_2) = 1$ for $r \in \rules_1$, and $\sum_{h_1} \e{C}^r(h_1) = 1$ for $r \in \rules_0$.
+We note that it is also possible to add sparse, overlapping features to an EM-based estimation procedure \cite{Berg-Kirkpatrick2010}; we leave this for future work.
+
+\section{Evaluation}
+To evaluate the performance of L-SCFGs in a translation setting, we conducted several experiments across two language pairs.
+The primary evaluation criterion is BLEU \cite{Papineni2002}, and we evaluate our latent variable model against a number of baselines to elucidate its performance.
+The latent variable model is integrated into the standard MT pipeline by computing marginal probabilities for each rule in the parse forest of a source sentence, using the algorithm in Figure \ref{fig:hg_io_spec} with the parameters estimated through the algorithms in Figure \ref{fig:estimation-algos}; this marginal is added as a feature for the rule during MERT \cite{Och2003}.
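As a concrete illustration of the maximization step of the EM algorithm (\S\ref{sec:em}), renormalizing a binary-rule parameter tensor so that, for every value of $h_1$, the entries sum to one over the child states is a one-line operation in numpy (random toy tensor; a sketch, not the actual implementation):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 4  # latent dimension

# Random stand-in for an unnormalized binary-rule parameter tensor after the E-step.
C_r = rng.random((m, m, m))

# M-step for binary rules: for every h1, make the sum over (h2, h3) equal to 1.
C_r /= C_r.sum(axis=(1, 2), keepdims=True)
```

The unary and pre-terminal cases follow the same pattern, summing over `axis=1` and `axis=0` respectively.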
+These probabilities are conditioned on the LHS (\bX), and are thus joint probabilities for a source-target RHS pair.
+We also write out as features the conditional probabilities $P(e|f)$ and $P(f|e)$ as estimated by our latent variable model, i.e., conditioned on the source and target RHS respectively.
+
+\subsection{Data and Baselines}
+\label{sec:data}
+The \textsc{DE-EN} parallel corpus is taken from the news commentary section of the WMT 2012 translation evaluation; \textsc{news-test2010} is used as the development set, and \textsc{news-test2011} as the test set.\footnote{http://www.statmt.org/wmt12/}
+The development and test sets are evaluated with a single reference.
+The \textsc{ZH-EN} data is the BTEC parallel corpus \cite{Paul2009}; we combine the first and second development sets into one, and evaluate on the third development set.
+The development and test sets are evaluated with 16 references.
+Statistics for the data are shown in Table \ref{tab:corpusstats}.
+We used the \textsc{cdec} decoder \cite{Dyer2010} to extract word alignments and the baseline hierarchical grammars, as well as for MERT tuning and decoding.
+%For the in-sample conditional perplexity experiments, we used a 4-gram language model .
+We used a 4-gram language model built from the target side of the parallel training data.
+\begin{table}[h!]
+%{\small
+ \begin{center}
+ \begin{tabular}{p{0.5\linewidth}rr}
+ \hline
+ & \textsc{DE-EN} & \textsc{ZH-EN} \\
+ \hline
+ TRAIN (SRC) & 3.7M & 334K \\
+ TRAIN (TGT) & 3.6M & 366K \\
+ DEV (SRC) & 65K & 7K \\
+ DEV (TGT) & 63K & 7.6K\\
+ TEST (SRC) & 63K & 3.8K \\
+ TEST (TGT) & 65K & 3.9K \\
+ \end{tabular}
+ \end{center}
+ \caption{Corpus statistics (in words). For the \textsc{ZH-EN} target DEV and TEST statistics, we take the first reference.}
+ \label{tab:corpusstats}
+ %}
+\end{table}
+
+The baseline \textsc{hiero} system uses a grammar extracted by applying the commonly used heuristics \cite{Chiang2005}.
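The probability features that decorate each rule in the baseline are relative-frequency estimates over extracted rule counts; a minimal sketch (toy counts and hypothetical phrase pairs, not data from the paper) of how the forward, backward, and joint probabilities might be computed:

```python
from collections import Counter, defaultdict

# Toy joint counts of (source RHS, target RHS) pairs -- hypothetical values.
counts = Counter({
    ("der Mann", "the man"): 8,
    ("der Mann", "the husband"): 2,
    ("das Haus", "the house"): 5,
})
total = sum(counts.values())

# Marginal counts of each source and target side.
src_tot = defaultdict(int)
tgt_tot = defaultdict(int)
for (f, e), c in counts.items():
    src_tot[f] += c
    tgt_tot[e] += c

# Forward P(e|f), backward P(f|e), and joint P(e,f) features per rule.
features = {
    (f, e): {
        "p_e_given_f": c / src_tot[f],
        "p_f_given_e": c / tgt_tot[e],
        "p_joint": c / total,
    }
    for (f, e), c in counts.items()
}
```

The latent variable model replaces these counts with marginals computed under the estimated parameters, but the conditioning structure of the features is the same.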
+Each rule is decorated with lexical and phrasal features corresponding to the forward $P(e|f)$ and backward $P(f|e)$ probabilities, along with the joint probability $P(e,f)$, the marginal probability of the source phrase $P(f)$, and indicators for whether the phrase pair or the source phrase is a singleton.
+Weights for the language model (and language model OOV), glue rule, and word penalty are also tuned.
+The minimal grammar baseline uses the same set of features and weights.
+
+\subsection{Features}
+\label{sec:features}
+We use the following set of sparse, binary features in the spectral learning process:
+\begin{itemize}[noitemsep]
+ \item Rule Indicator: for the inside features, we consider the rule production containing the current non-terminal on the left-hand side, as well as the rule productions of the children (distinguishing between left and right children for binary rules).
+ For the outside features, we consider the parent rule production along with the rule production of the sibling (if it exists).
+ \item Lexical: for both the inside and outside features, we record any lexical items that appear in the rule productions.
+ Furthermore, we consider the first and last words of spans (left and right child spans for inside features, distinguishing between the two if both exist, and the sibling span for outside features).
+ Source and target words are treated separately.
+ %\item Arity: the number of non-terminals present in inside tree and outside tree rules.
+ \item Length: the span length of the tree and each of its children for inside features, and the span lengths of the parent and sibling for outside features.
+\end{itemize}
+In addition to these sparse features, we also investigate the inclusion of real-valued features traditionally used in MT, e.g., lexical and phrasal forward and reverse probabilities.
+
+\subsection{\textsc{DE-EN} Experiments}
+
+Table \ref{tab:de-en-results} presents a comprehensive evaluation of the \textsc{DE-EN} experimental setup.
+The first section consists of the various baselines we consider.
+In addition to the standard HPBT setup \cite{Chiang2005}, we evaluate the minimal grammar baseline with the same set of features, as well as a setup where the spectral parameters simply consist of the joint maximum likelihood estimates of the rules.
+This baseline, along with the $m=1$ spectral baseline with only rule indicator features, should perform on par with the minimal grammar baseline, which we see is the case.
+Furthermore, in line with previous work comparing minimal and composed rules \cite{Galley2006}, we find that minimal grammars take a hit of almost 1.5 BLEU points compared to composed (\textsc{hiero}) grammars.
+
+We examine a number of feature combinations and latent-state settings for the spectral and EM-estimated latent variable models.
+
+The two estimation algorithms differ significantly in their estimation time.
+The spectral algorithm is at least an order of magnitude faster: it completes within 40 minutes on a single core, whereas a parallelized EM implementation needs around 100 iterations, taking more than 10 hours, to achieve this level of performance.
+
+\begin{table}[t!]
+\begin{small}
+ \begin{center}
+ \begin{tabular}{|l|p{0.45\columnwidth}rr|}
+ \hline
+ & & \multicolumn{2}{c|}{\bf BLEU} \\
+ & Setup & Dev & Test \\
+ \hline
+ \multirow{3}{*}{Baselines} & \textsc{hiero} & 18.50 & 16.89 \\
+ & Minimal Grammar & 17.01 & 15.42 \\
+ & MLE & X & Y \\ \hline
+ \multirow{4}{*}{Spectral} & $m=1$ RI & 17.09 & 15.34 \\
+ & $m=1$ RI+Lex+Len & X & Y \\
+ & $m=16$ RI+Lex+Len & X & Y \\
+ & $m=16$ RI+Lex+Len+Sm & X & Y \\ \hline
+ \multirow{2}{*}{EM} & $m=1$ 100 Iter & X & Y \\
+ & $m=16$ 100 Iter & X & Y \\
+ \hline
+ \end{tabular}
+ \end{center}
+ \caption{Results for the \textsc{DE-EN} corpus, comparing across the baselines and the two parameter estimation techniques.
+ RI, Lex, and Len correspond to the rule indicator, lexical, and length features, respectively, and Sm denotes smoothing.}
+ \label{tab:de-en-results}
+\end{small}
+\end{table}
+\subsection{\textsc{ZH-EN} Experiments}
+
+\subsection{Discussion \& Analysis}
+
+\section{Related Work}
+
+\section{Conclusion}
+
+In this work, we presented a scalable approach to refining the synchronous grammars used in MT by inferring latent categories for the non-terminals in our grammar rules.
+
+For future work, we would like to consider a more direct way of integrating the latent variable parameters into an MT system.
+
+% include your own bib file like this:
+\bibliographystyle{acl}
+\bibliography{bibliography}
+
+\end{document}
diff --git a/EMNLP2014/spectral_scfgs.tex b/EMNLP2014/spectral_scfgs.tex
deleted file mode 100644
index 5862361..0000000
--- a/EMNLP2014/spectral_scfgs.tex
+++ /dev/null