From 7a0dc8246f096ca9ac87ff8c6bc35c8e3a543a6a Mon Sep 17 00:00:00 2001 From: Avneesh Singh Saluja Date: Fri, 23 May 2014 23:32:27 -0700 Subject: [PATCH] latest version of paper; sections 2 and 3 almost complete --- EMNLP2014/acl2014.tex | 431 --------- EMNLP2014/bibliography.bib | 1650 ++++++++++++++++++++++++++++++++++ EMNLP2014/emnlp2014.tex | 535 +++++++++++ EMNLP2014/spectral_scfgs.tex | 431 --------- 4 files changed, 2185 insertions(+), 862 deletions(-) delete mode 100644 EMNLP2014/acl2014.tex create mode 100644 EMNLP2014/bibliography.bib create mode 100644 EMNLP2014/emnlp2014.tex delete mode 100644 EMNLP2014/spectral_scfgs.tex diff --git a/EMNLP2014/acl2014.tex b/EMNLP2014/acl2014.tex deleted file mode 100644 index 5862361..0000000 --- a/EMNLP2014/acl2014.tex +++ /dev/null @@ -1,431 +0,0 @@ -% -% File acl2014.tex -% -% Contact: koller@ling.uni-potsdam.de, yusuke@nii.ac.jp -%% -%% Based on the style files for ACL-2013, which were, in turn, -%% Based on the style files for ACL-2012, which were, in turn, -%% based on the style files for ACL-2011, which were, in turn, -%% based on the style files for ACL-2010, which were, in turn, -%% based on the style files for ACL-IJCNLP-2009, which were, in turn, -%% based on the style files for EACL-2009 and IJCNLP-2008... - -%% Based on the style files for EACL 2006 by -%%e.agirre@ehu.es or Sergi.Balari@uab.es -%% and that of ACL 08 by Joakim Nivre and Noah Smith - -\documentclass[11pt]{article} -\usepackage{acl2014} -\usepackage{times} -\usepackage{url} -\usepackage{latexsym} - -%\setlength\titlebox{5cm} - -% You can expand the titlebox if you need extra space -% to show all the authors. Please do not make the titlebox -% smaller than 5cm (the original size); we will check this -% in the camera-ready version and ask you to change it back. 
- - -\title{Instructions for ACL-2014 Proceedings} - -\author{First Author \\ - Affiliation / Address line 1 \\ - Affiliation / Address line 2 \\ - Affiliation / Address line 3 \\ - {\tt email@domain} \\\And - Second Author \\ - Affiliation / Address line 1 \\ - Affiliation / Address line 2 \\ - Affiliation / Address line 3 \\ - {\tt email@domain} \\} - -\date{} - -\begin{document} -\maketitle -\begin{abstract} - This document contains the instructions for preparing a camera-ready - manuscript for the proceedings of ACL-2014. The document itself - conforms to its own specifications, and is therefore an example of - what your manuscript should look like. These instructions should be - used for both papers submitted for review and for final versions of - accepted papers. Authors are asked to conform to all the directions - reported in this document. -\end{abstract} - -\section{Credits} - -This document has been adapted from the instructions for earlier ACL -proceedings, including those for ACL-2012 by Maggie Li and Michael -White, those from ACL-2010 by Jing-Shing Chang and Philipp Koehn, -those for ACL-2008 by Johanna D. Moore, Simone Teufel, James Allan, -and Sadaoki Furui, those for ACL-2005 by Hwee Tou Ng and Kemal -Oflazer, those for ACL-2002 by Eugene Charniak and Dekang Lin, and -earlier ACL and EACL formats. Those versions were written by several -people, including John Chen, Henry S. Thompson and Donald -Walker. Additional elements were taken from the formatting -instructions of the {\em International Joint Conference on Artificial - Intelligence}. - -\section{Introduction} - -The following instructions are directed to authors of papers submitted -to ACL-2014 or accepted for publication in its proceedings. All -authors are required to adhere to these specifications. Authors are -required to provide a Portable Document Format (PDF) version of their -papers. 
\textbf{The proceedings are designed for printing on A4 - paper.} - -Authors from countries in which access to word-processing systems is -limited should contact the publication chairs, Alexander Koller -(\texttt{koller@ling.uni-potsdam.de}) and Yusuke Miyao -(\texttt{yusuke@nii.ac.jp}), as soon as possible. - -We will make more detailed instructions available at -\url{http://sites.google.com/site/acl2014publication}. Please check -this website regularly. - - -\section{General Instructions} - -Manuscripts must be in two-column format. Exceptions to the -two-column format include the title, authors' names and complete -addresses, which must be centered at the top of the first page, and -any full-width figures or tables (see the guidelines in -Subsection~\ref{ssec:first}). {\bf Type single-spaced.} Start all -pages directly under the top margin. See the guidelines later -regarding formatting the first page. The manuscript should be -printed single-sided and its length -should not exceed the maximum page limit described in Section~\ref{sec:length}. -Do not number the pages. - - -\subsection{Electronically-available resources} - -We strongly prefer that you prepare your PDF files using \LaTeX\ with -the official ACL 2014 style file (acl2014.sty) and bibliography style -(acl.bst). These files are available at -\url{http://www.cs.jhu.edu/ACL2014/}. You will also find the document -you are currently reading (acl2014.pdf) and its \LaTeX\ source code -(acl2014.tex) on this website. - -You can alternatively use Microsoft Word to produce your PDF file. In -this case, we strongly recommend the use of the Word template file -(acl2014.dot) on the ACL 2014 website. If you have an option, we -recommend that you use the \LaTeX2e version. If you will be - using the Microsoft Word template, we suggest that you anonymize - your source file so that the pdf produced does not retain your - identity. 
This can be done by removing any personal information -from your source document properties. - - - -\subsection{Format of Electronic Manuscript} -\label{sect:pdf} - -For the production of the electronic manuscript you must use Adobe's -Portable Document Format (PDF). PDF files are usually produced from -\LaTeX\ using the \textit{pdflatex} command. If your version of -\LaTeX\ produces Postscript files, you can convert these into PDF -using \textit{ps2pdf} or \textit{dvipdf}. On Windows, you can also use -Adobe Distiller to generate PDF. - -Please make sure that your PDF file includes all the necessary fonts -(especially tree diagrams, symbols, and fonts with Asian -characters). When you print or create the PDF file, there is usually -an option in your printer setup to include none, all or just -non-standard fonts. Please make sure that you select the option of -including ALL the fonts. \textbf{Before sending it, test your PDF by - printing it from a computer different from the one where it was - created.} Moreover, some word processors may generate very large PDF -files, where each page is rendered as an image. Such images may -reproduce poorly. In this case, try alternative ways to obtain the -PDF. One way on some systems is to install a driver for a postscript -printer, send your document to the printer specifying ``Output to a -file'', then convert the file to PDF. - -It is of utmost importance to specify the \textbf{A4 format} (21 cm -x 29.7 cm) when formatting the paper. When working with -{\tt dvips}, for instance, one should specify {\tt -t a4}. - -Print-outs of the PDF file on A4 paper should be identical to the -hardcopy version. If you cannot meet the above requirements about the -production of your electronic submission, please contact the -publication chairs as soon as possible. - - -\subsection{Layout} -\label{ssec:layout} - -Format manuscripts two columns to a page, in the manner these -instructions are formatted. 
The exact dimensions for a page on A4 -paper are: - -\begin{itemize} -\item Left and right margins: 2.5 cm -\item Top margin: 2.5 cm -\item Bottom margin: 2.5 cm -\item Column width: 7.7 cm -\item Column height: 24.7 cm -\item Gap between columns: 0.6 cm -\end{itemize} - -\noindent Papers should not be submitted on any other paper size. - If you cannot meet the above requirements about the production of your electronic submission, please contact the publication chairs above as soon as possible. - - -\subsection{Fonts} - -For reasons of uniformity, Adobe's {\bf Times Roman} font should be -used. In \LaTeX2e{} this is accomplished by putting - -\begin{quote} -\begin{verbatim} -\usepackage{times} -\usepackage{latexsym} -\end{verbatim} -\end{quote} -in the preamble. If Times Roman is unavailable, use {\bf Computer - Modern Roman} (\LaTeX2e{}'s default). Note that the latter is about - 10\% less dense than Adobe's Times Roman font. - - -\begin{table}[h] -\begin{center} -\begin{tabular}{|l|rl|} -\hline \bf Type of Text & \bf Font Size & \bf Style \\ \hline -paper title & 15 pt & bold \\ -author names & 12 pt & bold \\ -author affiliation & 12 pt & \\ -the word ``Abstract'' & 12 pt & bold \\ -section titles & 12 pt & bold \\ -document text & 11 pt &\\ -captions & 11 pt & \\ -abstract text & 10 pt & \\ -bibliography & 10 pt & \\ -footnotes & 9 pt & \\ -\hline -\end{tabular} -\end{center} -\caption{\label{font-table} Font guide. } -\end{table} - -\subsection{The First Page} -\label{ssec:first} - -Center the title, author's name(s) and affiliation(s) across both -columns. Do not use footnotes for affiliations. Do not include the -paper ID number assigned during the submission process. Use the -two-column format only when you begin the abstract. - -{\bf Title}: Place the title centered at the top of the first page, in -a 15-point bold font. 
(For a complete guide to font sizes and styles, -see Table~\ref{font-table}) Long titles should be typed on two lines -without a blank line intervening. Approximately, put the title at 2.5 -cm from the top of the page, followed by a blank line, then the -author's names(s), and the affiliation on the following line. Do not -use only initials for given names (middle initials are allowed). Do -not format surnames in all capitals (e.g., use ``Schlangen'' not -``SCHLANGEN''). Do not format title and section headings in all -capitals as well except for proper names (such as ``BLEU'') that are -conventionally in all capitals. The affiliation should contain the -author's complete address, and if possible, an electronic mail -address. Start the body of the first page 7.5 cm from the top of the -page. - -The title, author names and addresses should be completely identical -to those entered to the electronical paper submission website in order -to maintain the consistency of author information among all -publications of the conference. If they are different, the publication -chairs may resolve the difference without consulting with you; so it -is in your own interest to double-check that the information is -consistent. - -{\bf Abstract}: Type the abstract at the beginning of the first -column. The width of the abstract text should be smaller than the -width of the columns for the text in the body of the paper by about -0.6 cm on each side. Center the word {\bf Abstract} in a 12 point bold -font above the body of the abstract. The abstract should be a concise -summary of the general thesis and conclusions of the paper. It should -be no longer than 200 words. The abstract text should be in 10 point font. - -{\bf Text}: Begin typing the main body of the text immediately after -the abstract, observing the two-column format as shown in -the present document. Do not include page numbers. - -{\bf Indent} when starting a new paragraph. 
Use 11 points for text and -subsection headings, 12 points for section headings and 15 points for -the title. - -\subsection{Sections} - -{\bf Headings}: Type and label section and subsection headings in the -style shown on the present document. Use numbered sections (Arabic -numerals) in order to facilitate cross references. Number subsections -with the section number and the subsection number separated by a dot, -in Arabic numerals. Do not number subsubsections. - -{\bf Citations}: Citations within the text appear in parentheses -as~\cite{Gusfield:97} or, if the author's name appears in the text -itself, as Gusfield~\shortcite{Gusfield:97}. Append lowercase letters -to the year in cases of ambiguity. Treat double authors as -in~\cite{Aho:72}, but write as in~\cite{Chandra:81} when more than two -authors are involved. Collapse multiple citations as -in~\cite{Gusfield:97,Aho:72}. Also refrain from using full citations -as sentence constituents. We suggest that instead of -\begin{quote} - ``\cite{Gusfield:97} showed that ...'' -\end{quote} -you use -\begin{quote} -``Gusfield \shortcite{Gusfield:97} showed that ...'' -\end{quote} - -If you are using the provided \LaTeX{} and Bib\TeX{} style files, you -can use the command \verb|\newcite| to get ``author (year)'' citations. - -As reviewing will be double-blind, the submitted version of the papers -should not include the authors' names and affiliations. Furthermore, -self-references that reveal the author's identity, e.g., -\begin{quote} -``We previously showed \cite{Gusfield:97} ...'' -\end{quote} -should be avoided. Instead, use citations such as -\begin{quote} -``Gusfield \shortcite{Gusfield:97} -previously showed ... '' -\end{quote} - -\textbf{Please do not use anonymous citations} and do not include -acknowledgements when submitting your papers. Papers that do not -conform to these requirements may be rejected without review. 
- -\textbf{References}: Gather the full set of references together under -the heading {\bf References}; place the section before any Appendices, -unless they contain references. Arrange the references alphabetically -by first author, rather than by order of occurrence in the text. -Provide as complete a citation as possible, using a consistent format, -such as the one for {\em Computational Linguistics\/} or the one in the -{\em Publication Manual of the American -Psychological Association\/}~\cite{APA:83}. Use of full names for -authors rather than initials is preferred. A list of abbreviations -for common computer science journals can be found in the ACM -{\em Computing Reviews\/}~\cite{ACM:83}. - -The \LaTeX{} and Bib\TeX{} style files provided roughly fit the -American Psychological Association format, allowing regular citations, -short citations and multiple citations as described above. - -{\bf Appendices}: Appendices, if any, directly follow the text and the -references (but see above). Letter them in sequence and provide an -informative title: {\bf Appendix A. Title of Appendix}. - -\subsection{Footnotes} - -{\bf Footnotes}: Put footnotes at the bottom of the page and use 9 -points text. They may be numbered or referred to by asterisks or other -symbols.\footnote{This is how a footnote should appear.} Footnotes -should be separated from the text by a line.\footnote{Note the line -separating the footnotes from the text.} - -\subsection{Graphics} - -{\bf Illustrations}: Place figures, tables, and photographs in the -paper near where they are first discussed, rather than at the end, if -possible. Wide illustrations may run across both columns. Color -illustrations are discouraged, unless you have verified that -they will be understandable when printed in black ink. - -{\bf Captions}: Provide a caption for every illustration; number each one -sequentially in the form: ``Figure 1. Caption of the Figure.'' ``Table 1. 
-Caption of the Table.'' Type the captions of the figures and -tables below the body, using 11 point text. - - -\section{XML conversion and supported \LaTeX\ packages} - -ACL 2014 innovates over earlier years in that we will attempt to -automatically convert your \LaTeX\ source files to machine-readable -XML with semantic markup. This will facilitate future research that -uses the ACL proceedings themselves as a corpus. - -We encourage you to submit a ZIP file of your \LaTeX\ sources along -with the camera-ready version of your paper. We will then convert them -to XML automatically, using the LaTeXML tool -(\url{http://dlmf.nist.gov/LaTeXML}). LaTeXML has \emph{bindings} for -a number of \LaTeX\ packages, including the ACL 2014 stylefile. These -bindings allow LaTeXML to render the commands from these packages -correctly in XML. For best results, we encourage you to use the -packages that are officially supported by LaTeXML, listed at -\url{http://dlmf.nist.gov/LaTeXML/manual/included.bindings} - - - - - -\section{Translation of non-English Terms} - -It is also advised to supplement non-English characters and terms -with appropriate transliterations and/or translations -since not all readers understand all such characters and terms. -Inline transliteration or translation can be represented in -the order of: original-form transliteration ``translation''. - -\section{Length of Submission} -\label{sec:length} - -Long papers may consist of up to 8 pages of content, plus two extra -pages for references. Short papers may consist of up to 4 pages of -content, plus two extra pages for references. Papers that do not -conform to the specified length and formatting requirements may be -rejected without review. - - - -\section*{Acknowledgments} - -The acknowledgments should go immediately before the references. Do -not number the acknowledgments section. Do not include this section -when submitting your paper for review. 
- -% include your own bib file like this: -%\bibliographystyle{acl} -%\bibliography{acl2014} - -\begin{thebibliography}{} - -\bibitem[\protect\citename{Aho and Ullman}1972]{Aho:72} -Alfred~V. Aho and Jeffrey~D. Ullman. -\newblock 1972. -\newblock {\em The Theory of Parsing, Translation and Compiling}, volume~1. -\newblock Prentice-{Hall}, Englewood Cliffs, NJ. - -\bibitem[\protect\citename{{American Psychological Association}}1983]{APA:83} -{American Psychological Association}. -\newblock 1983. -\newblock {\em Publications Manual}. -\newblock American Psychological Association, Washington, DC. - -\bibitem[\protect\citename{{Association for Computing Machinery}}1983]{ACM:83} -{Association for Computing Machinery}. -\newblock 1983. -\newblock {\em Computing Reviews}, 24(11):503--512. - -\bibitem[\protect\citename{Chandra \bgroup et al.\egroup }1981]{Chandra:81} -Ashok~K. Chandra, Dexter~C. Kozen, and Larry~J. Stockmeyer. -\newblock 1981. -\newblock Alternation. -\newblock {\em Journal of the Association for Computing Machinery}, - 28(1):114--133. - -\bibitem[\protect\citename{Gusfield}1997]{Gusfield:97} -Dan Gusfield. -\newblock 1997. -\newblock {\em Algorithms on Strings, Trees and Sequences}. -\newblock Cambridge University Press, Cambridge, UK. - -\end{thebibliography} - -\end{document} diff --git a/EMNLP2014/bibliography.bib b/EMNLP2014/bibliography.bib new file mode 100644 index 0000000..a7269cf --- /dev/null +++ b/EMNLP2014/bibliography.bib @@ -0,0 +1,1650 @@ +%%%%%%%%%%%%%%%%%%%%%% +%General "classic" papers in Stats and NLP +%%%%%%%%%%%%%%%%%%%%%% +@book{Hellinger1909, + title={Neue Begr{\"u}ndung der Theorie quadratischer Formen von unendlichvielen Ver{\"a}nderlichen}, + author={Hellinger, E.}, + year={1909}, + publisher={Reimer} +} + +@article{Fisher1925, +author = {Fisher,R. 
A.}, +title = {{Theory of Statistical Estimation}}, +journal = {Mathematical Proceedings of the Cambridge Philosophical Society}, +volume = {22}, +issue = {05}, +issn = {1469-8064}, +pages = {700--725}, +numpages = {26}, +year = {1925} +} + +@article{Dice1945, + author = {Dice, L. R.}, + journal = {Ecology}, + number = {3}, + pages = {297--302}, + title = {{Measures of the Amount of Ecologic Association Between Species}}, + volume = {26}, + year = {1945} +} + +@article{Rao1945, +author = {Rao, C. Radhakrishna}, +title = {{Information and the Accuracy Attainable in the Estimation of Statistical Parameters}}, +journal = {Bulletin of the Calcutta Mathematical Society}, +volume = {37}, +year = {1945}, +number = {3}, +pages = {81--91}, +} + +@incollection{Zipf1949, + address = {Cambridge, MA}, + author = {Zipf, George}, + publisher = { Addison-Wesley}, + title = { Human Behaviour and the Principle of Least-Effort}, + year = { 1949} +} + +@ARTICLE{Dempster1977, + author = {A. P. Dempster and N. M. Laird and D. B. 
Rubin}, + title = {{Maximum likelihood from incomplete data via the EM algorithm}}, + journal = {Journal of the Royal Statistical Society, Series B}, + year = {1977}, + volume = {39}, + number = {1}, + pages = {1--38} +} + +@inproceedings{Baker1979, + author = "Baker, J.K.", + title = "Trainable grammars for speech recognition", + year = "1979", + booktitle = "Speech communication papers presented at the 97th Meeting of the Acoustical Society", + pages = "547--550", + keywords = "NLP", +} + +@book{Chentsov1982, + title={Statistical Decision Rules and Optimal Inference}, + author={Chentsov, N.N.}, + isbn={9780821813478}, + lccn={81015039}, + series={Translations of mathematical monographs}, + year={1982}, + publisher={American Mathematical Society} +} + +@inproceedings{Hwang1992, + author = {Hwang, Mei-Yuh and Huang, Xuedong}, + title = {Subphonetic modeling for speech recognition}, + booktitle = {Proceedings of the workshop on Speech and Natural Language}, + series = {HLT '91}, + year = {1992}, + isbn = {1-55860-272-0}, + location = {Harriman, New York}, + pages = {174--179}, + numpages = {6}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@article{Marcus1993, + author = {Marcus, Mitchell P. and Marcinkiewicz, Mary Ann and Santorini, Beatrice}, + title = {{Building a large annotated corpus of English: the Penn Treebank}}, + journal = {Computational Linguistics}, + issue_date = {June 1993}, + volume = {19}, + number = {2}, + month = jun, + year = {1993}, + issn = {0891-2017}, + pages = {313--330}, + numpages = {18}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@book{Golub1996, + author = {Golub, Gene H. 
and Van Loan, Charles F.}, + title = {Matrix Computations (3rd Ed.)}, + year = {1996}, + isbn = {0-8018-5414-8}, + publisher = {Johns Hopkins University Press}, + address = {Baltimore, MD, USA}, +} + + +@inproceedings{Chappelier1998, + author = {Chappelier, Jean-Cédric and Rajman, Martin}, + booktitle = {TAPD}, + pages = {133--137}, + title = {A Generalized CYK Algorithm for Parsing Stochastic CFG}, + year = 1998 +} + +@article{Chen1999, + author = {Stanley F. Chen and + Joshua Goodman}, + title = {An empirical study of smoothing techniques for language + modeling}, + journal = {Computer Speech {\&} Language}, + volume = {13}, + number = {4}, + year = {1999}, + pages = {359--393}, +} + +@inproceedings{Klein2001, + author = {Dan Klein and + Christopher D. Manning}, + title = {Parsing and Hypergraphs}, + booktitle = {Proceedings of the Seventh International Workshop on Parsing + Technologies (IWPT-2001), 17-19 October 2001, Beijing, China}, + year = {2001}, +} + +@article{Johnson2002, + author = {Johnson, Mark}, + title = {{The DOP estimation method is biased and inconsistent}}, + journal = {Computational Linguistics}, + issue_date = {March 2002}, + volume = {28}, + number = {1}, + month = mar, + year = {2002}, + issn = {0891-2017}, + pages = {71--76}, + numpages = {6}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@book{Amari2007, + title={Methods of Information Geometry}, + author={Amari, S. and Nagaoka, H. 
and Harada, D.}, + isbn={9780821843024}, + lccn={00059362}, + series={Translations of mathematical monographs}, + year={2007}, + publisher={American Mathematical Society} +} + +%%%%%%%%%%%%%%%%%% +%General MT papers +%%%%%%%%%%%%%%%%%% +@article{Brown1990, +author = {Brown, Peter F. and Cocke, John and Pietra, Stephen A. Della and Pietra, Vincent J. Della and Jelinek, Frederick and Lafferty, John D. and Mercer, Robert L. and Roossin, Paul S.}, +journal = {Computational Linguistics}, +keywords = {Statistical Machine Translation}, +mendeley-tags = {Statistical Machine Translation}, +number = {2}, +pages = {256--264}, +publisher = {MIT Press}, +title = {{A Statistical Approach to Machine Translation}}, +volume = {16}, +year = {1990} +} + +@article{Brown1993, + author = {Brown, Peter F. and Pietra, Vincent J. Della and Pietra, Stephen A. Della and Mercer, Robert L.}, + title = {The mathematics of statistical machine translation: parameter estimation}, + journal = {Computational Linguistics}, + issue_date = {June 1993}, + volume = {19}, + number = {2}, + month = jun, + year = {1993}, + pages = {263--311}, + numpages = {49}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@inproceedings{Vogel1996, + author = {Vogel, Stephan and Ney, Hermann and Tillmann, Christoph}, + title = {{HMM-based word alignment in statistical translation}}, + booktitle = {Proceedings of the 16th conference on Computational linguistics - Volume 2}, + series = {COLING '96}, + year = {1996}, + location = {Copenhagen, Denmark}, + pages = {836--841}, + numpages = {6}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@article{Wu1997, + author = {Wu, Dekai}, + title = {Stochastic inversion transduction grammars and bilingual parsing of parallel corpora}, + journal = {Computational Linguistics}, + issue_date = {September 1997}, + volume = {23}, + number = {3}, + month = sep, + year = {1997}, + issn = {0891-2017}, + pages = {377--403}, + 
numpages = {27}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@inproceedings{Marcu2002, + author = {Marcu, Daniel and Wong, William}, + title = {A phrase-based, joint probability model for statistical machine translation}, + booktitle = {Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10}, + series = {EMNLP '02}, + year = {2002}, + pages = {133--139}, + numpages = {7}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Papineni2002, + author = {Papineni, Kishore and Roukos, Salim and Ward, Todd and Zhu, Wei-Jing}, + title = {{BLEU: a method for automatic evaluation of machine translation}}, + booktitle = {Proceedings of the 40th Annual Meeting on Association for Computational Linguistics}, + series = {ACL '02}, + year = {2002}, + location = {Philadelphia, Pennsylvania}, + pages = {311--318}, + numpages = {8}, + acmid = {1073135}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Koehn2003, + author = {Koehn, Philipp and Och, Franz Josef and Marcu, Daniel}, + title = {Statistical phrase-based translation}, + booktitle = {Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1}, + series = {NAACL '03}, + year = {2003}, + location = {Edmonton, Canada}, + pages = {48--54}, + numpages = {7}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Och2003, +author = {Och, Franz Josef}, +booktitle = {Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics}, +keywords = {Statistical Machine Translation}, +mendeley-tags = {Statistical Machine Translation}, +month = {July}, +pages = {160--167}, +title = {{Minimum Error Rate Training in Statistical Machine Translation}}, +year = {2003} +} + 
+@article{Och2004, + author = {Och, Franz Josef and Ney, Hermann}, + title = {{The Alignment Template Approach to Statistical Machine Translation}}, + journal = {Computational Linguistics}, + issue_date = {December 2004}, + volume = {30}, + number = {4}, + month = dec, + year = {2004}, + pages = {417--449}, + numpages = {33}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@inproceedings{Shen2004, + author = {Shen, Libin and Sarkar, Anoop and Och, Franz Josef}, + title = {Discriminative Reranking for Machine Translation}, + booktitle = {HLT-NAACL 2004: Main Proceedings}, + editor = {Susan Dumais and Daniel Marcu and Salim Roukos}, + year = 2004, + month = {May 2 - May 7}, + address = {Boston, Massachusetts, USA}, + publisher = {Association for Computational Linguistics}, + pages = {177--184} +} + +@inproceedings{Galley2004, + author = {Galley, Michel and Hopkins, Mark and Knight, Kevin and Marcu, Daniel}, + title = {What's in a translation rule?}, + booktitle = {HLT-NAACL 2004: Main Proceedings}, + editor = {Susan Dumais and Daniel Marcu and Salim Roukos}, + year = 2004, + month = {May 2 - May 7}, + address = {Boston, Massachusetts, USA}, + publisher = {Association for Computational Linguistics}, + pages = {273--280}, +} + +@inproceedings{Chiang2005, + author = {Chiang, David}, + title = {A hierarchical phrase-based model for statistical machine translation}, + booktitle = {Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics}, + series = {ACL '05}, + year = {2005}, + location = {Ann Arbor, Michigan}, + pages = {263--270}, + numpages = {8}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Matsuzaki2005, + author = {Matsuzaki, Takuya and Miyao, Yusuke and Tsujii, Jun'ichi}, + title = {Probabilistic CFG with Latent Annotations}, + booktitle = {Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics}, + series = {ACL '05}, + year 
= {2005}, + location = {Ann Arbor, Michigan}, + pages = {75--82}, + numpages = {8}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Petrov2006, + author = {Petrov, Slav and Barrett, Leon and Thibaux, Romain and Klein, Dan}, + title = {Learning accurate, compact, and interpretable tree annotation}, + booktitle = {Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics}, + series = {ACL-44}, + year = {2006}, + location = {Sydney, Australia}, + pages = {433--440}, + numpages = {8}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Galley2006, + author = {Galley, Michel and Graehl, Jonathan and Knight, Kevin and Marcu, Daniel and DeNeefe, Steve and Wang, Wei and Thayer, Ignacio}, + title = {Scalable inference and training of context-rich syntactic translation models}, + booktitle = {Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics}, + series = {ACL-44}, + year = {2006}, + location = {Sydney, Australia}, + pages = {961--968}, + numpages = {8}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Zollmann2006, + author = {Zollmann, Andreas and Venugopal, Ashish}, + title = {Syntax augmented machine translation via chart parsing}, + booktitle = {Proceedings of the Workshop on Statistical Machine Translation}, + series = {StatMT '06}, + year = {2006}, + location = {New York City, New York}, + pages = {138--141}, + numpages = {4}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Huang2006, + author = {Liang Huang and Kevin Knight and Aravind Joshi}, + title = {Statistical Syntax-Directed Translation with Extended 
Domain of Locality}, + booktitle = {Proceedings of AMTA}, + month = {August}, + year = {2006} +} + +@InProceedings{McClosky2006, + author = {McClosky, David and Charniak, Eugene and Johnson, Mark}, + title = {Effective Self-Training for Parsing}, + booktitle = {Proceedings of the Human Language Technology Conference of the NAACL, Main Conference}, + month = {June}, + year = {2006}, + address = {New York City, USA}, + publisher = {Association for Computational Linguistics}, + pages = {152--159}, +} + +@article{Kumar2006, + author = {Shankar Kumar and + Yonggang Deng and + William Byrne}, + title = {A weighted finite state transducer translation template + model for statistical machine translation}, + journal = {Natural Language Engineering}, + volume = {12}, + number = {1}, + year = {2006}, + pages = {35-75}, +} + +@article{Chiang2007, + author = {Chiang, David}, + title = {Hierarchical Phrase-Based Translation}, + journal = {Computational Linguistics}, + issue_date = {June 2007}, + volume = {33}, + number = {2}, + month = jun, + year = {2007}, + pages = {201--228}, + numpages = {28}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@inproceedings{Zettlemoyer2007, + author = {Zettlemoyer, Luke S. and Moore, Robert C.}, + title = {Selective phrase pair extraction for improved statistical machine translation}, + booktitle = {Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers}, + series = {NAACL-Short '07}, + year = {2007}, + location = {Rochester, New York}, + pages = {209--212}, + numpages = {4}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@article{Graehl2008, + author = {Graehl, Jonathan and Knight, Kevin and May, Jonathan}, + title = {Training Tree Transducers}, + journal = {Comput. 
Linguist.}, + issue_date = {September 2008}, + volume = {34}, + number = {3}, + month = sep, + year = {2008}, + issn = {0891-2017}, + pages = {391--427}, + numpages = {37}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@INPROCEEDINGS{Paul2009, + author = {Michael Paul}, + title = {Overview of the IWSLT 2009 Evaluation Campaign}, + booktitle = {Proceedings of IWSLT 2009}, + location = {Tokyo, Japan}, + year = {2009}, +} + +@inproceedings{Kumar2009, + author = {Kumar, Shankar and Macherey, Wolfgang and Dyer, Chris and Och, Franz}, + title = {Efficient Minimum Error Rate Training and Minimum Bayes-Risk Decoding for Translation Hypergraphs and Lattices}, + booktitle = {Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1}, + series = {ACL '09}, + year = {2009}, + location = {Suntec, Singapore}, + pages = {163--171}, + numpages = {9}, +} + +@inproceedings{Dyer2010, + author={Chris Dyer and Adam Lopez + and Juri Ganitkevitch and Jonathan Weese and Ferhan Ture + and Phil Blunsom and Hendra Setiawan and Vladimir Eidelman and Philip Resnik}, + title={cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models}, + booktitle = {Proceedings of ACL}, + year={2010}, +} + +@InProceedings{Bojar2013, + author = {Bojar, Ond\v{r}ej and Buck, Christian and Callison-Burch, Chris and Federmann, Christian and Haddow, Barry and Koehn, Philipp and Monz, Christof and Post, Matt and Soricut, Radu and Specia, Lucia}, + title = {Findings of the 2013 {Workshop on Statistical Machine Translation}}, + booktitle = {Proceedings of the Eighth Workshop on Statistical Machine Translation}, + month = {August}, + year = {2013}, + address = {Sofia, Bulgaria}, + publisher = {Association for Computational Linguistics}, + pages = {1--44}, +} + +%%%%%%%%%%%%%%%%%%%%%% +%"Monolingual" MT papers +%%%%%%%%%%%%%%%%%%%%% + 
+@InProceedings{CallisonBurch2006, + author = {Callison-Burch, Chris and Koehn, Philipp and Osborne, Miles}, + title = {Improved Statistical Machine Translation Using Paraphrases}, + booktitle = {Proceedings of the Human Language Technology Conference of the NAACL, Main Conference}, + month = {June}, + year = {2006}, + address = {New York City, USA}, + publisher = {Association for Computational Linguistics}, + pages = {17--24}, +} + +@InProceedings{Haghighi2008, + author = {Haghighi, Aria and Liang, Percy and Berg-Kirkpatrick, Taylor and Klein, Dan}, + title = {Learning Bilingual Lexicons from Monolingual Corpora}, + booktitle = {Proceedings of ACL-08: HLT}, + month = {June}, + year = {2008}, + address = {Columbus, Ohio}, + publisher = {Association for Computational Linguistics}, + pages = {771--779}, +} + +@InProceedings{Ravi2011, + author = {Ravi, Sujith and Knight, Kevin}, + title = {Deciphering Foreign Language}, + booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies}, + month = {June}, + year = {2011}, + address = {Portland, Oregon, USA}, + publisher = {Association for Computational Linguistics}, + pages = {12--21}, +} + +%%%%%%%%%%%%%%%%%%%%%% +%non heuristic phrase extraction papers +%%%%%%%%%%%%%%%%%%%%%% + +@inproceedings{DeNero2006, + author = {DeNero, John and Gillick, Dan and Zhang, James and Klein, Dan}, + title = {Why generative phrase models underperform surface heuristics}, + booktitle = {Proceedings of the Workshop on Statistical Machine Translation}, + series = {StatMT '06}, + year = {2006}, + location = {New York City, New York}, + pages = {31--38}, + numpages = {8}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{DeNero2008, + author = {DeNero, John and Bouchard-C\^{o}t{\'e}, Alexandre and Klein, Dan}, + title = {{Sampling alignment structure under a Bayesian translation model}}, + booktitle = 
{Proceedings of the Conference on Empirical Methods in Natural Language Processing}, + series = {EMNLP '08}, + year = {2008}, + location = {Honolulu, Hawaii}, + pages = {314--323}, + numpages = {10}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Blunsom2008, + author = {Blunsom, Phil and Cohn, Trevor and Osborne, Miles}, + title = {{Bayesian Synchronous Grammar Induction}}, + booktitle = {Proceedings of the Twenty-Second Annual Conference on Neural Information Processing Systems}, + series = {NIPS 2008}, + year = {2008}, + location = {Vancouver, British Columbia}, +} + +@inproceedings{Zhang2008, + author = {Zhang, Hao and Gildea, Daniel and Chiang, David}, + title = {Extracting Synchronous Grammar Rules from Word-level Alignments in Linear Time}, + booktitle = {Proceedings of the 22Nd International Conference on Computational Linguistics - Volume 1}, + series = {COLING '08}, + year = {2008}, + location = {Manchester, United Kingdom}, + pages = {1081--1088}, + numpages = {8}, + publisher = {Association for Computational Linguistics}, +} + +@inproceedings{Blunsom2009, + author = {Blunsom, Phil and Cohn, Trevor and Dyer, Chris and Osborne, Miles}, + title = {A Gibbs sampler for phrasal synchronous grammar induction}, + booktitle = {Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2}, + series = {ACL '09}, + year = {2009}, + location = {Suntec, Singapore}, + pages = {782--790}, + numpages = {9}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Cohn2009, + author = {Cohn, Trevor and Blunsom, Phil}, + title = {{A Bayesian model of syntax-directed tree to string grammar induction}}, + booktitle = {Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1}, + 
series = {EMNLP '09}, + year = {2009}, + isbn = {978-1-932432-59-6}, + location = {Singapore}, + pages = {352--361}, + numpages = {10}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Levenberg2012, + author = {Levenberg, Abby and Dyer, Chris and Blunsom, Phil}, + title = {{A Bayesian model for learning SCFGs with discontiguous rules}}, + booktitle = {Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning}, + series = {EMNLP-CoNLL '12}, + year = {2012}, + location = {Jeju Island, Korea}, + pages = {223--232}, + numpages = {10}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@InProceedings{Mylonakis2008, +author = {Markos Mylonakis and Khalil Sima'an}, +title = {{Phrase Translation Probabilities with {ITG} Priors and Smoothing as Learning Objective}}, +booktitle = {Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing}, +pages = {630--639}, +month = {October}, +year = {2008}, +address = {Honolulu, USA}, +publisher = {Association for Computational Linguistics} +} + +@InProceedings{Mylonakis2010, +author = {Markos Mylonakis and Khalil Sima'an}, +title = {{Learning Probabilistic Synchronous {CFGs} for Phrase-based Translation}}, +booktitle = {Fourteenth Conference on Computational Natural Language Learning}, +pages = {117--125}, +month = {July}, +year = {2010}, +address = {Uppsala, Sweden}, +publisher = {Association for Computational Linguistics} +} + +@InProceedings{Mylonakis2011, + author = {Mylonakis, Markos and Sima'an, Khalil}, + title = {{Learning Hierarchical Translation Structure with Linguistic Annotations}}, + booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies}, + month = {June}, + year = {2011}, + address = {Portland, Oregon, USA}, + publisher = 
{Association for Computational Linguistics}, + pages = {642--652}, +} + +@inproceedings{Huang2010, + author = {Huang, Zhongqiang and \v{C}mejrek, Martin and Zhou, Bowen}, + title = {{Soft syntactic constraints for hierarchical phrase-based translation using latent syntactic distributions}}, + booktitle = {Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing}, + series = {EMNLP '10}, + year = {2010}, + location = {Cambridge, Massachusetts}, + pages = {138--147}, + numpages = {10}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +%%%%%%%%%%%%%%%%%%% +%Morpho-MT +%%%%%%%%%%%%%%%%%%% +@InProceedings{Toutanova2008, + author = {Toutanova, Kristina and Suzuki, Hisami and Ruopp, Achim}, + title = {Applying Morphology Generation Models to Machine Translation}, + booktitle = {Proceedings of ACL-08: HLT}, + month = {June}, + year = {2008}, + address = {Columbus, Ohio}, + publisher = {Association for Computational Linguistics}, + pages = {514--522}, +} + +@inproceedings{Chahuneau2013, + author = {Victor Chahuneau and Eva Schlinger and Noah A. Smith and Chris Dyer}, + title = {Translating into Morphologically Rich Languages with Synthetic Phrases}, + booktitle = {Proc. 
of EMNLP}, + year = {2013} +} + +@InProceedings{Tsvetkov2013, + author = {Tsvetkov, Yulia and Dyer, Chris and Levin, Lori and Bhatia, Archna}, + title = {Generating {English} Determiners in Phrase-Based Translation with Synthetic Translation Options}, + booktitle = {Proceedings of the Eighth Workshop on Statistical Machine Translation}, + month = {August}, + year = {2013}, + address = {Sofia, Bulgaria}, + publisher = {Association for Computational Linguistics}, + pages = {271--280}, +} + +%%%%%%%%%%%%%%%%%%% +%Discriminative Training in MT +%%%%%%%%%%%%%%%%%%% + +@inproceedings{Liang2006, + author = {Liang, Percy and Bouchard-C\^{o}t{\'e}, Alexandre and Klein, Dan and Taskar, Ben}, + title = {An end-to-end discriminative approach to machine translation}, + booktitle = {Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics}, + series = {ACL-44}, + year = {2006}, + location = {Sydney, Australia}, + pages = {761--768}, + numpages = {8}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@INPROCEEDINGS{Watanabe2007, + author = {Taro Watanabe and Jun Suzuki and Hajime Tsukada and Hideki Isozaki}, + title = {Online large-margin training for statistical machine translation}, + booktitle = {In Proc. 
of EMNLP}, + year = {2007} +} + +@inproceedings{Chiang2009, + author = {Chiang, David and Knight, Kevin and Wang, Wei}, + title = {11,001 new features for statistical machine translation}, + booktitle = {Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics}, + series = {NAACL '09}, + year = {2009}, + isbn = {978-1-932432-41-1}, + location = {Boulder, Colorado}, + pages = {218--226}, + numpages = {9}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@article{Chiang2012, + author = {Chiang, David}, + title = {{Hope and Fear for Discriminative Training of Statistical Translation Models}}, + journal = {J. Mach. Learn. Res.}, + year = {2012}, + issn = {1532-4435}, + pages = {1159--1187}, + numpages = {29}, + publisher = {JMLR.org}, +} + +@InProceedings{Saluja2012, + author = {Saluja, Avneesh and Lane, Ian and Zhang, Ying}, + title = {Machine Translation with Binary Feedback: a Large-Margin Approach}, + booktitle = {The Tenth Biennial Conference of the Association for Machine Translation in the Americas}, + month = {October}, + year = {2012}, + address = {San Diego, California}, +} + +@InProceedings{Flanigan2013, + author = {Flanigan, Jeffrey and Dyer, Chris and Carbonell, Jaime}, + title = {Large-Scale Discriminative Training for Statistical Machine Translation Using Held-Out Line Search}, + booktitle = {Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies}, + month = {June}, + year = {2013}, + address = {Atlanta, Georgia}, + publisher = {Association for Computational Linguistics}, + pages = {248--258}, +} + + +%%%%%%%%%%%%%%%%%%%% +%Mining for parallel/comparable corpora +%%%%%%%%%%%%%%%%%%% + +@article{Resnik2003, + author = {Resnik, Philip and Smith, Noah A.}, + title = {The Web as a parallel corpus}, + journal = {Computational 
Linguistics}, + issue_date = {September 2003}, + volume = {29}, + number = {3}, + month = sep, + year = {2003}, + issn = {0891-2017}, + pages = {349--380}, + numpages = {32}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@inproceedings{Zhang2005, + author = {Ying Zhang and Fei Huang and Stephan Vogel}, + title = {Mining translations of OOV terms from the web through cross-lingual query expansion}, + booktitle = {SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval}, + year = {2005}, + pages = {669--670}, + location = {Salvador, Brazil}, + publisher = {ACM Press}, + address = {New York, NY, USA}, +} + +@inproceedings{Snover2008, + author = {Snover, Matthew and Dorr, Bonnie and Schwartz, Richard}, + title = {Language and translation model adaptation using comparable corpora}, + booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing}, + series = {EMNLP '08}, + year = {2008}, + location = {Honolulu, Hawaii}, + pages = {857--866}, + numpages = {10}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +%%%%%%%%%%%%%%%%%%% +%Bilingual Lexicon Induction +%%%%%%%%%%%%%%%%%%% + +@inproceedings{Rapp1995, + author = {Rapp, Reinhard}, + title = {Identifying Word Translations in Non-Parallel Texts}, + booktitle = {Proceedings of the 33rd Annual Meeting of the Association for Computational + Linguistics}, + series = {ACL '95}, + location = {Cambridge, MA}, + year = {1995}, +} + +@inproceedings{Fung1998, + author = {Fung, Pascale and Yee, Lo Yuen}, + title = {An IR approach for translating new words from nonparallel, comparable texts}, + booktitle = {Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics - Volume 1}, + series = {ACL '98}, + year = {1998}, + location = {Montreal, Quebec, Canada}, + pages 
= {414--420}, + numpages = {7}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Rapp1999, + author = {Rapp, Reinhard}, + title = {Automatic identification of word translations from unrelated English and German corpora}, + booktitle = {Proceedings of the 37th annual meeting of the Association for Computational Linguistics}, + series = {ACL '99}, + year = {1999}, + location = {College Park, Maryland}, + pages = {519--526}, + numpages = {8}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@INPROCEEDINGS{Koehn2002, + author = {Philipp Koehn and Kevin Knight}, + title = {Learning a Translation Lexicon from Monolingual Corpora}, + booktitle = {Proceedings of the ACL Workshop on Unsupervised Lexical Acquisition}, + year = {2002}, + pages = {9--16} +} + +@inproceedings{Tamura2012, + author = {Akihiro Tamura and + Taro Watanabe and + Eiichiro Sumita}, + title = {Bilingual Lexicon Extraction from Comparable Corpora Using + Label Propagation}, + booktitle = {Proceedings of the 2012 Joint Conference on Empirical Methods + in Natural Language Processing and Computational Natural + Language Learning}, + series = {EMNLP-CoNLL '12}, + year = {2012}, + pages = {24--36}, +} + +@InProceedings{Irvine2013a, + author = {Irvine, Ann and Callison-Burch, Chris}, + title = {Supervised Bilingual Lexicon Induction with Multiple Monolingual Signals}, + booktitle = {Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies}, + month = {June}, + year = {2013}, + address = {Atlanta, Georgia}, + publisher = {Association for Computational Linguistics}, + pages = {518--523}, +} + +@inProceedings{Irvine2013b, +author = {Irvine, Ann and Callison-Burch, Chris}, +title = {Combining Bilingual and Comparable Corpora for Low Resource Machine Translation}, +booktitle = {Proceedings of the ACL Workshop on 
Statistical Machine Translation (WMT)}, +year = {2013}, +} + +%%%%%%%%%%%%%%%%%%%% +%General Machine Learning +%%%%%%%%%%%%%%%%%%%% +@article{Camastra2003, + author = {Francesco Camastra}, + title = {Data dimensionality estimation methods: a survey}, + journal = {Pattern Recognition}, + volume = {36}, + number = {12}, + year = {2003}, + pages = {2945--2954}, +} + +@article{Hardoon2004, + author = {Hardoon, David R. and Szedmak, Sandor R. and Shawe-Taylor, John R.}, + title = {Canonical Correlation Analysis: An Overview with Application to Learning Methods}, + journal = {Neural Comput.}, + issue_date = {December 2004}, + volume = {16}, + number = {12}, + month = dec, + year = {2004}, + issn = {0899-7667}, + pages = {2639--2664}, + numpages = {26}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@phdthesis{Lebanon2005, + author = {Lebanon, Guy}, + title = {Riemannian geometry and statistical machine learning}, + year = {2005}, + isbn = {0-496-93472-4}, + note = {AAI3159986}, + publisher = {Carnegie Mellon University}, + school = {Carnegie Mellon University}, + address = {Pittsburgh, PA, USA}, +} + +@book{Bishop2006, + author = {Bishop, Christopher M.}, + title = {{Pattern Recognition and Machine Learning (Information Science and Statistics)}}, + year = {2006}, + isbn = {0387310738}, + publisher = {Springer-Verlag New York, Inc.}, + address = {Secaucus, NJ, USA}, +} + +@inproceedings{Andrew2007, + author = {Andrew, Galen and Gao, Jianfeng}, + title = {Scalable training of L1-regularized log-linear models}, + booktitle = {Proceedings of the 24th international conference on Machine learning}, + series = {ICML '07}, + year = {2007}, + isbn = {978-1-59593-793-3}, + location = {Corvallis, Oregon}, + pages = {33--40}, + numpages = {8}, + publisher = {ACM}, + address = {New York, NY, USA}, +} + +@inproceedings{Duchi2008, + author = {Duchi, John and Shalev-Shwartz, Shai and Singer, Yoram and Chandra, Tushar}, + title = {Efficient projections onto the l1-ball 
for learning in high dimensions}, + booktitle = {Proceedings of the 25th international conference on Machine learning}, + series = {ICML '08}, + year = {2008}, + isbn = {978-1-60558-205-4}, + location = {Helsinki, Finland}, + pages = {272--279}, + numpages = {8}, + publisher = {ACM}, + address = {New York, NY, USA}, +} + +@article{Ganchev2010, + author = {Ganchev, Kuzman and Gra\c{c}a, Jo\~{a}o and Gillenwater, Jennifer and Taskar, Ben}, + title = {{Posterior Regularization for Structured Latent Variable Models}}, + journal = {J. Mach. Learn. Res.}, + volume = {99}, + month = {August}, + year = {2010}, + issn = {1532-4435}, + pages = {2001--2049}, + numpages = {49}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@article{Dekel2010, + author = {Ofer Dekel and + Ohad Shamir}, + title = {Multiclass-Multilabel Classification with More Classes than + Examples}, + journal = {Journal of Machine Learning Research - Proceedings Track}, + volume = {9}, + year = {2010}, + pages = {137-144}, +} + +@inproceedings{Berg-Kirkpatrick2010, + author = {Berg-Kirkpatrick, Taylor and Bouchard-C\^{o}t{\'e}, Alexandre and DeNero, John and Klein, Dan}, + title = {Painless Unsupervised Learning with Features}, + booktitle = {Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics}, + series = {HLT '10}, + year = {2010}, + isbn = {1-932432-65-5}, + location = {Los Angeles, California}, + pages = {582--590}, + numpages = {9}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +%%%%%%%%%%%%%%%%%%%%%%% +%Sparsity papers +%%%%%%%%%%%%%%%%%%%%%%% +@article{Natarajan1995, + author = {Natarajan, B. K.}, + title = {Sparse Approximate Solutions to Linear Systems}, + journal = {SIAM J. 
Comput.}, + issue_date = {April 1995}, + volume = {24}, + number = {2}, + month = apr, + year = {1995}, + issn = {0097-5397}, + pages = {227--234}, + numpages = {8}, + publisher = {Society for Industrial and Applied Mathematics}, + address = {Philadelphia, PA, USA}, + keywords = {linear systems, sparse solutions}, +} + +@ARTICLE{Tibshirani1996, + author = {Robert Tibshirani}, + title = {Regression Shrinkage and Selection Via the Lasso}, + journal = {Journal of the Royal Statistical Society, Series B}, + year = {1996}, + volume = {58}, + pages = {267--288} +} + +@article{Chen2001, + author = {Chen, Scott Shaobing and Donoho, David L. and Saunders, Michael A.}, + title = {Atomic Decomposition by Basis Pursuit}, + journal = {SIAM Rev.}, + issue_date = {2001}, + volume = {43}, + number = {1}, + month = jan, + year = {2001}, + issn = {0036-1445}, + pages = {129--159}, + numpages = {31}, + publisher = {Society for Industrial and Applied Mathematics}, + address = {Philadelphia, PA, USA}, + keywords = {\$\ell^1\$ norm optimization, MATLAB code, cosine packets, denoising, interior-point methods for linear programming, matching pursuit, multiscale edges, overcomplete signal representation, time-frequency analysis, time-scale analysis, total variation denoising, wavelet packets, wavelets}, +} + +@article{Candes2005, + author = {Candes, E. J. and Tao, T.}, + title = {Decoding by linear programming}, + journal = {IEEE Trans. Inf. 
Theor.}, + issue_date = {December 2005}, + volume = {51}, + number = {12}, + month = dec, + year = {2005}, + issn = {0018-9448}, + pages = {4203--4215}, + numpages = {13}, + acmid = {2271950}, + publisher = {IEEE Press}, + address = {Piscataway, NJ, USA}, +} + +@inproceedings{Garg2009, + author = {Garg, Rahul and Khandekar, Rohit}, + title = {Gradient descent with sparsification: an iterative algorithm for sparse recovery with restricted isometry property}, + booktitle = {Proceedings of the 26th Annual International Conference on Machine Learning}, + series = {ICML '09}, + year = {2009}, + isbn = {978-1-60558-516-1}, + location = {Montreal, Quebec, Canada}, + pages = {337--344}, + numpages = {8}, + publisher = {ACM}, + address = {New York, NY, USA}, +} + +@inproceedings{Pilanci2012, + Author = {Mert Pilanci and Laurent {El Ghaoui} and Venkat Chandrasekaran}, + Title = {Recovery of Sparse Probability Measures via Convex Programming}, + Booktitle= {Proc. Advances in Neural Information Processing Systems ({NIPS})}, + Year = {2012}, + Month = Dec +} + +@inproceedings{Kyrillidis2013, + Publisher = {JMLR Workshop and Conference Proceedings}, + Title = {Sparse projections onto the simplex}, + Booktitle = {Proceedings of the 30th International Conference on Machine Learning (ICML-13)}, + Author = {Anastasios Kyrillidis and Stephen Becker and Volkan Cevher and Christoph Koch}, + Month = may, + Volume = {28}, + Editor = {Sanjoy Dasgupta and David Mcallester}, + Year = {2013}, + Pages = {235-243}, + } + +%%%%%%%%%%%%%%%%%%%%%% +%Spectral Learning papers +%%%%%%%%%%%%%%%%%%%%% +@article{Jaeger2000, + author = {Jaeger, Herbert}, + title = {{Observable Operator Models for Discrete Stochastic Time Series}}, + journal = {Neural Comput.}, + issue_date = {June 2000}, + volume = {12}, + number = {6}, + month = jun, + year = {2000}, + issn = {0899-7667}, + pages = {1371--1398}, + publisher = {MIT Press}, + address = {Cambridge, MA, USA}, +} + +@inproceedings{Hsu2009, + author = 
{Daniel Hsu and + Sham M. Kakade and + Tong Zhang}, + title = {{A Spectral Algorithm for Learning Hidden Markov Models}}, + booktitle = {COLT}, + year = {2009}, +} + +@inproceedings{Boots2011, + Author = "Byron Boots and Sajid Siddiqi and Geoffrey Gordon ", + Booktitle = "Proceedings of the 25th National Conference on Artificial Intelligence (AAAI-2011)", + Title = "An Online Spectral Learning Algorithm for Partially Observable Nonlinear Dynamical Systems ", + Year = "2011" +} + +@inproceedings{Balle2011, + author = {Balle, Borja and Quattoni, Ariadna and Carreras, Xavier}, + title = {A spectral learning algorithm for finite state transducers}, + booktitle = {Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part I}, + series = {ECML PKDD'11}, + year = {2011}, + location = {Athens, Greece}, + pages = {156--171}, + numpages = {16}, + publisher = {Springer-Verlag}, + address = {Berlin, Heidelberg}, +} + +@inproceedings{Parikh2011, + author = {Ankur P. Parikh and + Le Song and + Eric P. Xing}, + title = {{A Spectral Algorithm for Latent Tree Graphical Models}}, + booktitle = {Proceedings of the 28th International Conference on Machine + Learning (ICML)}, + year = {2011}, + pages = {1065-1072}, +} + +@inproceedings{Anandkumar2011, + title ={{Spectral Methods for Learning Multivariate Latent Tree Structure}}, + author={Animashree Anandkumar and Kamalika Chaudhuri and Daniel J. Hsu and Sham M. Kakade and Le Song and Tong Zhang}, + booktitle = {Advances in Neural Information Processing Systems 24}, + editor = {J. Shawe-Taylor and R.S. Zemel and P. Bartlett and F.C.N. Pereira and K.Q. Weinberger}, + pages = {2025--2033}, + year = {2011} +} + +@inproceedings{Dhillon2011, + title = {{Multi-View Learning of Word Embeddings via CCA}}, + author = {Paramveer S. 
Dhillon and Dean Foster and Lyle Ungar}, + booktitle = {Advances in Neural Information Processing Systems (NIPS)}, + volume={24}, + year = {2011} +} + +@inproceedings{Dhillon2012, + author = {Paramveer S. Dhillon and Jordan Rodu and Michael Collins and Dean P. Foster and Lyle H. Ungar}, + title = {{Spectral Dependency Parsing with Latent Variables}}, + booktitle = {Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning}, + series = {EMNLP-CoNLL'12}, + year = {2012}, + location = {Jeju, Korea} + } + +@inproceedings{Anandkumar2012, +title ={{A Spectral Algorithm for Latent Dirichlet Allocation}}, +author={Animashree Anandkumar and Dean Foster and Daniel Hsu and Sham Kakade and Yi-Kai Liu}, +booktitle = {Advances in Neural Information Processing Systems 25}, +editor = {P. Bartlett and F.C.N. Pereira and C.J.C. Burges and L. Bottou and K.Q. Weinberger}, +pages = {926--934}, +year = {2012}, +} + +@inproceedings{Cohen2012a, + author = "S. B. Cohen and K. Stratos and M. Collins and D. P. Foster and L. Ungar", + title = "Spectral Learning of Latent-Variable {PCFGs}", + booktitle = "Proceedings of ACL", + year = "2012" +} + +@inproceedings{Cohen2012b, + author = "S. B. Cohen and M. 
Collins", + title = "Tensor Decomposition for Fast Latent-Variable {PCFG} Parsing", + booktitle = "Proceedings of NIPS", + year = "2012" +} + +@InProceedings{Balle2012, + author = {Borja Balle and Ariadna Quattoni and Xavier Carreras}, + title = {{Local Loss Optimization in Operator Models: A New Insight into Spectral Learning}}, + booktitle = {Proceedings of the 29th International Conference on Machine Learning (ICML-12)}, + series = {ICML '12}, + year = {2012}, + editor = {John Langford and Joelle Pineau}, + location = {Edinburgh, Scotland, GB}, + month = {July}, + publisher = {Omnipress}, + address = {New York, NY, USA}, + pages= {1879--1886}, +} + +@incollection{Hsu2012, +title ={{Identifiability and Unmixing of Latent Parse Trees}}, +author={Daniel Hsu and Sham Kakade and Percy Liang}, +booktitle = {Advances in Neural Information Processing Systems 25}, +editor = {P. Bartlett and F.C.N. Pereira and C.J.C. Burges and L. Bottou and K.Q. Weinberger}, +pages = {1520--1528}, +year = {2012}, +} + +@inproceedings{Cohen2013, + author = "S. B. Cohen and K. Stratos and M. Collins and D. P. Foster and L. Ungar", + title = "Experiments with Spectral Learning of Latent-Variable {PCFGs}", + booktitle = "Proceedings of {NAACL}", + year = "2013" +} + +%%%%%%%%%%%%%%%%%%%% +%Graph-based SSL +%%%%%%%%%%%%%%%%%%%% +@INPROCEEDINGS{Szummer2001, + author = {Martin Szummer and Tommi Jaakkola}, + title = {Partially labeled classification with Markov random walks}, + booktitle = {Advances in Neural Information Processing Systems}, + year = {2001}, + pages = {945--952}, + publisher = {MIT Press} +} + +@TECHREPORT{Zhu2002, + author = {Xiaojin Zhu and Zoubin Ghahramani}, + title = {Learning from Labeled and Unlabeled Data with Label Propagation}, + institution = {Carnegie Mellon University}, + year = {2002} +} + +@inproceedings{Zhu2003, + author = {Xiaojin Zhu and + Zoubin Ghahramani and + John D. 
Lafferty}, + title = {Semi-Supervised Learning Using Gaussian Fields and Harmonic + Functions}, + booktitle = {Proceedings of the Twentieth International Conference on Machine Learning}, + series = {ICML '03}, + year = {2003}, + pages = {912--919}, +} + +@incollection{Zhou2004, + author = "Dengyong Zhou and Olivier Bousquet and Thomas Navin Lal and Jason Weston and Bernhard {Sch\"{o}lkopf}", + title = "Learning with Local and Global Consistency", + booktitle = "Advances in Neural Information Processing Systems 16", + editor = "Sebastian Thrun and Lawrence Saul and Bernhard {Sch\"{o}lkopf}", + publisher = "MIT Press", + address = "Cambridge, MA", + year = "2004", +} + +@phdthesis{Zhu2005, + author = {Zhu, Xiaojin}, + title = {Semi-supervised learning with graphs}, + year = {2005}, + isbn = {0-542-19059-1}, + note = {AAI3179046}, + publisher = {Carnegie Mellon University}, + school = {Carnegie Mellon University}, + address = {Pittsburgh, PA, USA}, +} + +@article{Belkin2006, +author = {Mikhail Belkin and Partha Niyogi and Vikas Sindhwani}, +title = {Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples}, +journal = {Journal of Machine Learning Research}, +volume = {7}, +year = {2006}, +pages = {2399--2434}, +} + +@INCOLLECTION{Bengio2006, + author = {Bengio, Yoshua and Delalleau, Olivier and Le Roux, Nicolas}, + editor = {Chapelle, Olivier and {Sch{\"{o}}lkopf}, Bernhard and Zien, Alexander}, + title = {Label Propagation and Quadratic Criterion}, + booktitle = {Semi-Supervised Learning}, + year = {2006}, + pages = {193--216}, + publisher = {{MIT} Press}, +} + +@article{Yan2007, + author = {Yan, Shuicheng and Xu, Dong and Zhang, Benyu and Zhang, Hong-Jiang and Yang, Qiang and Lin, Stephen}, + title = {Graph Embedding and Extensions: A General Framework for Dimensionality Reduction}, + journal = {IEEE Trans. Pattern Anal. Mach. 
Intell.}, + issue_date = {January 2007}, + volume = {29}, + number = {1}, + month = jan, + year = {2007}, + issn = {0162-8828}, + pages = {40--51}, + numpages = {12}, + publisher = {IEEE Computer Society}, + address = {Washington, DC, USA}, +} + + +@inproceedings{Talukdar2009, + author = {Talukdar, Partha Pratim and Crammer, Koby}, + title = {New Regularized Algorithms for Transductive Learning}, + booktitle = {Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part II}, + series = {ECML PKDD '09}, + year = {2009}, + isbn = {978-3-642-04173-0}, + location = {Bled, Slovenia}, + pages = {442--457}, + numpages = {16}, +} + +@incollection{Subramanya2009, + title = {Entropic Graph Regularization in Non-Parametric Semi-Supervised Classification}, + author = {Amarnag Subramanya and Jeff Bilmes}, + booktitle = {Advances in Neural Information Processing Systems 22}, + editor = {Y. Bengio and D. Schuurmans and J. Lafferty and C. K. I. Williams and A. Culotta}, + pages = {1803--1811}, + year = {2009} +} + +@InProceedings{Dhillon2010, + author = {Paramveer S. Dhillon and Partha Pratim Talukdar and Koby Crammer}, + title = {Learning Better Data Representation using Inference-Driven Metric Learning (IDML)}, + booktitle = {Proceedings of the ACL 2010 Conference}, + month = {July }, + year = {2010}, + address = {Uppsala, Sweden}, + publisher = {Association for Computational Linguistics} +} + +@article{Subramanya2011, + author = {Subramanya, Amarnag and Bilmes, Jeff}, + title = {Semi-Supervised Learning with Measure Propagation}, + journal = {J. Mach. Learn. 
Res.}, + issue_date = {2/1/2011}, + volume = {12}, + month = nov, + year = {2011}, + issn = {1532-4435}, + pages = {3311--3370}, + numpages = {60}, + publisher = {JMLR.org}, +} + +%%%%%%%%%%%%%%%%%%%% +%Graph-based SSL & NLP +%%%%%%%%%%%%%%%%%%%% +@inproceedings{Rao2008, + author = {Rao, Delip and Yarowsky, David and Callison-Burch, Chris}, + title = {Affinity Measures Based on the Graph Laplacian}, + booktitle = {Proceedings of the 3rd Textgraphs Workshop on Graph-Based Algorithms for Natural Language Processing}, + series = {TextGraphs-3}, + year = {2008}, + location = {Manchester, United Kingdom}, + pages = {41--48}, + numpages = {8}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@InProceedings{Alexandrescu2009, + author = {Alexandrescu, Andrei and Kirchhoff, Katrin}, + title = {Graph-based Learning for Statistical Machine Translation}, + booktitle = {Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics}, + series = {NAACL-HLT '09}, + month = {June}, + year = {2009}, + location = {Boulder, Colorado}, + publisher = {Association for Computational Linguistics}, + pages = {119--127}, +} + +@inproceedings{Subramanya2010, + author = {Subramanya, Amarnag and Petrov, Slav and Pereira, Fernando}, + title = {Efficient graph-based semi-supervised learning of structured tagging models}, + booktitle = {Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing}, + series = {EMNLP '10}, + year = {2010}, + location = {Cambridge, Massachusetts}, + pages = {167--176}, + numpages = {10}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@InProceedings{Das2011, + author = {Das, Dipanjan and Petrov, Slav}, + title = {Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections}, + booktitle = {Proc. 
of ACL}, + year = {2011} +} + +@inproceedings{Liu2012, + author = {Liu, Shujie and Li, Chi-Ho and Li, Mu and Zhou, Ming}, + title = {Learning translation consensus with structured label propagation}, + booktitle = {Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1}, + series = {ACL '12}, + year = {2012}, + location = {Jeju Island, Korea}, + pages = {302--310}, + numpages = {9}, + acmid = {2390567}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@InProceedings{Klementiev2012, + author = {Klementiev, Alexandre and Irvine, Ann and Callison-Burch, Chris and Yarowsky, David}, + title = {Toward Statistical Machine Translation without Parallel Corpora}, + booktitle = {Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics}, + month = {April}, + year = {2012}, + address = {Avignon, France}, + publisher = {Association for Computational Linguistics}, + pages = {130--140}, +} + +@inproceedings{Das2012, +Author = {Das, Dipanjan and Smith, Noah A.}, +Booktitle = {Proc. 
of NAACL-HLT}, +Title = {Graph-Based Lexicon Expansion with Sparsity-Inducing Penalties}, +Year = {2012}} + +@inproceedings{Razmara2013, + author = {Razmara, Majid and Siahbani, Maryam and Haffari, Gholamreza and Sarkar, Anoop}, + title = {Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation}, + booktitle = {Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics}, + series = {ACL-51}, + year = {2013}, + location = {Sofia, Bulgaria}, + numpages = {8}, + publisher = {Association for Computational Linguistics}, + address = {Stroudsburg, PA, USA}, +} + +@inproceedings{Saluja2014, +Author = {Avneesh Saluja and Kristina Toutanova and Chris Quirk and Hany Hassan}, +Title = {Graph-based Semi-Supervised Learning of Translation Models from Monolingual Data}, +Year = {2014}, +booktitle = {Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics}, +series = {ACL-52}, + location = {Baltimore, MD}, + numpages = {9}, + publisher = {Association for Computational Linguistics}, +} + + +%%%%%%%%%%%%%%%%%%%% +%Distributed Representations and Neural Networks +%%%%%%%%%%%%%%%%%%% + +@inproceedings{Vincent2008, + author = {Vincent, Pascal and Larochelle, Hugo and Bengio, Yoshua and Manzagol, Pierre-Antoine}, + title = {Extracting and Composing Robust Features with Denoising Autoencoders}, + booktitle = {Proceedings of the 25th International Conference on Machine Learning}, + series = {ICML '08}, + year = {2008}, + isbn = {978-1-60558-205-4}, + location = {Helsinki, Finland}, + pages = {1096--1103}, + numpages = {8}, + publisher = {ACM}, + address = {New York, NY, USA}, +} + + +@inproceedings{Turian2010, + author = {Turian, Joseph and Ratinov, Lev and Bengio, Yoshua}, + title = {Word Representations: A Simple and General Method for Semi-supervised Learning}, + booktitle = {Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics}, + series = {ACL '10}, + 
year = {2010},
+ location = {Uppsala, Sweden},
+ pages = {384--394},
+ numpages = {11},
+ acmid = {1858721},
+ publisher = {Association for Computational Linguistics},
+ address = {Stroudsburg, PA, USA},
+}
+
+@INPROCEEDINGS{Mikolov2010,
+  author = {Tomáš Mikolov and Martin Karafiát and Lukáš Burget and Jan Černocký and Sanjeev Khudanpur},
+  title = {Recurrent neural network based language model},
+  pages = {1045--1048},
+  booktitle = {Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010)},
+  journal = {Proceedings of Interspeech},
+  volume = {2010},
+  number = {9},
+  year = {2010},
+  publisher = {International Speech Communication Association},
+}
+
+@article{Turney2010,
+ author = {Turney, Peter D. and Pantel, Patrick},
+ title = {From Frequency to Meaning: Vector Space Models of Semantics},
+ journal = {J. Artif. Int. Res.},
+ issue_date = {January 2010},
+ volume = {37},
+ number = {1},
+ month = jan,
+ year = {2010},
+ issn = {1076-9757},
+ pages = {141--188},
+ numpages = {48},
+ publisher = {AI Access Foundation},
+ address = {USA},
+}
+
+@phdthesis{Mikolov2012,
+  author = {Mikolov, Tomas},
+  title = {Statistical Language Models based on Neural Networks},
+  year = {2012},
+  publisher = {Brno University of Technology},
+  school = {Brno University of Technology},
+  address = {Brno, Czech Republic},
+}
+
+@inproceedings{Huang2012,
+author = {Eric H. Huang and Richard Socher and Christopher D. Manning and Andrew Y.
Ng}, +title = {{Improving Word Representations via Global Context and Multiple Word Prototypes}}, +booktitle = {Annual Meeting of the Association for Computational Linguistics (ACL)}, +year = 2012 +} + +@InProceedings{Mikolov2013a, + author = {Mikolov, Tomas and Yih, Wen-tau and Zweig, Geoffrey}, + title = {Linguistic Regularities in Continuous Space Word Representations}, + booktitle = {Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies}, + month = {June}, + year = {2013}, + address = {Atlanta, Georgia}, + publisher = {Association for Computational Linguistics}, + pages = {746--751}, +} + +@misc{Mikolov2013b, +Author = {Tomas Mikolov and Ilya Sutskever and Kai Chen and Greg Corrado and Jeffrey Dean}, +Title = {Distributed Representations of Words and Phrases and their Compositionality}, +Year = {2013}, +Eprint = {arXiv:1310.4546}, +} + + +%%%%%%%%%%%%%%%%%%%% +%Compositional Semantics +%%%%%%%%%%%%%%%%%%%% +@phdthesis{Sahlgren2006, + author = {Sahlgren, M.}, + title = {The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces}, + year = {2006}, + publisher = {Stockholm University}, + school = {Department of Linguistics, Stockholm University}, +} + +@article{Mitchell2010, + author = {Jeff Mitchell and Mirella Lapata}, + title = {Composition in Distributional Models of Semantics}, + journal = {Cognitive Science}, + year = {2010}, + volume = {34}, + number = {8}, + pages = {1388--1439} + } + + @inproceedings{Socher2012, + author = {Richard Socher and Brody Huval and Christopher D. Manning and Andrew Y. 
Ng}, + title = {{Semantic Compositionality Through Recursive Matrix-Vector Spaces}}, + booktitle = {Proceedings of the 2012 Conference on Empirical Methods in Natural Language Processing (EMNLP)}, + year = 2012 + } + + +@InProceedings{Tsubaki2013, + author = {Tsubaki, Masashi and Duh, Kevin and Shimbo, Masashi and Matsumoto, Yuji}, + title = {Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks}, + booktitle = {Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing}, + month = {October}, + year = {2013}, + address = {Seattle, Washington, USA}, + publisher = {Association for Computational Linguistics}, + pages = {130--140}, + url = {http://www.aclweb.org/anthology/D13-1014} +} + + + + + + + + + + + + + + + + diff --git a/EMNLP2014/emnlp2014.tex b/EMNLP2014/emnlp2014.tex new file mode 100644 index 0000000..8c7472d --- /dev/null +++ b/EMNLP2014/emnlp2014.tex @@ -0,0 +1,535 @@ +% +% File acl2014.tex +% +% Contact: koller@ling.uni-potsdam.de, yusuke@nii.ac.jp +%% +%% Based on the style files for ACL-2013, which were, in turn, +%% Based on the style files for ACL-2012, which were, in turn, +%% based on the style files for ACL-2011, which were, in turn, +%% based on the style files for ACL-2010, which were, in turn, +%% based on the style files for ACL-IJCNLP-2009, which were, in turn, +%% based on the style files for EACL-2009 and IJCNLP-2008... 
+ +%% Based on the style files for EACL 2006 by +%%e.agirre@ehu.es or Sergi.Balari@uab.es +%% and that of ACL 08 by Joakim Nivre and Noah Smith + +\documentclass[11pt]{article} +\usepackage{acl2014} +\usepackage{times} +\usepackage{url} +\usepackage{multirow} +\usepackage{latexsym} +\usepackage{amsmath} +\usepackage{amssymb} +\usepackage{algorithm} +\usepackage{graphicx} +\usepackage[font=small,labelfont=bf]{caption} +\usepackage{subcaption} +\usepackage{enumitem} +\usepackage{bm} +\usepackage{multirow} + +\DeclareMathOperator*{\argmax}{arg\,max} +\newcommand{\ts}{\textsuperscript} +\newcommand{\rione}{r^{(i)}} +\newcommand{\ritwo}{r^{(i,2)}} +\newcommand{\rithree}{r^{(i,3)}} +\newcommand{\xione}{t^{(i,1)}} +\newcommand{\xitwo}{t^{(i,2)}} +\newcommand{\xithree}{t^{(i,3)}} +\newcommand{\aione}{a_i} +\newcommand{\aitwo}{a^{(i,2)}} +\newcommand{\aithree}{a^{(i,3)}} +\newcommand{\yione}{y^{(i,1)}} +\newcommand{\yitwo}{y^{(i,2)}} +\newcommand{\yithree}{y^{(i,3)}} +\newcommand{\phii}{\phi^{(i)}} +\newcommand{\bi}{z^{(i)}} +\newcommand{\oi}{o^{(i)}} +\newcommand{\p}{{\cal P}} +\newcommand{\internal}{{\cal I}} +\newcommand{\n}{{\cal N}} +\newcommand{\rules}{{\cal R}} +\newcommand{\srule}{{X \rightarrow b, c}} +\newcommand{\pa}{\mathrm{pa}} +\newcommand{\lc}{\mathrm{lc}} +\newcommand{\rc}{\mathrm{rc}} +\newcommand{\diag}{\mathrm{diag}} +\newcommand{\tleft}{\beta} +\newcommand{\tright}{\gamma} +\newcommand{\tree}{\tau} +\newcommand{\e}[1]{\hat{#1}} +\newcommand{\commentout}[1]{} +\newcommand{\shorten}[1]{} +\newcommand{\tcommentout}[1]{#1} +\newcommand{\bS}{{\bf S}} +\newcommand{\bX}{{\bf X}} +\newfont{\msym}{msbm10} +\newcommand{\reals}{\mbox{\msym R}} +\newcommand{\qed}{{\setlength{\fboxsep}{0pt} +\framebox[7pt]{\rule{0pt}{7pt}}}} +\newcommand{\balpha}{\bm{\alpha}} +\newcommand{\bbeta}{\bm{\beta}} + +% You can expand the titlebox if you need extra space +% to show all the authors. 
Please do not make the titlebox
+% smaller than 5cm (the original size); we will check this
+% in the camera-ready version and ask you to change it back.
+%\setlength\titlebox{5cm} %for expanding the title box
+\title{Latent Synchronous CFGs for Hierarchical Phrase-based Translation}
+
+%\author{First Author \\
+%  Affiliation / Address line 1 \\
+%  Affiliation / Address line 2 \\
+%  Affiliation / Address line 3 \\
+%  {\tt email@domain} \\\And
+%  Second Author \\
+%  Affiliation / Address line 1 \\
+%  Affiliation / Address line 2 \\
+%  Affiliation / Address line 3 \\
+%  {\tt email@domain} \\}
+
+\date{}
+
+\begin{document}
+\maketitle
+\begin{abstract}
+  Abstract goes here.
+\end{abstract}
+
+\section{Introduction}
+Introduction goes here.
+%Statistical approaches to machine translation (MT) have achieved state-of-the-art results in many typologically diverse language pairs \cite{Bojar2013} by learning translation rules over longer multiword units or phrases (e.g., French $\rightarrow$ English: \emph{un chien Andalou} $\rightarrow$ \emph{an Andalusian dog}), instead of lexical or word units (\emph{chien} $\rightarrow$ \emph{dog}).
+%Unfortunately, phrase-based translation contains its own set of issues.
+%A prominent one is the significant increase in model size due to phrasal units, which makes parameter estimation during training a challenge and significantly slows down decoding during test time.
+%The phrasal extraction heuristics that extract phrase pairs consistent with word-level alignments are often to blame, since there is a tendency to extract longer phrasal translation units that are mainly applicable in restricted settings, e.g., phrase pairs like the German-English pair `\emph{der Amerikanische Pr{\"a}sident $\rightarrow$ convention allows the American president}'.
+%However, it has been found that such translation units actually perform better than their minimal counterparts \cite{Galley2006}, primarily because they are more in-line with the kinds of independence assumptions we make with context-free grammar formalisms: with larger rules, right-hand side productions can be generated in a relatively context-independent manner. + +%In this work, we propose to model additional context via a latent variable model that is featurized over inside and outside sub-trees of a synchronous grammar. +%Using a low-rank representation of the feature cross-product space (informally, the space that intuitively captures interactions of feature functions defined over inside and outside sub-trees), we can associate an additional set of parameters for each rule, representing the distribution over latent states. +%Unlike the expectation maximization (EM) algorithm, an iterative procedure based on maximum likelihood estimation that often gets stuck in local optima, our approach utilizes a spectrally-motivated moments-based method to estimate parameters of the latent variable model, which offers a more scalable way to estimate the millions of parameters in our model. +%During decoding, these states are marginalized yielding a context-dependent likelihood for each rule, which can then be incorporated as an additional feature in the standard MT pipeline. + +\section{Latent Variable Models for Refinement} +The core idea behind our proposed approach is an implicit refinement of translation rules in a synchronous context-free grammar (SCFG), using a latent variable model. +We first introduce the latent SCFG formalism and discuss how we acquire training examples of synchronous parse trees from word alignments, followed by a summary of the decoding algorithm for marginalizing over latent states, as it provides a natural way to introduce the data structures and representations used for the latent parameters. 
+The decoder is based on simple tensor-vector products that sum over the latent states. +Two methods to estimate the parameters will be discussed in \S\ref{sec:estimation}. + +\subsection{Latent SCFGs} +\label{sec:formalism} +We extend the definition of L-PCFGs \cite{Matsuzaki2005,Petrov2006} to synchronous grammars as used in machine translation \cite{Galley2004,Chiang2005}. +In this work, the aim is to refine the one-category grammar introduced by \newcite{Chiang2005} for hierarchical phrase-based translation (HPBT) in an effort to incorporate additional translational context via refined non-terminal (NT) categories instead of longer translation rules. +Thus, the following discussion is restricted to these kinds of grammars, although the method is equally applicable in other scenarios, e.g., the extended tree-to-string transducer ({\bf xRs}) formalism \cite{Huang2006,Graehl2008} commonly used in syntax-directed translation. +An important point to keep in mind in comparison to L-PCFGs is that the right-hand side (RHS) non-terminals of synchronous rules are aligned pairs across the source and target languages. + +A latent SCFG (L-SCFG) is a 6-tuple $(\mathcal{N}, m, n_s, n_t, \pi, t)$ where: +\begin{itemize} + \item $\mathcal{N}$ is a set of NT symbols in the grammar. + In our case, the set consists of only two symbols, \bX~and the goal symbol \bS. + \item $[m]$ is the set of possible hidden states associated with NTs. + Aligned pairs of NTs across the source and target languages share the same hidden state. + In line with previous work, we assume that the states associated with NTs on the RHS are \emph{not} conditionally independent given the latent state of the left-hand side (LHS). + \item $[n]_s$ is the set of source side words, i.e., the source-side terminal vocabulary. + \item $[n]_t$ is the set of target side words, i.e., the target-side vocabulary. 
+
+  \item $[n]_t$ is the set of target side words, i.e., the target-side vocabulary.
+  \item For $a = \bX, b \in [n]_s \cup \mathcal{N} \setminus \{\bS\}, c \in [n]_t
+  \cup \mathcal{N} \setminus \{\bS\}, h_1, h_2, h_3 \in [m]$, we have the following context-free rules, based on the number of NT symbols \bX~in the RHS of the rule:
+  \begin{itemize}
+   \item Two NTs: \\
+   $a(h_1) \rightarrow \langle b, c, \sim \rangle$, where $\sim$ is a one-to-one correspondence between the NT symbols of $b$ and $c$, $h_2$ is associated with one of the aligned NT pairs, and $h_3$ is associated with the other pair.
+   The rule has an associated parameter $t(a \rightarrow b,c, h_2, h_3 | a, h_1)$.
+   \item One NT: \\
+   $a(h_1) \rightarrow \langle b, c, \sim \rangle$, with associated parameter $t(a \rightarrow b, c, h_2 | a, h_1)$
+   \item No NTs: \\
+   $a(h_1) \rightarrow \langle b, c \rangle$, with associated parameter $t(a \rightarrow b,c | a, h_1)$
+  \end{itemize}
+  \item For $a=\bS$, $h \in [m]$, $\pi(\bS, h)$ is a parameter specifying the probability of $\bS(h)$ being at the root of the tree.
+\end{itemize}
+A skeletal tree (s-tree) for a sentence is a sequence of rules $r_1, \dots, r_N$ where each $r_i$ is of the form of one of the context-free rules above.
+A full tree consists of an s-tree $r_1, \dots, r_N$ together with values $h_1, \dots, h_N$.
+In HPBT, where only rules with at most two NTs in the RHS are used, the set of rules obtained from the training corpus $\rules$ can be further divided into three non-overlapping sets $\rules_0, \rules_1, \rules_2 \subset \rules$, containing the pre-terminal, unary, and binary rules respectively.
+
+\subsection{Minimal Grammar Extraction}
+\label{sec:mingrammar}
+In order to learn the parameters $t$, we need a set of synchronous s-trees, which can be acquired from word alignments.
+%For each rule $r_i$ in each s-tree, we can either compute partial counts in the expectation step of the EM algorithm, or extract second-order moments of features on which we compute an SVD.
+During the extraction phase, if we consider {\bf composed} rules, namely rules that can be formed out of smaller rules in the grammar, then there are multiple synchronous trees consistent with the alignments for a given sentence pair, and thus the total number of applicable rules can be combinatorially larger than if we just consider the set of {\bf minimal} rules, i.e., rules that cannot be formed from other rules.
+
+To extract a set of minimal rules for each word-aligned sentence pair, we utilize the linear-time extraction algorithm of \newcite{Zhang2008}.
+Since the algorithm extracts one minimal tree for each sentence pair, derivation forests do not have to be considered, making parameter estimation more tractable.\footnote{For our \textsc{DE-EN} corpus (\S\ref{sec:data}), a grammar extracted using the traditional heuristics was more than 80 times larger than the minimal grammar.}
+Furthermore, by using minimal rules as a starting point instead of the traditional heuristically-extracted rules \cite{Chiang2005} or arbitrary compositions of minimal rules \cite{Galley2006}, we are also able to explore the transition from minimal rules to composed ones in a principled manner by encoding contextual information through the latent states.
+Thus, a beneficial side effect of our refinement process is the creation of more context-specific rules without increasing the overall size of the grammar.
+
+
+\subsection{Decoding}
+\label{sec:decoding}
+\begin{figure}[h!]
+  \begin{footnotesize}
+    \framebox{\parbox{\columnwidth}{
+    {\bf Inputs:} Sentence $f_1 \ldots f_N$, L-SCFG $(\mathcal{N}, m, n_s, n_t, \pi, t)$, parameters $C^r \in \reals^{(m \times m \times m)}$, $\reals^{(m \times m)}$, or $\reals^{(1 \times m)}$ for all $r \in \rules$, $C^\bS \in \reals^{(m \times 1)}$, hypergraph $\mathcal{H}$.
+
+    {\bf Data structures:}
+
+    For each node $q \in \mathcal{H}$:
+    \begin{itemize}[noitemsep]
+      \item $\balpha(q) \in \reals^{1 \times m}$ is a row vector of inside terms.
+ \item $\bbeta(q) \in \reals^{m \times 1}$ is a column vector of outside terms. + \item For each incoming edge $e \in {\bf B}(q)$ to node $q$, $\mu(e)$ is a marginal probability for edge (rule) $e$. + \end{itemize} + + {\bf Algorithm:} + + (Inside Computation) + %(Inside base case) $\forall i \in [N], \;\; \alpha^{X, i, i} = \sum_{r \in \bX \rightarrow f_i} C^r$ + + For nodes $q$ in topological order in $\mathcal{H}$, + \begin{itemize}[label={},nolistsep] + \item $\balpha(q) = \bm{0}$ + \item For each incoming edge $e \in {\bf B}(q)$, + \item \begin{itemize}[label={}] + \item tail = {\bf t}(e), rule = {\bf r}(e) + \item if $|$tail$| = 0$, then $\balpha(q) = \balpha(q) + C^{\textrm{rule}}$ + \item else if $|$tail$| = 1$, then $\balpha(q) = \balpha(q) + C^{\textrm{rule}} \times_1 \balpha(\textrm{tail}_0)$ + \item else if $|$tail$| = 2$, then $\balpha(q) = \balpha(q) + C^{\textrm{rule}} \times_2 \balpha(\textrm{tail}_1) \times_1 \balpha(\textrm{tail}_0)$ + \end{itemize} + \end{itemize} + + + (Outside Computation) + + For $q \in \mathcal{H}$, + \begin{itemize}[label={},nolistsep] + \item $\bbeta(q) = \bm{0}$ + \end{itemize} + $\bbeta(\textrm{goal}) = C^\bS$ + + For $q$ in reverse topological order in $\mathcal{H}$, + \begin{itemize}[label={},nolistsep] + \item For each incoming edge $e \in {\bf B}(q)$, + \item \begin{itemize}[label={}] + \item tail = {\bf t}(e), rule = {\bf r}(e) + \item if $|$tail$| = 1$, then $\bbeta(\textrm{tail}_0) = \bbeta(q) \times_0 C^{\textrm{rule}}$ + \item else if $|$tail$| = 2$, then, + \begin{itemize}[label={}] + \item $\bbeta(\textrm{tail}_0) = \bbeta(q) \times_0 C^{\textrm{rule}} \times_2 \balpha(\textrm{tail}_1)$ + \item $\bbeta(\textrm{tail}_1) = \bbeta(q) \times_0 C^{\textrm{rule}} \times_1 \balpha(\textrm{tail}_0)$ + \end{itemize} + + \end{itemize} + \end{itemize} + + + \hbox{(Marginals)} + Sentence probability $g = \balpha(\textrm{goal}) \times \bbeta(\textrm{goal})$ + For edge $e \in \mathcal{H}$, + 
\begin{itemize}[label={},nolistsep]
+      \item head = {\bf h}(e), tail = {\bf t}(e), rule = {\bf r}(e)
+      \item if $|$tail$| = 0$, then $\mu(e) = (\bbeta(\textrm{head}) \times_0 C^{\textrm{rule}}) / g$
+      \item else if $|$tail$| = 2$, then $\mu(e) = (\bbeta(\textrm{head}) \times_0 C^{\textrm{rule}} \times_2 \balpha(\textrm{tail}_1) \times_1 \balpha(\textrm{tail}_0)) / g$
+      \item else if $|$tail$| = 1$, then $\mu(e) = (\bbeta(\textrm{head}) \times_0 C^{\textrm{rule}} \times_1 \balpha(\textrm{tail}_0)) / g$
+    \end{itemize}
+}}
+\end{footnotesize}
+\caption{The tensor form of the hypergraph inside-outside algorithm, for calculation of rule marginals $\mu(e)$.
+A slight simplification in the marginal computation yields NT marginals for spans $\mu(\bX, i, j)$.
+{\bf B}(q) returns the incoming hyperedges for node $q$, and {\bf h}(e), {\bf t}(e), {\bf r}(e) return the head node, tail nodes, and rule for hyperedge $e$.}
+\vspace{-1cm}
+\label{fig:hg_io_spec}
+\end{figure}
+For a parameter $t$ of rule $r$, the latent state $h_1$ attached to the LHS NT of $r$ is associated with the outside tree for the sub-tree rooted at the LHS, and the states attached to the RHS NTs are associated with the inside trees of those NTs.
+Since we do not assume conditional independence of these states, we need to consider all possible interactions, which can be compactly represented as a 3\ts{rd}-order tensor in the case of a binary rule, a matrix (i.e., a 2\ts{nd}-order tensor) for unary rules, and a vector for pre-terminal (lexical) rules.
+Preferences for certain outside-inside tree combinations are reflected in the values contained in these tensor structures.
+In this manner, we intend to capture interactions between non-local context, as represented by the outside tree, and local context, through the inside trees.
+We refer to these tensor structures collectively as $C^r$ for rules $r \in \rules$, which encompass the parameters $t$.
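The tensor-vector products used in the inside computation of Figure \ref{fig:hg_io_spec} reduce to small mode-$k$ contractions. The following numpy sketch illustrates the inside combination step for a binary rule; the dimension $m$ and all parameter values are purely illustrative stand-ins, not taken from the actual system:

```python
import numpy as np

m = 4  # latent-state dimension (illustrative)
rng = np.random.default_rng(0)

# Hypothetical parameters for one binary rule r in R_2: an m x m x m tensor
# whose modes correspond to the latent states (h_1, h_2, h_3).
C_rule = rng.random((m, m, m))

def contract(tensor, mode, vec):
    """Mode-k tensor-vector product: sum out axis `mode` against `vec`."""
    return np.tensordot(tensor, vec, axes=([mode], [0]))

# Inside vectors of the two tail nodes of a hyperedge.
alpha_t0 = rng.random(m)
alpha_t1 = rng.random(m)

# Inside update for |tail| = 2: C^rule x_2 alpha(tail_1) x_1 alpha(tail_0),
# leaving a length-m vector indexed by the head node's latent state h_1.
alpha_head = contract(contract(C_rule, 2, alpha_t1), 1, alpha_t0)
assert alpha_head.shape == (m,)
```

Each contraction removes one mode, so a 3\ts{rd}-order tensor multiplied by two inside vectors leaves the length-$m$ inside vector of the head node, exactly as in the figure.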
+
+For $r \in \rules_0: C^r \in \reals^{1 \times m}$; similarly for $r \in \rules_1: C^r \in \reals^{m \times m}$ and $r \in \rules_2: C^r \in \reals^{m \times m \times m}$.
+We also maintain a vector $C^\bS \in \reals^{m \times 1}$ corresponding to the parameters $\pi(\bS, h)$ for the goal node (root).
+These parameters participate in tensor-vector operations: a 3\ts{rd}-order tensor $C^r$ for $r \in \rules_2$ can be multiplied along each of its three modes ($\times_0, \times_1, \times_2$), and if multiplied by an $m \times 1$ vector, will produce an $m \times m$ matrix.\footnote{This operation is sometimes called a contraction.}
+Note that matrix multiplication can be represented by $\times_1$ when multiplying on the right and $\times_0$ when multiplying on the left of the matrix.
+
+The decoder computes probabilities for each rule in the parse forest of a source sentence by marginalizing over the latent states, which in practice corresponds to simple tensor-vector products, and is not dependent on the manner in which the parameters were estimated.
+Figure \ref{fig:hg_io_spec} presents the tensor version of the inside-outside algorithm for decoding L-SCFGs.
+The algorithm takes as input the parse forest of the source sentence represented as a hypergraph \cite{Klein2001}, which is computed using a bottom-up parser with Earley-style rules (citation), similar to the CKY+ algorithm used in \newcite{Chiang2007}.
+Then, the algorithm computes inside and outside probabilities over the hypergraph using the tensor representations, and converts these probabilities to marginal rule probabilities.
+It is similar to the version presented in \newcite{Cohen2012a}, but adapted to hypergraph parse forests.
+
+The algorithm maintains its $\mathcal{O}(n^3|G|)$ complexity, where $n$ is the length of the input sentence and $|G|$ is the size of the grammar; since we do not add any rules, $|G|$ is unchanged.
+But of course, there is no free lunch: the additional computation is shifted to the marginalization over latent states via the algorithm in Figure \ref{fig:hg_io_spec}.
+However, the bulk of the computation in this case is in the form of a series of tensor-vector products of relatively small size (each dimension is of length $m$), which can be computed very quickly and in parallel.
+
+\section{Parameter Estimation for L-SCFGs}
+\label{sec:estimation}
+We explore two methods for estimating the parameters $C^r$ of the model: a likelihood-maximization approach based on EM \cite{Dempster1977}, and a spectral approach based on the method of moments \cite{Hsu2009}, where we identify a subspace using a singular value decomposition (SVD) \cite{Golub1996} of the cross-product feature space between inside and outside trees and estimate parameters in this subspace.
+
+Figure \ref{fig:estimation-algos} presents a side-by-side comparison of the two algorithms, which we discuss in this section.
+In the spectral approach, we base our parameter estimates on low-rank representations of moments of features, while EM explicitly maximizes a likelihood criterion.
+The two algorithms are structurally similar, but in lieu of the sparse feature functions used in the spectral case, EM uses partial counts estimated with the current set of parameters.
+EM is susceptible to local optima, while the spectral approach comes with guarantees on obtaining the global optimum.
+Lastly, computing the SVD and estimating parameters in the low-rank space is a one-shot operation, as opposed to the iterative procedure of EM.
+
+\begin{figure*}[t!]
+
+  \centering
+  \fbox{
+  \begin{footnotesize}
+  \begin{subfigure}{0.85\columnwidth}
+  \vspace{-1cm}
+  {\bf Inputs:}
+
+ Training examples $(\rione, \xione, \xitwo, \xithree, \oi, b^{(i)})$ for $i \in \{1 \ldots M\}$, where $\rione$ is a context-free rule; $\xione$, $\xitwo$, and $\xithree$ are inside trees; $\oi$ is an outside tree; and $b^{(i)} = 1$ if the rule is at the root of the tree, $0$ otherwise.
+A function $\phi$ that maps inside trees $t$ to feature-vectors $\phi(t) \in \reals^d$. A function $\psi$ that maps outside trees $o$ to feature-vectors $\psi(o) \in \reals^{d'}$.
+
+  {\bf Algorithm:}
+  %If $\rione$ is of the form $\srule$, define $b_i$ to be the non-terminal for the left-child of $\rione$, and $c_i$ to be the non-terminal for the right-child.
+
+  (Step 0: Singular Value Decomposition)
+  \begin{itemize}
+    \item Compute the SVD of Eq.~\ref{eq:outerproduct} to calculate matrices $\e{U} \in \reals^{(d \times m)}$ and $\e{V} \in \reals^{(d' \times m)}$.
+  \end{itemize}
+
+  (Step 1: Projection)
+  \begin{align*}
+    Y(t) &= \e{U}^T \phi(t)\\
+    Z(o) &= \Sigma^{-1} \e{V}^T \psi(o)
+  \end{align*}
+
+  (Step 2: Calculate Correlations)
+  \begin{align*}
+  \e{E}^r &= \begin{cases}
+  \frac{\sum_{o \in Q^r} Z(o)}{|Q^r|} & \textrm{if }r \in \rules_0 \\
+  \frac{\sum_{\left(o, t\right) \in Q^r} Z(o) \otimes Y(t)}{|Q^r|} & \textrm{if }r \in \rules_1 \\
+  \frac{\sum_{\left(o, t^2, t^3\right) \in Q^r} Z(o) \otimes Y(t^2) \otimes Y(t^3)}{|Q^r|} & \textrm{if }r \in \rules_2
+  \end{cases}
+  \end{align*}
+  $Q^r$ is the set of outside-inside tree triples for binary rules, outside-inside tree pairs for unary rules, and outside trees for pre-terminals.
+
+  (Step 3: Compute Final Parameters)
+  \begin{itemize}
+    \item For all $r \in \rules$,
+    \begin{itemize}[label={}]
+      \item $\e{C}^r = \frac{\textrm{count}(r)}{M} \times \e{E}^r$
+    \end{itemize}
+    \item For all $i \in \{1, \dots, M\}$ such that $b^{(i)}$ is 1,
+    \begin{itemize}[label={}]
+      \item $\e{C}^\bS = \e{C}^\bS + \frac{Y(\xione)}{|Q^\bS|}$
+    \end{itemize}
+  \end{itemize}
+  $Q^\bS$ is the set of trees at the root.
+  \caption{\small The spectral learning algorithm for estimating parameters of an L-SCFG.}
+  \label{fig:splearn}
+  \end{subfigure}
+  %&
+  \begin{subfigure}{1.05\columnwidth}
+  {\bf Inputs:}
+
+ Training examples $(\rione, \xione, \xitwo, \xithree, \oi, b^{(i)})$ for $i \in \{1 \ldots M\}$, where $\rione$ is a context-free rule; $\xione$, $\xitwo$, and $\xithree$ are inside trees; $\oi$ is an outside tree; $b^{(i)} = 1$ if the rule is at the root of the tree, $0$ otherwise; and MAX\_ITERATIONS.
+%A function $\phi$ that maps inside trees $t$ to feature-vectors $\phi(t) \in \reals^d$. A function $\psi$ that maps outside trees $o$ to feature-vectors $\psi(o) \in \reals^{d'}$.
+
+  {\bf Algorithm:}
+  %If $\rione$ is of the form $\srule$, define $b_i$ to be the non-terminal for the left-child of $\rione$, and $c_i$ to be the non-terminal for the right-child.
+
+  (Step 0: Parameter Initialization)
+
+  For rule $r \in \rules$,
+  \begin{itemize}[noitemsep]
+    \item if $r \in \rules_0$: initialize $\e{C}^r \in \reals^{1 \times m}$
+    \item if $r \in \rules_1$: initialize $\e{C}^r \in \reals^{m \times m}$
+    \item if $r \in \rules_2$: initialize $\e{C}^r \in \reals^{m \times m \times m}$
+  \end{itemize}
+
+  Initialize $\e{C}^\bS \in \reals^{m \times 1}$
+
+  $\e{C}_0^r = \e{C}^r, \e{C}_0^\bS = \e{C}^\bS$
+
+  For iteration $t=1, \dots, \textrm{MAX\_ITERATIONS}$,
+  \begin{itemize}
+    \item Expectation Step:
+    \begin{itemize}[label={}]
+      \item (Estimate $Y$ and $Z$)
+
+      Compute partial counts and total tree probabilities $g$ for all $t$ and $o$ using Fig.~\ref{fig:hg_io_spec} and parameters $\e{C}_{t-1}^r, \e{C}_{t-1}^\bS$.
+      \item (Calculate Correlations)
+      \begin{align*}
+      \e{E}^r &= \begin{cases}
+      \sum\limits_{\left(o, g\right) \in Q^r} \frac{Z(o)}{g} &\textrm{if }r \in \rules_0 \\
+      \sum\limits_{\left(o, t, g\right) \in Q^r} \frac{Z(o) \otimes Y(t)}{g} &\textrm{if }r \in \rules_1 \\
+      \sum\limits_{\left(o,t^2,t^3,g\right) \in Q^r} \frac{Z(o) \otimes Y(t^2) \otimes Y(t^3)}{g} &\textrm{if }r \in \rules_2
+      \end{cases}
+      \end{align*}
+      \item (Update Parameters)
+      \begin{itemize}[label={}]
+        \item For all $r \in \rules$, $\e{C}^r_t = \e{C}^r_{t-1} \odot \e{E}^r$
+        \item For all $i \in \{1, \dots, M\}$ such that $b^{(i)}$ is 1, $\e{C}^\bS_t = \e{C}^\bS_t + (\e{C}^\bS_{t-1} \odot Y(\xione)) / g$
+      \end{itemize}
+      $Q^\bS$ is the set of trees at the root.
+ \end{itemize}
+ \item Maximization Step
+ \begin{itemize}[label={},nolistsep]%[nolistsep]
+ \item if $r \in \rules_0$: $\forall h_1: \e{C}^r(h_1) = \frac{\e{C}^r(h_1)}{\sum_{h_1}\e{C}^r(h_1)}$
+ \item if $r \in \rules_1$: $\forall h_1, h_2: \e{C}^r(h_1, h_2) = \frac{\e{C}^r(h_1, h_2)}{\sum_{h_2}\e{C}^r(h_1, h_2)}$
+ \item if $r \in \rules_2$: $\forall h_1, h_2, h_3: \e{C}^r(h_1, h_2, h_3) = \frac{\e{C}^r(h_1, h_2, h_3)}{\sum_{h_2, h_3}\e{C}^r(h_1, h_2, h_3)}$
+ \end{itemize}
+ \end{itemize}
+ \caption{\small The EM-based algorithm for estimating parameters of an L-SCFG.}
+ \label{fig:emlearn}
+ \end{subfigure}
+ \end{footnotesize}}
+ \caption{The two parameter estimation algorithms proposed for L-SCFGs.}
+ \label{fig:estimation-algos}
+\end{figure*}
+
+\subsection{Spectral Moments-based Estimation}
+\label{sec:spectral}
+We generalize the parameter estimation algorithm presented in \newcite{Cohen2013} to the synchronous or bilingual case.
+The central idea of the spectral parameter estimation algorithm is to learn an $m$-dimensional representation of inside and outside trees by describing these trees in terms of features and then applying a projection step (SVD), the intuition being that the lower-dimensional space captures syntactic and semantic regularities among rules that are only implicit in the sparse feature space.
+%The spectral method relies on computing the empirical covariances between two feature spaces, represented by their respective feature functions that map tree fragments to feature vectors.
+Every NT in an s-tree has an associated inside and outside tree: the inside tree contains the entire sub-tree at and below the NT, and the outside tree is everything in the s-tree except the inside tree.
+The inside feature function $\phi$ maps the domain of inside tree fragments to a $d$-dimensional Euclidean space, and the outside feature function $\psi$ maps the domain of outside tree fragments to a $d'$-dimensional space.
+The specific features we used are discussed in \S\ref{sec:features}.
+
+Let $\mathcal{O}$ be the set of all inside-outside tree tuples in our training corpus, whose size equals the number of rule tokens $M$, and let $\phi(t) \in \reals^{d \times 1}$, $\psi(o) \in \reals^{d' \times 1}$ be the inside and outside feature vectors.
+By computing the outer product $\otimes$ between the inside and outside feature vectors for each pair and aggregating, we obtain the empirical inside-outside feature covariance matrix:
+\begin{align}
+ \hat{\Omega} = \frac{1}{|\mathcal{O}|} \sum_{(o,t) \in \mathcal{O}} \phi(t) \left(\psi(o)\right)^T
+ \label{eq:outerproduct}
+\end{align}
+If $m$ is the desired latent space dimension, we compute a rank-$m$ truncated SVD of the empirical covariance matrix $\hat{\Omega} \approx U \Sigma V^T$, where $U \in \mathbb{R}^{d \times m}$ and $V \in \mathbb{R}^{d' \times m}$ contain the top $m$ left and right singular vectors, and $\Sigma \in \mathbb{R}^{m \times m}$ is a diagonal matrix containing the $m$ largest singular values along its diagonal.
+
+Figure \ref{fig:splearn} provides the remaining steps in the algorithm.
+In step 1, for each inside and outside tree, we project its high-dimensional feature representation to the $m$-dimensional latent space.
+Using these lower-dimensional representations, in step 2 we compute, for each rule type $r$, the covariance between the inside tree vectors and the outside tree vector using the \emph{tensor product}, a generalization of the outer product that computes covariances among more than two random vectors.
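As a concrete illustration of the covariance computation (Eq.~\ref{eq:outerproduct}) and the truncated SVD, the following numpy sketch uses random sparse binary vectors in place of the real feature functions; the step-1 projections $Y(t) = U^T \phi(t)$ and $Z(o) = \Sigma^{-1} V^T \psi(o)$ follow a common convention from the spectral L-PCFG literature and are an assumption here, not a quotation of the paper's definitions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_prime, m, M = 50, 40, 4, 200  # inside dim, outside dim, latent dim, rule tokens

# Random sparse binary vectors standing in for phi(t) and psi(o), one row per token.
phi = (rng.random((M, d)) < 0.1).astype(float)
psi = (rng.random((M, d_prime)) < 0.1).astype(float)

# Empirical inside-outside covariance: average of outer products phi(t) psi(o)^T.
omega_hat = phi.T @ psi / M                       # d x d'

# Rank-m truncated SVD: keep only the m largest singular values and vectors.
U, S, Vt = np.linalg.svd(omega_hat, full_matrices=False)
U, S, Vt = U[:, :m], S[:m], Vt[:m, :]

# Step-1 projections to the m-dimensional latent space (one common convention):
# Y(t) = U^T phi(t), Z(o) = Sigma^{-1} V^T psi(o).
Y = phi @ U                                       # M x m inside representations
Z = (psi @ Vt.T) / S                              # M x m outside representations
```

The projected rows of `Y` and `Z` play the role of the $m$-dimensional inside and outside representations used in the subsequent correlation step.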
+For binary rules, with two child inside vectors and one outside vector, the result $\e{E}^r$ is a 3-mode tensor; for unary rules, it is a matrix, and for pre-terminal rules with no right-hand-side non-terminals, a vector.
+The final parameter estimate is then the associated tensor/matrix/vector, scaled by the maximum likelihood estimate of the rule $r$, as in step 3.
+
+The corresponding theoretical guarantees from \newcite{Cohen2012a} carry over straightforwardly to the synchronous case.
+$\hat{\Omega}$ is an empirical estimate of the true covariance matrix $\Omega$, and if $\Omega$ has rank $m$, then the marginals computed with the spectrally estimated parameters converge to the true marginals.
+The sample complexity of this convergence is inversely proportional to the $m^{\textrm{th}}$ largest singular value of $\Omega$.
+
+\subsection{EM-based Estimation}
+\label{sec:em}
+A likelihood maximization approach can also be used to learn the parameters of an L-SCFG.
+Parameters are initialized by sampling each parameter value $\e{C}^r(h_1, h_2, h_3)$ from the interval $[0,1]$ uniformly at random.\footnote{In our experiments, we also tried the initialization scheme described in \newcite{Matsuzaki2005}, but found that it provided little benefit.}
+We first decode the training corpus using the current set of parameters to compute the inside and outside probability vectors associated with the NTs of every rule in each s-tree, constrained to the tree structure of the training example.
+These probabilities can be computed using the decoding algorithm in Figure \ref{fig:hg_io_spec} (where $\balpha$ and $\bbeta$ correspond to the inside and outside probabilities respectively), except that the parse forest consists of a single tree.
+Each of these vectors represents partial counts over latent states.
+We can then define functions $Y$ and $Z$ (analogous to the spectral case) that map inside and outside tree instances to $m$-dimensional vectors containing these partial counts.
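For concreteness, steps 2 and 3 of the spectral algorithm (Fig.~\ref{fig:splearn}) for a single binary rule reduce to a three-way outer product followed by scaling with the rule's maximum likelihood estimate. The numpy sketch below uses random stand-ins for the projected vectors and toy counts; averaging $E^r$ over the rule's occurrences is our assumption about the aggregation convention, not a quotation of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
m, M, count_r = 4, 500, 12  # latent dim, total rule tokens, occurrences of rule r

# Random stand-ins for the m-dimensional projections at each occurrence of r:
# Z(o) for the outside tree, Y(t2) and Y(t3) for the two child inside trees.
Z = rng.random((count_r, m))
Y2 = rng.random((count_r, m))
Y3 = rng.random((count_r, m))

# Step 2: correlation for a binary rule -- an m x m x m (3-mode) tensor,
# averaged here over the rule's occurrences.
E_r = np.einsum('ia,ib,ic->abc', Z, Y2, Y3) / count_r

# Step 3: scale by the maximum likelihood estimate count(r)/M of the rule.
C_r = (count_r / M) * E_r
```

For unary and pre-terminal rules the same `einsum` pattern degenerates to a matrix and a vector, respectively, by dropping the unused child factors.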
+In the spectral case, $Y$ and $Z$ are estimated just once, whereas with EM they must be re-estimated at each iteration.
+
+The expectation step thus consists of computing the partial counts of inside and outside trees $t$ and $o$, i.e., recovering the functions $Y$ and $Z$, and updating the parameters $\e{C}^r$ by computing correlations, which involves summing over partial counts (across all occurrences of a rule in the corpus).
+Each partial count's contribution is divided by a normalization factor $g$, the total probability of the tree of which $t$ or $o$ is a part.
+Note that unlike the spectral case, there is a specific normalization factor for each inside-outside tuple.
+Lastly, the correlations are scaled by the existing parameter estimates.
+To obtain the next set of parameters, in the maximization step we normalize $\e{C}^r$ for $r \in \rules$ such that for every $h_1$: $\sum_{h_2,h_3} \e{C}^r(h_1, h_2, h_3) = 1$ for $r \in \rules_2$, $\sum_{h_2} \e{C}^r(h_1, h_2) = 1$ for $r \in \rules_1$, and $\sum_{h_1} \e{C}^r(h_1) = 1$ for $r \in \rules_0$.
+We note that it is also possible to add sparse, overlapping features to an EM-based estimation procedure \cite{Berg-Kirkpatrick2010}; we leave this for future work.
+
+\section{Evaluation}
+To evaluate the performance of L-SCFGs in a translation setting, we conducted several experiments across two language pairs.
+The primary evaluation criterion is BLEU \cite{Papineni2002}, and we evaluate our latent variable model against a number of baselines to elucidate its performance.
+The latent variable model is integrated into the standard MT pipeline by computing marginal probabilities for each rule in the parse forest of a source sentence, using the algorithm in Figure \ref{fig:hg_io_spec} with the parameters estimated through the algorithms in Figure \ref{fig:estimation-algos}; this marginal is added as a feature for the rule during MERT \cite{Och2003}.
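As a concrete illustration of the maximization step of the EM algorithm (\S\ref{sec:em}), renormalizing a binary-rule parameter tensor so that, for every value of $h_1$, the entries sum to one over the child states is a one-line operation in numpy (random toy tensor; a sketch, not the actual implementation):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 4  # latent dimension

# Random stand-in for an unnormalized binary-rule parameter tensor after the E-step.
C_r = rng.random((m, m, m))

# M-step for binary rules: for every h1, make the sum over (h2, h3) equal to 1.
C_r /= C_r.sum(axis=(1, 2), keepdims=True)
```

The unary and pre-terminal cases follow the same pattern, summing over `axis=1` and `axis=0` respectively.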
+These probabilities are conditioned on the LHS (\bX), and are thus joint probabilities for a source-target RHS pair.
+We also write out as features the conditional probabilities $P(e|f)$ and $P(f|e)$ as estimated by our latent variable model, i.e., conditioned on the source and target RHS respectively.
+
+\subsection{Data and Baselines}
+\label{sec:data}
+The \textsc{DE-EN} parallel corpus is taken from the news commentary section of the WMT 2012 translation evaluation; \textsc{news-test2010} is used as the development set, and \textsc{news-test2011} as the test set.\footnote{http://www.statmt.org/wmt12/}
+The development and test sets are evaluated with a single reference.
+The \textsc{ZH-EN} data is the BTEC parallel corpus \cite{Paul2009}; we combine the first and second development sets into one, and evaluate on the third development set.
+The development and test sets are evaluated with 16 references.
+Statistics for the data are shown in Table \ref{tab:corpusstats}.
+We used the \textsc{cdec} decoder \cite{Dyer2010} to extract word alignments and the baseline hierarchical grammars, as well as for MERT tuning and decoding.
+%For the in-sample conditional perplexity experiments, we used a 4-gram language model .
+We used a 4-gram language model built from the target side of the parallel training data.
+\begin{table}[h!]
+%{\small
+ \begin{center}
+ \begin{tabular}{p{0.5\linewidth}rr}
+ \hline
+ & \textsc{DE-EN} & \textsc{ZH-EN} \\
+ \hline
+ TRAIN (SRC) & 3.7M & 334K \\
+ TRAIN (TGT) & 3.6M & 366K \\
+ DEV (SRC) & 65K & 7K \\
+ DEV (TGT) & 63K & 7.6K\\
+ TEST (SRC) & 63K & 3.8K \\
+ TEST (TGT) & 65K & 3.9K \\
+ \end{tabular}
+ \end{center}
+ \caption{Corpus statistics (in words). For the \textsc{ZH-EN} target DEV and TEST statistics, we take the first reference.}
+ \label{tab:corpusstats}
+ %}
+\end{table}
+
+The baseline \textsc{hiero} system uses a grammar extracted by applying the commonly used heuristics \cite{Chiang2005}.
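The probability features that decorate each rule in the baseline are relative-frequency estimates over extracted rule counts; a minimal sketch (toy counts and hypothetical phrase pairs, not data from the paper) of how the forward, backward, and joint probabilities might be computed:

```python
from collections import Counter, defaultdict

# Toy joint counts of (source RHS, target RHS) pairs -- hypothetical values.
counts = Counter({
    ("der Mann", "the man"): 8,
    ("der Mann", "the husband"): 2,
    ("das Haus", "the house"): 5,
})
total = sum(counts.values())

# Marginal counts of each source and target side.
src_tot = defaultdict(int)
tgt_tot = defaultdict(int)
for (f, e), c in counts.items():
    src_tot[f] += c
    tgt_tot[e] += c

# Forward P(e|f), backward P(f|e), and joint P(e,f) features per rule.
features = {
    (f, e): {
        "p_e_given_f": c / src_tot[f],
        "p_f_given_e": c / tgt_tot[e],
        "p_joint": c / total,
    }
    for (f, e), c in counts.items()
}
```

The latent variable model replaces these counts with marginals computed under the estimated parameters, but the conditioning structure of the features is the same.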
+Each rule is decorated with lexical and phrasal features corresponding to the forward $P(e|f)$ and backward $P(f|e)$ probabilities, along with the joint probability $P(e,f)$, the marginal probability of the source phrase $P(f)$, and indicators for whether the phrase pair or the source phrase is a singleton.
+Weights for the language model (and language model OOV), glue rule, and word penalty are also tuned.
+The minimal grammar baseline uses the same set of features and weights.
+
+\subsection{Features}
+\label{sec:features}
+We use the following set of sparse, binary features in the spectral learning process:
+\begin{itemize}[noitemsep]
+ \item Rule Indicator: for the inside features, we consider the rule production containing the current non-terminal on the left-hand side, as well as the rule productions of the children (distinguishing between left and right children for binary rules).
+ For the outside features, we consider the parent rule production along with the rule production of the sibling (if it exists).
+ \item Lexical: for both the inside and outside features, we record any lexical items that appear in the rule productions.
+ Furthermore, we consider the first and last words of spans (left and right child spans for inside features, distinguishing between the two if both exist, and the sibling span for outside features).
+ Source and target words are treated separately.
+ %\item Arity: the number of non-terminals present in inside tree and outside tree rules.
+ \item Length: the span length of the tree and each of its children for inside features, and the span lengths of the parent and sibling for outside features.
+\end{itemize}
+In addition to these sparse features, we also investigate the inclusion of real-valued features traditionally used in MT, e.g., lexical and phrasal forward and reverse probabilities.
+
+\subsection{\textsc{DE-EN} Experiments}
+
+Table \ref{tab:de-en-results} presents a comprehensive evaluation of the \textsc{DE-EN} experimental setup.
+The first section consists of the various baselines we consider.
+In addition to the standard HPBT setup \cite{Chiang2005}, we evaluate the minimal grammar baseline with the same set of features, as well as a setup where the spectral parameters simply consist of the joint maximum likelihood estimates of the rules.
+This baseline, along with the $m=1$ spectral baseline with only rule indicator features, should perform on par with the minimal grammar baseline, which we see is the case.
+Furthermore, in line with previous work comparing minimal and composed rules \cite{Galley2006}, we find that minimal grammars take a hit of almost 1.5 BLEU points compared to composed (\textsc{hiero}) grammars.
+
+We examine a number of feature combinations and latent-state settings for the spectral and EM-estimated latent variable models.
+
+The two estimation algorithms differ significantly in their estimation time.
+The spectral algorithm is at least an order of magnitude faster: it completes within 40 minutes on a single core, whereas a parallelized EM implementation needs around 100 iterations, taking more than 10 hours, to achieve this level of performance.
+
+\begin{table}[t!]
+\begin{small}
+ \begin{center}
+ \begin{tabular}{|l|p{0.45\columnwidth}rr|}
+ \hline
+ & & \multicolumn{2}{c|}{\bf BLEU} \\
+ & Setup & Dev & Test \\
+ \hline
+ \multirow{3}{*}{Baselines} & \textsc{hiero} & 18.50 & 16.89 \\
+ & Minimal Grammar & 17.01 & 15.42 \\
+ & MLE & X & Y \\ \hline
+ \multirow{4}{*}{Spectral} & $m=1$ RI & 17.09 & 15.34 \\
+ & $m=1$ RI+Lex+Len & X & Y \\
+ & $m=16$ RI+Lex+Len & X & Y \\
+ & $m=16$ RI+Lex+Len+Sm & X & Y \\ \hline
+ \multirow{2}{*}{EM} & $m=1$ 100 Iter & X & Y \\
+ & $m=16$ 100 Iter & X & Y \\
+ \hline
+ \end{tabular}
+ \end{center}
+ \caption{Results for the \textsc{DE-EN} corpus, comparing across the baselines and the two parameter estimation techniques.
+ RI, Lex, and Len correspond to the rule indicator, lexical, and length features, respectively, and Sm denotes smoothing.}
+ \label{tab:de-en-results}
+\end{small}
+\end{table}
+\subsection{\textsc{ZH-EN} Experiments}
+
+\subsection{Discussion \& Analysis}
+
+\section{Related Work}
+
+\section{Conclusion}
+
+In this work, we presented a scalable approach to refining the synchronous grammars used in MT by inferring latent categories for the non-terminals in our grammar rules.
+
+For future work, we would like to consider a more direct way of integrating the latent variable parameters into an MT system.
+
+% include your own bib file like this:
+\bibliographystyle{acl}
+\bibliography{bibliography}
+
+\end{document}
diff --git a/EMNLP2014/spectral_scfgs.tex b/EMNLP2014/spectral_scfgs.tex
deleted file mode 100644
index 5862361..0000000
--- a/EMNLP2014/spectral_scfgs.tex
+++ /dev/null