diff --git a/tex_lhada/houches2017_lhada.tex b/tex_lhada/houches2017_lhada.tex index 817e3e5..53ede40 100644 --- a/tex_lhada/houches2017_lhada.tex +++ b/tex_lhada/houches2017_lhada.tex @@ -119,7 +119,7 @@ \subsection{Describing the description language}\label{sec:desc} Three extensions to {\sc Lhada} are introduced in {\sc Lhada17}. The first is the backslash line-continuation marker described above. The second extension is the {\tt uid} attribute of the {\tt object} block that is defined in the EBNF language description. This attribute, whose name stands for ``universal identifier'', identifies the {\tt object}. In {\sc Lhada17}, object blocks with a {\tt take external} statement include a {\tt uid} and no {\tt apply} or {\tt cut} statement. The {\tt uid} attribute is reserved for this type of {\tt object} block. Finally, we have introduced two aliases for the keyword {\tt object}: {\tt variable} and {\tt collection}. The {\tt object} block can represent several types of entities. Allowing a name that reflects the type of entity, {\tt collection} when dealing with a collection of physics objects and {\tt variable} when dealing with a single observable, such as an event shape variable, should help authors write more intelligible analysis descriptions. The choice of name is left to the discretion of the analysis description author. -\subsection{Automatised generation of Rivet analysis code} +\subsection{Automated generation of Rivet analysis code} The {\sc Lhada} language will play a role in LHC result reinterpretations only if it is interfaced to commonly used reinterpretation frameworks. This interfacing can be done in two ways. The first approach is to interpret the analysis description at run time. The second, which is adopted here, is to generate code from the description. 
@@ -136,15 +136,17 @@ \section{The {\sc Lhada2TNM} interpreter} % %{\em placeholder for Harrison's contribution.} -Two key design features of {\sc Lhada} are human readability and analysis framework independence. As noted above, framework independence can be tested by attempting to implement tools that automatically translate analyses described using {\sc Lhada} into analyses that can be executed in different analysis frameworks. Human readability demands that the number of rules and syntactical elements be kept as low as possible. But, since we also demand that {\sc Lhada} be sufficiently expressive to capture the details of LHC analyses, it pays to follow Einstein's advice: ``Everything should be made as simple as possible, but not simpler". In the prototype of the {\sc Lhada2TNM} translator, we have tried to place the burden where it properly belongs, namely, on the translator. For example, it is expected that physicists will write {\sc Lhada} files so that blocks appear in a natural order. However, the {\sc Lhada2TNM} translator does not rely on the order of blocks within a {\sc Lhada} file. The appropriate ordering of blocks is handled by the translator. Given a {\tt cut} block called {\tt signal}, which makes use of another called {\tt preselection}, {\sc Lhada2TNM} places the code for {\tt preselection} before the code for {\tt signal} in the resulting C++ file. +Two key design features of {\sc Lhada} are human readability and analysis framework independence. As noted above, framework independence can be tested by attempting to implement tools that automatically translate analyses described using {\sc Lhada} into analyses that can be executed in different analysis frameworks. Human readability is enhanced by limiting the number of rules and syntactical elements in {\sc Lhada}. 
But, since we also demand that {\sc Lhada} be sufficiently expressive to capture the details of LHC analyses, it pays to follow Einstein's advice: ``Everything should be made as simple as possible, but not simpler''. In the prototype of the {\sc Lhada2TNM} translator, we have tried to place the burden where it properly belongs, namely, on the translator. For example, it is expected that physicists will write {\sc Lhada} files so that blocks appear in a natural order. However, the {\sc Lhada2TNM} translator does not rely on the order of blocks within a {\sc Lhada} file; the appropriate ordering of blocks is handled by the translator. Given a {\tt cut} block called {\tt signal}, which makes use of another called {\tt preselection}, {\sc Lhada2TNM} places the code for {\tt preselection} before the code for {\tt signal} in the resulting C++ file. -Another example of placing the burden on the translator rather than on the author of a {\sc Lhada} file, concerns statements that span multiple lines. Many computer languages have syntactical elements to identify such statements. However, since {\sc Lhada} is a keyword-value language, continuation marks are not needed because it is possible to identify when the value associated with a statement ends. +Another example of placing the burden on the translator rather than on the author of a {\sc Lhada} file concerns statements that span multiple lines. Many computer languages have syntactical elements to identify such statements. However, since {\sc Lhada} is a keyword-value language, continuation marks are not needed, because it is possible to identify where the value associated with a statement ends. To determine where a statement ends, {\sc Lhada2TNM} looks ahead one record in the {\sc Lhada} file during translation. -The {\sc Lhada2TNM} translator is a {\tt Python} program that translates a {\sc Lhada} file to a C++ program that can be executed within the {\sc TNM} $n$-tuple analysis framework. 
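The one-record lookahead can be illustrated with a short Python sketch. This is a hypothetical fragment, not the actual {\sc Lhada2TNM} code: the keyword set and the record layout are illustrative assumptions, chosen only to show how a keyword-value parser can decide where a statement's value ends by peeking at the next record.

```python
# Illustrative keyword set -- NOT the definitive Lhada grammar.
KEYWORDS = {"object", "cut", "take", "select", "apply", "uid"}

def read_statements(lines):
    """Group records into (keyword, value) statements.

    A record whose first token is not a keyword is treated as a
    continuation of the previous statement's value, so no explicit
    continuation marks are needed in the input.
    """
    records = [r.strip() for r in lines if r.strip()]
    statements = []
    i = 0
    while i < len(records):
        key, _, value = records[i].partition(" ")
        j = i + 1
        # Look ahead one record at a time: stop when the next record
        # starts a new statement (i.e., begins with a keyword).
        while j < len(records) and records[j].split(None, 1)[0] not in KEYWORDS:
            value += " " + records[j]
            j += 1
        statements.append((key, value.strip()))
        i = j
    return statements
```

For example, a `select` record followed by an indented continuation record would be merged into a single statement, exactly because the lookahead finds that the next record does not begin with a keyword.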
The translator extracts all the blocks from a {\sc Lhada} file and places them within a data structure that groups the blocks according to type. The {\tt object} and {\tt cut} blocks are ordered according to their dependencies on other object or cut blocks. It is assumed that a standard, extensible, type is available to model all analysis objects and that an adapter exists to translate input types, e.g., {\sc Delphes}, {\sc ATLAS}, {\sc CMS}, etc., types to the standard type. In order to determine where a statement ends, {\sc Lhada2TNM} looks ahead to the next record in the {\sc Lhada} file during translation. +The {\sc Lhada2TNM} translator is a {\tt Python} program that translates a {\sc Lhada} file to a C++ program that can be executed within the {\sc TNM} $n$-tuple analysis framework. This framework is the analysis component of a tool developed as a generic mapper from CMS analysis data objects to $n$-tuples comprising integers, floats, and arrays thereof. Note, however, that the framework depends only on {\sc ROOT} and not on any CMS data structures; {\sc TNM} therefore serves as a generic $n$-tuple-based analysis framework. The {\sc Lhada2TNM} translator extracts all the blocks from a {\sc Lhada} file and places them within a data structure that groups the blocks according to type. The {\tt object} and {\tt cut} blocks are ordered according to their dependencies on other object or cut blocks. It is assumed that a standard, extensible type is available to model all analysis objects and that an adapter exists to translate input types, e.g., {\sc Delphes}, {\sc ATLAS}, or {\sc CMS} types, to the standard type. This assumption is not an imposition on the {\sc Lhada} language, but rather an aid to writing translators and interpreters for {\sc Lhada}. One benefit is that the C++ implementations provide a clear separation between the analysis code, viewed as an algorithm applied to instances of a standard type, and the input types. 
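The dependency ordering of blocks described above amounts to a topological sort. The following Python sketch shows one way this could be done; it is a minimal illustration, not the {\sc Lhada2TNM} implementation, and it assumes the dependency graph is acyclic (block names and dependency lists are invented for the example).

```python
def order_blocks(deps):
    """Depth-first topological sort of blocks.

    `deps` maps a block name to the names of the blocks it uses.
    Returns a list in which every block appears after its dependencies,
    so generated code for a block always follows the code it relies on.
    Assumes the dependency graph contains no cycles.
    """
    ordered, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for used in deps.get(name, []):
            visit(used)          # emit dependencies first
        ordered.append(name)

    for name in deps:
        visit(name)
    return ordered

# 'signal' uses 'preselection', so 'preselection' is emitted first,
# mirroring the ordering of cut blocks in the generated C++ file.
blocks = {"preselection": [], "signal": ["preselection"]}
print(order_blocks(blocks))      # → ['preselection', 'signal']
```

Because the sort visits a block's dependencies before the block itself, the result is independent of the order in which blocks appear in the {\sc Lhada} file, which is exactly the freedom the translator grants to analysis authors.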
-In the current version, instances of the standard, extensible, type as well as functions are placed in the global namespace of the C++ program so that the object and cut code blocks +In the current version of {\sc Lhada2TNM}, instances of the standard, extensible type, as well as functions, are placed in the global namespace of the C++ program so that the object and cut code blocks that need them can access them without objects having to be passed between code blocks. The name of a function defined in {\sc Lhada} is assumed to be identical to that of a function which, ultimately, will be accessed from an online code repository. However, this assumption can be relaxed if warranted in a later iteration of {\sc Lhada}; for example, the appropriate function can be specified by its {\sc DOI}. While the technical details of the automatic access of -codes from an online repository need to be worked out, we see no insurmountable hurdles. +code from an online repository need to be worked out, we see no insurmountable hurdles. + +One purpose of the standard, extensible type is to accommodate the reality that different input types can, and do, have different attributes, and sometimes identical attributes with different names. For example, the transverse momentum of a particle may be called {\tt PT} in {\sc Delphes}, while the same attribute may be called {\tt Pt} in other input types. It can be argued that we should try to agree on naming conventions. But in the real world of particle physics, we cannot even agree on whether the signal strength is to be defined as the ratio of the measured to the predicted cross section or its inverse. Trying to enforce naming conventions, at least until {\sc Lhada} has become mainstream, would be decidedly counterproductive. Therefore, the extensible type used by {\sc Lhada2TNM} uses the attribute names of the input types. The attributes are modeled as a map between a name (a string) and a floating-point value. \section{Conclusion}