
Commit 25376ab: Update readme
kpu committed Nov 1, 2012 · 1 parent e9955a3
Showing 1 changed file (README.md) with 21 additions and 9 deletions.

Language model inference code by Kenneth Heafield (kenlm at kheafield.com)

I do development in master on https://github.com/kpu/kenlm/. Normally it works, but I do not guarantee it will compile, give correct answers, or generate non-broken binary files. For a more stable release, get http://kheafield.com/code/kenlm.tar.gz .

The website http://kheafield.com/code/kenlm/ has more documentation. If you're a decoder developer, please download the latest version from there instead of copying from another decoder.

Binary format via mmap is supported. Run `./build_binary` to make one, then pass the binary file name wherever the ARPA file name was used.
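As a sketch of the conversion step (file names here are illustrative, not fixed):

```shell
# Convert an ARPA language model to KenLM's mmapable binary format.
# 'text.arpa' and 'text.binary' are example file names.
./build_binary text.arpa text.binary
```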
## Platforms
`murmur_hash.cc` and `bit_packing.hh` perform unaligned reads and writes that make the code architecture-dependent.
It has been successfully tested on x86\_64, x86, and PPC64.
ARM support is reportedly working, at least on the iPhone.

Runs on Linux, OS X, Cygwin, and MinGW.

Hideo Okuma and Tomoyuki Yoshimura from NICT contributed ports to ARM and MinGW.

## Compile-time configuration
There are a number of macros you can set on the g++ command line or in `util/have.hh`.

- `KENLM_MAX_ORDER` is the maximum order that can be loaded. This is done to make state an efficient POD rather than a vector.
- `HAVE_BOOST` enables Boost-style hashing of `StringPiece`. This is only needed if you intend to hash `StringPiece` in your code.
- `HAVE_ICU` replaces the internal `StringPiece` with ICU's copy, avoiding naming conflicts when your code links against ICU.

ARPA files can be read in compressed format with these options:

- `HAVE_ZLIB` supports gzip; link with `-lz`. Enabled by default.
- `HAVE_BZLIB` supports bzip2; link with `-lbz2`.
- `HAVE_XZLIB` supports xz; link with `-llzma`.

Note that these macros impact only `read_compressed.cc` and `read_compressed_test.cc`. The bjam build system will auto-detect bzip2 and xz support.
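As an illustrative sketch only (not the project's official build command), a manual compile enabling some of these macros might look like:

```shell
# Illustrative: cap the order at 6 and enable gzipped ARPA reading,
# linking zlib as HAVE_ZLIB requires. 'main.cc' is a placeholder.
g++ -O3 -DKENLM_MAX_ORDER=6 -DHAVE_ZLIB -I. main.cc -lz -o main
```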

## Decoder developers
- I recommend copying the code and distributing it with your decoder. However, please send improvements upstream.

- It does not depend on Boost or ICU. If you use ICU, define `HAVE_ICU` in `util/have.hh` (uncomment the line) to avoid a name conflict. Defining `HAVE_BOOST` will let you hash `StringPiece`.
- Omit the lm/filter directory if you do not want the language model filter. Only that and tests depend on Boost.

- Most people have zlib. If you don't want to depend on that, comment out `#define HAVE_ZLIB` in `util/have.hh`. This will disable loading gzipped ARPA files.
- Select the macros you want, listed in the previous section.

- There are two build systems: `compile.sh` and Jamroot+Jamfile. They're pretty simple and are intended to be reimplemented in your build system.

- Use either the interface in `lm/model.hh` or `lm/virtual_interface.hh`. Interface documentation is in comments of `lm/virtual_interface.hh` and `lm/model.hh`.

- There are several possible data structures in `model.hh`. Use `RecognizeBinary` in `binary_format.hh` to determine which one a user has provided. You probably already implement feature functions as an abstract virtual base class with several children. I suggest you co-opt this existing virtual dispatch by templatizing the language model feature implementation on the KenLM model identified by `RecognizeBinary`. This is the strategy used in Moses and cdec.

- See `lm/config.hh` for run-time tuning options.

## Contributors
Contributions to KenLM are welcome. Please base your contributions on https://github.com/kpu/kenlm and send pull requests (or I might give you commit access). Downstream copies in Moses and cdec are maintained by overwriting them, so do not make changes there.

## Python module
Contributed by Victor Chahuneau.

### Installation

```bash
pip install -e git+https://github.com/kpu/kenlm.git#egg=kenlm
```

### Basic Usage
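The usage text itself is truncated in this view. As a minimal, hedged sketch (the class name and the ARPA file path are assumptions for illustration, not verified against this module's API):

```python
import kenlm  # the module installed by the pip command above

# 'lm.arpa' is an illustrative path to an ARPA-format language model.
model = kenlm.LanguageModel('lm.arpa')
print(model.score('this is a sentence .'))  # sentence log probability
```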
