ruby/TODO

TODO
====
* C
  - IMPORTANT:
    + FIX file descriptor overflow. See tickets #341 and #343.
  - add .. and ... operators to the query parser. For example, [100 200] could
    be written as 100..200 or 100...201, as with Ruby Ranges (.. includes the
    upper bound, ... excludes it).
  - remove exception handling from the C code. All errors should be reported
    through return values.
  - Move to SQLite's locking model so that Ferret works reliably in a
    multi-process environment.
  - Add optional logging, perhaps enabled at compile time.
  - Add support for changing zlib and bzlib compression parameters
  - Improve unit test coverage to 100%
  - Add benchmark suite
  - Add Rakefile for development purposes
    + task to publish gcov and benchmark results to ferret wiki
  - Support rebuilding indexes created by older versions of Ferret.
  - Add a globally accessible, thread-safe symbol table. This will be very
    useful for storing field names: objects won't need to strdup field names
    but can store the interned symbol instead.
    + This has been done, but it could be improved by using actual Symbol
      structs instead of plain char*.
  - Make threading optional at compile time
  - to_json should limit its output to avoid running out of memory on large
    indexes. Perhaps some kind of buffered read could be used for this.
  - Make BitVector run as fast as bitset from the C++ STL. See:
      c/benchmark/bm_bitvector.c
  - Add a symbol table for field names. This means we won't need to worry
    about allocating and freeing field names, which currently happens all over
    the place.
  - Divide the headers into public and private (the private headers to be
    stored in the src directory).
  - Group-by search, i.e. you should be able to pass a field by which to group
    search results.
  - Auto-loading of documents during search, i.e. actual documents get
    returned instead of document numbers.
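A minimal sketch of the first step of the proposed .. and ... range operators: splitting a term such as 100..200 or 100...201 into its bounds before handing them to the existing range query code. The DotRange struct and parse_dot_range function are hypothetical names invented for this note; they are not part of Ferret's parser, and a real implementation would more likely live in the query grammar itself. The 64-byte bound limit is an arbitrary assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounds produced from a Ruby-style range term. */
typedef struct {
    char lower[64];       /* arbitrary size limit for this sketch */
    char upper[64];
    bool include_upper;   /* true for "lo..hi", false for "lo...hi" */
} DotRange;

/*
 * Splits "lo..hi" or "lo...hi" into its bounds. As in Ruby, the lower
 * bound is always inclusive. Returns false if term is not a dot range.
 */
static bool parse_dot_range(const char *term, DotRange *out)
{
    const char *dots = strstr(term, "..");
    if (dots == NULL || dots == term) {
        return false;                 /* no operator, or no lower bound */
    }
    bool exclusive = (dots[2] == '.');
    const char *hi = dots + (exclusive ? 3 : 2);
    size_t lo_len = (size_t)(dots - term);
    size_t hi_len = strlen(hi);
    if (hi_len == 0 || *hi == '.' ||
        lo_len >= sizeof out->lower || hi_len >= sizeof out->upper) {
        return false;                 /* missing or odd upper bound, or too long */
    }
    memcpy(out->lower, term, lo_len);
    out->lower[lo_len] = '\0';
    memcpy(out->upper, hi, hi_len + 1);   /* also copies the terminating NUL */
    out->include_upper = !exclusive;
    return true;
}
```

Because strstr finds the first "..", a decimal lower bound such as 1.5..2.5 still splits correctly. The parser would then build its usual range query from lower, upper and include_upper.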

* Ruby bindings
  - argument checking for every method. We need a new api for argument checking
    so that the arguments get checked at the start of each method that could
    cause a segfault.
  - improve memory management. It is far too complex at the moment. I also
    need to document how it works so that other developers understand what is
    going on.
  - Replace Data_Wrap_Struct with ferret alternative which handles rewrapping
    of structs automatically and also knows when to release a struct by using
    refcounting.
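A minimal sketch of the refcounting half of a Data_Wrap_Struct replacement, assuming one shared wrapper per C struct: each Ruby object that wraps the struct takes a reference, and the struct is only freed when the last reference is dropped. RefCounted, rc_new, rc_retain and rc_release are hypothetical names. The counter is a plain int, so this assumes callers are already serialized (for example by MRI's global interpreter lock); it is not thread-safe on its own.

```c
#include <stdlib.h>

/* Hypothetical refcounted wrapper; not part of Ferret or the Ruby C API. */
typedef struct RefCounted {
    int refcount;
    void *data;
    void (*destroy)(void *data);   /* frees the wrapped C struct (may be NULL) */
} RefCounted;

/* Wraps data; the caller holds the initial reference. */
static RefCounted *rc_new(void *data, void (*destroy)(void *data))
{
    RefCounted *rc = malloc(sizeof *rc);
    if (rc == NULL) {
        return NULL;
    }
    rc->refcount = 1;
    rc->data = data;
    rc->destroy = destroy;
    return rc;
}

/* Takes an extra reference, e.g. when a second Ruby object wraps the struct. */
static void rc_retain(RefCounted *rc)
{
    rc->refcount++;
}

/* Drops a reference and frees everything when the last one goes.
 * Returns 1 if the wrapper was freed, 0 otherwise. */
static int rc_release(RefCounted *rc)
{
    if (--rc->refcount > 0) {
        return 0;
    }
    if (rc->destroy != NULL) {
        rc->destroy(rc->data);
    }
    free(rc);
    return 1;
}
```

One possible design is for the free function passed to Data_Wrap_Struct to call rc_release, so the garbage collector drops a reference rather than freeing the struct outright.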

* Ruby
  - integrate rcov
  - improve unit test coverage to 100%

* Documentation.
  - generate Ruby binding documentation with a custom build template similar
    to jaxdoc (http://rubyforge.org/projects/jaxdoc)
  - all documentation should meet DOCUMENTATION_STANDARDS
  - documentation in C code to be generated by doxygen

Someday Maybe
=============
* apply for Google Summer of Code 2009
* optimize read and write vint
  - test the following outside of ferret before implementing
  - perform a binary scan using bitwise OR to find out how many bytes need
    to be written
  - if the write/read will overflow the buffer, split it into two, refreshing
    the buffer in between
  - use Duff's device to write bytes now that we know how many we need
* add a super fast language based dictionary compression
* add portable stacktrace function. Perhaps implement as an external library.
  - See http://www.nongnu.org/libunwind/
  - See http://www.tlug.org.za/wiki/index.php/Obtaining_a_stack_trace_in_C_upon_SIGSEGV
* investigate unscored searching
* user defined sorting
* Fix highlighting to work for external fields
* investigate faster string hashing method
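A minimal sketch of the "optimize read and write vint" idea, assuming the Lucene-style VInt layout (7 data bits per byte, low-order group first, high bit set on every byte except the last). The byte count is found with a two-level binary search over the thresholds 2^7, 2^14, 2^21 and 2^28, which is one reading of the "binary scan" note; the bitwise-OR approach it mentions may differ. The write uses a fall-through switch, the core trick of Duff's device; no loop is needed because a 32-bit VInt is at most five bytes. Buffer refilling is left out, and vint_len, write_vint and read_vint are hypothetical names, not Ferret's functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to store v as a VInt, via binary search over thresholds. */
static int vint_len(uint32_t v)
{
    if (v < (UINT32_C(1) << 14)) {
        return v < (UINT32_C(1) << 7) ? 1 : 2;
    }
    if (v < (UINT32_C(1) << 28)) {
        return v < (UINT32_C(1) << 21) ? 3 : 4;
    }
    return 5;
}

/* Writes v into buf (which must have room for 5 bytes) and returns the
 * number of bytes written. Each case emits one continuation byte and
 * falls through; case 1 emits the final byte without the high bit. */
static size_t write_vint(uint8_t *buf, uint32_t v)
{
    int n = vint_len(v);
    uint8_t *p = buf;
    switch (n) {
    case 5: *p++ = (uint8_t)((v & 0x7F) | 0x80); v >>= 7; /* fall through */
    case 4: *p++ = (uint8_t)((v & 0x7F) | 0x80); v >>= 7; /* fall through */
    case 3: *p++ = (uint8_t)((v & 0x7F) | 0x80); v >>= 7; /* fall through */
    case 2: *p++ = (uint8_t)((v & 0x7F) | 0x80); v >>= 7; /* fall through */
    case 1: *p++ = (uint8_t)v;
    }
    return (size_t)n;
}

/* Decodes a VInt from buf and stores the number of bytes read in *len.
 * Stops after at most 5 bytes so malformed input cannot over-shift. */
static uint32_t read_vint(const uint8_t *buf, size_t *len)
{
    uint32_t v = 0;
    int shift = 0;
    size_t i = 0;
    uint8_t b;
    do {
        b = buf[i++];
        v |= (uint32_t)(b & 0x7F) << shift;
        shift += 7;
    } while ((b & 0x80) && shift < 35);
    *len = i;
    return v;
}
```

Whether this beats a plain byte-at-a-time loop is exactly what the first sub-item says to measure outside of Ferret before adopting it.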

Done
====
* add rake install task
* FIX :create parameter so that it only deletes the files owned by Ferret.
* fix compression. Nothing was happening if you set a field to :compress.
  I guess we'll just assume zlib is installed, as I think it has to be for
  Ruby to be installed.
* add bzlib support
* integrate gcov
* add a field cache to IndexReader
* setup email alerts for svn commits
* Ranged, unordered searching, i.e. search through the index until you have
  the required number of documents and then break. This will require the
  ability to start searches from a particular doc-num.
  + See searcher_search_unordered in the C code and Searcher#scan in Ruby
* improve unit test code. I'd like to implement some way to print out a stack
  trace when a test fails so that it is easy to find the source of the error.
* catch segfaults and print a stack trace so users can post helpful bug
  tickets. Again, see the same links as for adding stack traces to unit tests.
* Add string Sort descriptor
* fix memory bug
* add MultiReader interface
* add lexicographical sort (byte sort)
* Add highlighting
* add field compression
* Fix highlighting to work for compressed fields
* Add Ferret::Index::Index
* Fix:
  + Working Query:  field1:value1 AND NOT field2:value2
  + Failing Query:  field1:value1 AND ( NOT field2:value2 )
* update benchmark suite to use getrusage