PROJECTS

In this file you will find some ideas for projects for improving
LELA. They are ordered according to estimated difficulty from easist
to hardest.

1. New unit-tests

The following parts of the library do not yet have unit-tests:

 - Vector-streams (see lela/vector/stream.h)
 - M4RI wrapper (see lela/blas/level3-m4ri.h)
 - PivotStrategy (see lela/algorithms/pivot-strategy.h)
 - EchelonForm (see lela/solutions/echelon-form.h, echelon-form-gf2.h)

2. New solutions

Right now there is only one solution in lela/solutions, namely, the
calculation of the row-echelon form of a matrix. Solutions for solving
equations, computing the rank and determinant of a matrix, and
constructing the nullspace of a matrix are relatively easy to
implement with existing algorithms.

Solutions should also have unit-tests covering various cases and each
one should ideally have a command-line utility (in the directory util)
associated with it.

3. Improve unit-tests for rings

Some bugs in the ring-implementations, particularly Modular, have
managed to make it through the test in tests/test-ring.h. In
particular the following should be tested:

 * Negating an element, including 0

 * Subtracting an element close to but above modulus / 2 from one
   close to but below modulus / 2, and vice versa, over Modular<float>
   and Modular<double>

In both cases it should be checked that the result maintains a valid
representation (i.e. 0..modulus-1 for the integral Element-types,
-modulus/2..modulus/2 for the floating-point types).

4. Explicit instantiation

It would be nice to instantiate the content of the .tcc-files
throughout the library explicitly for common cases so that they need
not be included and so that compiling against LELA becomes much more
efficient. This requires, for each .tcc-file, the creation of a
.C-file with explicit instantiations of the functions and classes in
the .tcc-file. Rules to compile the .C-files must then be added to the
associated Makefile.am so that the result is included in liblela.a.

5. Implement readers for missing matrix-formats

Several matrix-formats do not have corresponding read-formats
implemented. See lela/matrix/io.tcc; the corresponding functions throw
the exception NotImplemented.

6. Asymptotically fast PLUQ-decomposition

Add to GaussJordan (see lela/algorithms/gauss-jordan.h) a recursive
algorithm which uses matrix-multiplication to compute the
PLUQ-decomposition of a matrix. Currently PLUQ exists only in
Elimination and cannot take advantage of fast matrix-multiplication. A
modest modification of GaussJordan::GaussTransform should suffice.

7. Parallelised GEMM

LELA currently doesn't take advantage of multithreading at all. The
easiest place to add it would be to implement a version of
matrix-multiplication which chops its input into pieces and parcels
the pieces out to different processors. This can be done relatively
easily by creating a corresponding module for the BLAS-system and
inserting it in a suitable place in the stack.

8. Sparse elimination with full pivoting

The current sparse pivot-strategy does not allow column-swaps. This is
because such operations are not allowed for the construction of the
row-echelon form of F4-matrices, the main application of
LELA. However, a pivot-strategy for sparse matrices which does full
pivoting may be useful for other applications. This can be done by
creating a new class with the interface PivotStrategy (see
lela/algorithms/pivot-strategy.h) and using it with the existing
method Elimination::pluq (see lela/algorithms/elimination.h).

9. Improve the implementation of Faugère-Lachartre

The current implementation of Faugère-Lachartre (see
lela/algorithms/faugere-lachartre.h) is quite basic in that it does
not attempt to divide the input-matrix according to density and does
not allow the configuration of whether the submatrices it extracts use
a dense, sparse, or hybrid representation. This suffices for the
examples which have been considered so far, but more flexibility may
be required for a proper implementation of F4 in the future.

Ideally, the algorithm should either take hints as input or somehow
scan the matrix to find blocks which are better represented as sparse
or as dense matrices, and then chop the matrix accordingly.

10. Implement the remaining variants of Strassen-Winograd

The implementation of Strassen-Winograd in LELA is based on the paper

Boyer, B., Dumas, J.-G., Pernet, C., & Zhou, W. (2007). Memory efficient
scheduling of Strassen-Winogradʼs matrix multiplication algorithm.
Proceedings of the 2009 international symposium on Symbolic and algebraic
computation ISSAC 09, 55. ACM Press. Retrieved from
http://arxiv.org/abs/0707.2347

This paper actually identifies three variants of Strassen-Winograd
which can be implemented, each with its own advantages and
disadvantages. LELA currently only implements one of the three but the
class StrassenWinograd (see lela/algorithms/strassen-winograd.h)
contains interfaces for all three. The functions need only be filled
in.

11. Implement a system for sensibly setting the cutoff for
    Strassen-Winograd

Right now the default cutoff for matrix-sizes below which classical
matrix-multiplication is used is just an educated guess (see
lela/algorithms/strassen-winograd.h). A correct value in fact depends
on the computer-hardware and needs to be set somehow at
configuration-time.

Ideally there should be a utility "autotune" which runs after the rest
of LELA has been configured and built and which determines the optimal
cutoff and sets it in some configuration- or header-file.

12. Algorithms over PIDs and PIRs

The high-level algorithms which LELA implements (see
lela/algorithms/elimination.h, lela/algorithms/gauss-jordan.h) work
only over fields. Some can be modified relatively easily to produce
fraction-free variants which work over PIRs (e.g. Z/n, n not
prime). Algorithms which are relevant over PIDs such as the integers
would also be useful, for example computing the Smith normal form of a
matrix.

13. Improve the hybrid subvector

Currently the const_iterator in an arbitrary (i.e. not word-aligned)
hybrid subvector is extremely slow (see
lela/vector/sparse-subvector-hybrid.h, .tcc). There is much room for
improvement by optimising the checks which it performs.

WARNING: The hybrid subvector code is extremely delicate! Don't touch
it unless you are certain of what you are doing.