Skip to content

Latest commit

 

History

History
 
 

jenetics.prog

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 

Module: io.jenetics.prog

Javadoc

Genetic programming (GP) is a technique where a set of genes are modified (evolved) using an evolutionary algorithm. The io.jenetics.prog modules contains a ProgramGene, together with its corresponding ProgramChromosome, which allows to create operation (Op) trees.

The io.jenetics.prog module contains the classes which are needed for doing Genetic programming with the Jenetics library. It introduces a new Chromosome/Gene pair, which are the main entry points.

public interface Op<T> {
    public String name();
    public int arity();
    public T apply(T[] args);
}

The generic type of the Op enforces data-type constraints for the created program tree and makes the implementation a strongly typed GP algorithm.

Operations

When creating a new program tree, it is not necessary to implement own instance of the ProgramGene or ProgramChromosome class. The extension point for own programs is the Op interface.

 final Op<Double> myop = Op.of("myop", 3, v -> v[0]*v[1] + v[2]);

In the example above, a new operation with the "myop" and arity 3 is defined. Whenever the operation is evaluated, the function f(x, y, z) = x*y + z is executed.

Note
The class MathOp in the io.jenetics.prog.op package defines a set of mathematical standard operations/functions.

When creating a new ProgramChromosome we must distinguish two different kind of operations:

  1. Non-terminal operations have an arity greater than zero and form their own sub-tree.

  2. Terminal operations have an arity of zero and form the leaves of a program tree.

There are currently three different types of terminal operations implemented, Var, Const and the EphemeralConst class.

Var

The Var operation defines a variable of a program, which are set from the program arguments.

final ISeq<Op<Double>> terminals = ISeq.of(
    Var.of("x", 0), Var.of("y", 1), Var.of("z", 2)
);

The terminal operation list in the example code above will lead to a program which takes three input parameters, x, y and z, with the argument indices 0, 1 and 2.

Const

The Const operation will always return the same, constant, value when evaluated.

final Op<Double> one = Const.of(1.0);
final Op<Double> pi = Const.of("π", Math.PI);

We can create a constant operation in to flavours, with a value only and with a dedicated name. If a constant has a name, the symbolic name is used, instead of the value, when the program tree is printing.

EphemeralConst

An ephemeral constant is a special constant, which is only constant within an tree. If a new tree is created, a new constant is created, by the Supplier function the ephemeral constant is created with.

final Op<Double> rand1 = EphemeralConst.of(Math::random);
final Op<Double> rand2 = EphemeralConst.of("R", Math::random);

Program trees

Program creation

The ProgramChromosome comes with some factory methods, which lets you easily create program trees with a given depth and a given set of operations and terminals.

final ISeq<Op<Double>> operations = ISeq.of(...);
final ISeq<Op<Double>> terminals = ISeq.of(...);
final ProgramChromosome<Double> program =
    ProgramChromosome.of(depth, operations, terminals);

The code snippet above will create a perfect program tree of depth 5. All non-leaf nodes will contain operations, randomly selected from the operations set. Whereas all leaf nodes are filled with operations from the terminals set.

Note
The created program tree is perfect, which means that all leaf nodes have the same depth. If new trees needs to be created during evolution, they will be created with the depth, operations and terminals defined by the template program tree.

The GP Engine is created in the exact same way as the GA Engine.

final Engine<ProgramGene<Double>, Double> engine = Engine
    .builder(Main::error, program)
    .minimizing()
    .alterers(
        new SingleNodeCrossover<>(),
        new Mutator<>())
    .build();

For a detailed description on the correct Engine setup have a look at the Example section.

Program repair

The specialized Alterer classes for the ProgramChromosome guarantees that the program tree after the alter operation is still valid. They obey the tree structure of the chromosome, although they are altering the flattened ProgramGene sequence. General alterers, not written for ProgramChromosome, will most likely destroy the tree property of the altered chromosome. There are essentially two possibility for handling invalid tree chromosomes: 1) marking the chromosome as invalid or 2) try to repair the invalid chromosome. Possibility 1) would be easier, but it would also lead to a possible large number of invalid chromosomes.

Note
Jenetics allows the usage of arbitrary Alterer implementations. Even alterers not implemented for ProgramChromosomes. Chromosomes destroyed by the alterer are repaired.

Examples

Symbolic regression

Symbolic regression involves finding a mathematical expression, in symbolic form, that provides a good, best, or perfect fit between a given finite sampling of values of the independent variables and the associated values of the dependent variables.–John R. Koza, Genetic Programming I

The following example shows how to solve a GP problem with Jenetics. We are trying to find a polynomial (or an arbitrary mathematical function) which approximates a given data set.

Table 1. Table Sample points

x

y

0.00

0.0000

0.10

0.0740

0.20

0.1120.

0.30

0.1380

…​

…​

The sample points has been created with the function f(x) = 4*x^3 - 3*x^2 + x. The knowledge of the creating function makes it easier to compare the quality of the evolved function. For the example we created 21 data points.

Note
The function which created the sample points is not needed in the error function we have to define for the GP. It just let us verify the final, evolved result.

As first step, we have to define the set of allowed non-terminal and the terminal operations the GP is working with. Selecting the right set of operation has a big influence on the performance of the GP. If the operation set is bigger than necessary, we will expand the potential search space, and the execution time for finding a solution. For our polynomial example we will chose the following operations and terminals.

static final ISeq<Op<Double>> OPERATIONS = ISeq.of(
    MathOp.ADD,
    MathOp.SUB,
    MathOp.MUL
);

static final ISeq<Op<Double>> TERMINALS = ISeq.of(
    Var.of("x", 0),
    EphemeralConst.of(() ->
        (double)RandomRegistry.getRandom().nextInt(10))
);

The chosen non-terminal operation set is sufficient to create any polynomial. For the terminal operations, we added a variable "x", with index zero, and an ephemeral int constant. The purpose of the ephemeral constant is to create constant values, which will differ for every tree, but stay constant within a tree.

The io.jenetics.prog.regression package contains interfaces and classes, which simplifies the definition of the error function and the implementation of symbolic regression problems.

private static final Regression<Double> REGRESSION = Regression.of(
    Regression.codecOf(OPERATIONS, TERMINALS, 5),
    Error.of(LossFunction::mse),
    Sample.ofDouble(-1.0, -8.0000),
    // ...
    Sample.ofDouble(0.9, 1.3860),
    Sample.ofDouble(1.0, 2.0000)
);

public static void main(final String[] args) {
    final Engine<ProgramGene<Double>, Double> engine = Engine
        .builder(REGRESSION)
        .minimizing()
        .alterers(
            new SingleNodeCrossover<>(0.1),
            new Mutator<>())
        .build();

    final EvolutionResult<ProgramGene<Double>, Double> result = engine
        .stream()
        .limit(Limits.byFitnessThreshold(0.01))
        .collect(EvolutionResult.toBestEvolutionResult());

    final ProgramGene<Double> program = result.getBestPhenotype()
        .getGenotype()
        .getGene();

    final TreeNode<Op<Double>> tree = program.toTreeNode();
    MathExpr.rewrite(tree); // Simplify result program.
    System.out.println("Generations: " + result.getTotalGenerations());
    System.out.println("Function:    " + new MathExpr(tree));
    System.out.println("Error:       " + REGRESSION.error(tree));
}

The error function uses the mean square error as loss function. It is also possible to define a more advanced error function, which penalizes big program trees.

final Error error = Error.of(
    LossFunction::mse,
    Complexity.ofMaxNodeCount(50)
);

The error function above consists of a loss function and a tree complexity function. The definition of the Codec is also simplified by the Regression class and several factory methods: Regression.codecOf(OPERATIONS, TERMINALS, 5)

The GP is capable of finding the polynomial which created the sample data. and we get the following (correct) output program:

Generations: 9494
Function:    ((x^2.0*4.0 - (x + 5.0)) + (((6.0 - x) - 2.0*x) + x))*x
Error:       1.3621827363801722E-31

This program can be reduced to 4*x^3 - 3*x^2 + x, which is exactly the polynomial, which created the sample data.