Commit

fix

Menooker committed Dec 11, 2024
1 parent 82a6786 commit 06cda6f
Showing 3 changed files with 97 additions and 157 deletions.
2 changes: 1 addition & 1 deletion CAPI.md
@@ -4,7 +4,7 @@ You can call generated factor code in your C code via our C-style APIs. For othe

## Build Necessary dependencies

You need to build the target `KunRuntime` for the core runtime library of KunQuant. And you may need to build your factor library like `Alpha101`. In your C-language program (or any other language), you need to link to `libKunRuntime.so` (on Linux; other OSes may have different names like `KunRuntime.dll` or `libKunRuntime.dylib`). You don't need to directly link to the factor library (like `libAlpha101.so`).
You need to build the target `KunRuntime` for the core runtime library of KunQuant (or you can find it in `KunQuant/runner/` in the KunQuant install directory). And you may need to build your factor as a shared library. See `Save the compilation result as a shared library` in [Customize.md](./Customize.md). In your C-language program (or any other language), you need to link to `libKunRuntime.so` (on Linux; other OSes may have different names like `KunRuntime.dll` or `libKunRuntime.dylib`). You don't need to directly link to the factor library (like `libAlpha101.so`).

## C language example

140 changes: 79 additions & 61 deletions Customize.md
@@ -2,31 +2,18 @@

This document describes how you can build your own factors.

## Generating C++ source code for financial expressions
## Generating C++ source code and shared library for financial expressions

You can invoke KunQuant as a Python library to generate high-performance C++ source code for your own factors. KunQuant also provides predefined factors of Alpha101 in the Python module `KunQuant.predefined.Alpha101`.

First, you need to make sure the parent directory path is already in PYTHONPATH to let Python correctly find the KunQuant package.

On Linux:

```bash
export PYTHONPATH=$PYTHONPATH:/PATH/TO/KunQuant/
```

On Windows PowerShell:

```powershell
$env:PYTHONPATH+=";x:\PATH\TO\KunQuant\"
```
First, you need to install KunQuant. See [Readme.md](./Readme.md).

Then in Python code, import the needed classes and functions.

```python
from KunQuant.Op import *
from KunQuant.Stage import *
from KunQuant.ops import *
from KunQuant.Driver import compileit
```

An expression in KunQuant is composed of operators (`ops`). An Op is an operation on the data, or a source of the data. Ops fall into several typical categories, such as
@@ -75,79 +62,110 @@ f = Function(builder.ops)

A function can be viewed as a collection of Ops. A single function may contain several factors.
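
For instance, here is a minimal sketch of a single function holding two factors, using only the `Builder`, `Input`, `Output` and windowed operators that appear in the full example later in this document:

```python
from KunQuant.Op import *
from KunQuant.Stage import *
from KunQuant.ops import *

builder = Builder()
with builder:
    close = Input("close")                          # a data-source Op
    Output(WindowedAvg(close, 10), "avg_close")     # first factor: 10-day rolling average
    Output(WindowedStddev(close, 10), "std_close")  # second factor: 10-day rolling stddev
f = Function(builder.ops)                           # one Function collecting both factors
```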

Then generate the C++ source with the `compileit` function!
Then generate the C++ source and build the library with the `compileit` function!

```python
src = compileit(f, "my_library_name", output_layout="TS", options={"opt_reduce": True, "fast_log": True})
print(src) # C++ source code will be printed
from KunQuant.jit import cfake
from KunQuant.Driver import KunCompilerConfig
lib = cfake.compileit([("my_library_name", f, KunCompilerConfig(input_layout="TS", output_layout="TS"))], "my_library_name", cfake.CppCompilerConfig())
modu = lib.getModule("my_library_name")
```

You can see the C++ source code as a Python string. You may want to write it to a file to let the C++ compiler turn it into executable code. We will next discuss how we can add a factor library to the `cmake` system and let it help you compile your factors (like above) into executable binary code.
The `lib` variable has the type `KunQuant.runner.KunRunner.Library`. It is a container of multiple `modules` (in the above example, only one module is in the library). The variable `modu` has the type `KunQuant.runner.KunRunner.Module`. It is the entry point of a factor library.

Note that `"my_library_name"` corresponds to `my_library_name` in the line `cfake.compileit(...)` in our Python script.

## Adding and building a factor library

Let's continue from the above example of a factor library of three factors: average close, stddev of close and alpha001. We have already compiled it into the C++ source code string `src`. Now we write the string into a file. The path of the file is provided by the arguments of the Python script. The full script will be

```python
import sys
import os
from KunQuant.Op import *
from KunQuant.Stage import *
from KunQuant.ops import *
from KunQuant.Driver import compileit
from KunQuant.predefined.Alpha101 import alpha001, AllData

builder = Builder()
with builder:
    inp1 = Input("close")
    v1 = WindowedAvg(inp1, 10)
    v2 = WindowedStddev(inp1, 10)
    out1 = Output(v1, "avg_close")
    out2 = Output(v2, "std_close")
    all_data = AllData(low=Input("low"), high=Input("high"), close=inp1, open=Input("open"), amount=Input("amount"), volume=Input("volume"))
    Output(alpha001(all_data), "alpha001")
f = Function(builder.ops)
src = compileit(f, "my_library_name", output_layout="TS", options={"opt_reduce": True, "fast_log": True})
with open(sys.argv[1] + "/MyFactors.cpp", 'w') as f:
    f.write(src)
```

Create a directory `MyLib` at `projects/` of the `KunQuant` directory. Save the above Python script at `projects/MyLib/generate.py`.

Create a text file at `projects/MyLib/list.txt`. The file should list the `.cpp` files to be generated by `generate.py`. In our example, only one file will be generated, so `list.txt` should contain only one line:

```
MyFactors.cpp
```

Now let cmake re-scan the project files and register our factor library. Change the current directory to the cmake build directory and run:

```bash
cd /PATH/TO/build/
cmake ..
```

Compile the factor library:

```bash
cmake --build . --target MyLib
```

There should be `libMyLib.so` or `MyLib.dll` in the `projects/` directory of the **build** directory of cmake (in our example, `KunQuant/build/`).

You can load the absolute path of the library via `KunRunner` like we did in [Readme](./Readme.md):

```python
import KunRunner as kr
lib = kr.Library.load("./projects/libMyLib.so")
modu = lib.getModule("my_library_name")
```

Note that `MyLib` corresponds to the directory name in `projects/`, and `"my_library_name"` corresponds to `src = compileit(f, "my_library_name", ...)` in our Python script.

You can check the scripts in `projects/` for more examples of using KunQuant to convert expressions to C++ source code files.

More reading on operators provided by KunQuant: see [Operators.md](./Operators.md).

## Save the compilation result as a shared library

Like the example above, by default the compiled factor library is stored in a temporary directory and will be automatically cleaned up. You can choose to keep the compilation result files (the C++ source code, object files and the shared library) if
* your factors do not change and you want to save compilation time by caching the factor library,
* or you want to use the compilation result on another machine or in another programming language (like C/Go/Rust).

In the above example, you can run

```python
cfake.compileit([("my_library_name", f, KunCompilerConfig(input_layout="TS", output_layout="TS"))], "your_lib_name", cfake.CppCompilerConfig(), tempdir="/path/to/a/dir", keep_files=True, load=False)
```

This will create a directory `/path/to/a/dir/your_lib_name`, and the generated C++ file will be at `your_lib_name.cpp` and the shared library file at `your_lib_name.{so,dll}` in that directory.

In another process, you can load the library and get the module via

```python
from KunQuant.runner import KunRunner as kr
lib = kr.Library.load("/path/to/a/dir/your_lib_name/your_lib_name.so")
modu = lib.getModule("my_library_name")
```

And use the `modu` object just like in the example in [Readme](./Readme.md).
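
As a quick reference, the sketch below shows what that usage might look like. The `createMultiThreadExecutor` and `kr.runGraph` calls (and their argument order) are assumptions taken from the Readme example and may differ in your version; the module here was compiled with the `TS` input/output layout and the `float` dtype.

```python
import numpy as np

# Hypothetical sizes; the input names match the fields used when building the function.
num_time, num_stocks = 100, 8
inputs = {name: np.random.rand(num_time, num_stocks).astype("float32")
          for name in ("open", "close", "high", "low", "volume", "amount")}

executor = kr.createMultiThreadExecutor(4)              # assumed API from the Readme example
out = kr.runGraph(executor, modu, inputs, 0, num_time)  # assumed signature; returns factor name -> array
print(out["alpha001"].shape)                            # expected (num_time, num_stocks) for "TS" output
```
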
## Compiler options

The key function of KunQuant is `cfake.compileit`. Its signature is

```python
def compileit(func: List[Tuple[str, Function, KunCompilerConfig]], libname: str, compiler_config: CppCompilerConfig, tempdir: str | None = None, keep_files: bool = False, load: bool = True) -> KunQuant.runner.KunRunner.Library | str
```

This function compiles a list of tuples `(module_name, function, config)`. By default, KunQuant uses multi-threading to compile this list of modules in parallel. The compiled modules (as C++ object files) are linked into a shared library named by `libname`. If the parameter `load` is true, the function returns the loaded library of the compilation result. Otherwise, it returns the path of the library.
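
For instance, a sketch of compiling two modules into one shared library (`f1` and `f2` stand for hypothetical `Function` objects):

```python
# Hypothetical sketch: two modules linked into one shared library named "combined_lib".
lib = cfake.compileit(
    [("alpha101", f1, KunCompilerConfig(input_layout="TS", output_layout="TS")),
     ("my_factors", f2, KunCompilerConfig(input_layout="TS", output_layout="TS"))],
    "combined_lib",
    cfake.CppCompilerConfig())
mod_a = lib.getModule("alpha101")    # each module is looked up by its module name
mod_b = lib.getModule("my_factors")
```
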
Each module has a `KunCompilerConfig` holding configurations like the memory layout, the data type and the SIMD length (discussed below):

```python
@dataclass
class KunCompilerConfig:
    partition_factor: int = 3
    dtype: str = "float"
    blocking_len: int = None
    input_layout: str = "STs"
    output_layout: str = "STs"
    allow_unaligned: Union[bool, None] = None
    options: dict = field(default_factory=dict)
```

The `CppCompilerConfig` controls how KunQuant calls the C++ compiler. To choose a non-default compiler, you can pass `CppCompilerConfig(compiler="/path/to/your/C++/compiler")` to `cfake.compileit`. You can also enable or disable AVX512 via this config class.

## Specifying memory layouts and data types and enabling AVX512

### Enabling AVX512 and choosing blocking_len

This project turns off AVX512 by default, since this instruction set is not yet widely adopted. If you are sure your CPU has AVX512, you can turn it on by passing `machine = cfake.X64CPUFlags(avx512=True)` when creating `cfake.CppCompilerConfig(machine=...)`. This will enable AVX512 features when compiling the KunQuant-generated code. Some speed-up over the `AVX2` mode is expected.

In your customized project, you need to specify the `blocking_len` parameter in `KunCompilerConfig` to enable AVX512. Please note that `blocking_len` will affect the `STs` format (see the section below). For example, if your data type is `float`, the `blocking_len` should be 16 to enable AVX512.

There are some other CPU instruction sets that are optional for KunQuant. You can turn on `AVX512DQ` and `AVX512VL` to accelerate some parts of the KunQuant-generated code. To enable them, add `avx512dq=True` and `avx512vl=True` in `cfake.X64CPUFlags(...)` respectively.

To see if your CPU supports AVX512 (and `AVX512DQ` and `AVX512VL`), you can run the command `lscpu` on Linux and check the output.

Enabling AVX512 will slightly improve the performance if it is supported by the CPU. Experiments show only a ~1% performance gain with 16 threads of AVX512 on Icelake, testing double-precision Alpha101 with 128 stocks and a time length of 12000. A single thread running the same task shows a 5% performance gain with AVX512.
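
Putting the flags above together, a sketch of enabling AVX512 for a `float` module might look like the following (it reuses the `f` function from the earlier example and only shows the flags mentioned in this document):

```python
# Assumes the CPU supports AVX512 (check the output of `lscpu` on Linux).
machine = cfake.X64CPUFlags(avx512=True, avx512dq=True, avx512vl=True)
compiler_cfg = cfake.CppCompilerConfig(machine=machine)
# blocking_len=16 matches the 16 float lanes of a 512-bit register.
module_cfg = KunCompilerConfig(dtype="float", blocking_len=16,
                               input_layout="TS", output_layout="TS")
lib = cfake.compileit([("my_library_name", f, module_cfg)], "my_library_name", compiler_cfg)
```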

### Memory layouts

The developers can choose the memory layout when compiling KunQuant factor libraries. The memory layout describes how the input/output matrix is organized. Currently, KunQuant supports `TS`, `STs` and `STREAM` as the memory layout. In the `TS` layout, the input and output data is a plain `[num_time, num_stocks]` 2D matrix. In the `STs` layout with `blocking_len = 8`, the data should be transformed to `[num_stocks//8, num_time, 8]` for better performance. The `STREAM` layout is for the streaming mode. You can choose the input and output layouts independently in `KunCompilerConfig`, for example via the parameters `KunCompilerConfig(..., input_layout="TS", output_layout="STs")`. By default, the input layout is `STs` and the output layout is `TS`.
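
To make the shapes concrete, here is a small sketch of the arrays implied by the `TS` and `STs` layouts for the `float` dtype with `blocking_len = 8` (the sizes are made up for illustration):

```python
import numpy as np

num_time, num_stocks, blocking = 100, 16, 8
ts_data = np.zeros((num_time, num_stocks), dtype="float32")                          # "TS":  [num_time, num_stocks]
sts_data = np.zeros((num_stocks // blocking, num_time, blocking), dtype="float32")   # "STs": [num_stocks//8, num_time, 8]
```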

For the alpha101 example above, to use `STs` for input, replace the compilation code with

```python
lib = cfake.compileit([("alpha101", f, KunCompilerConfig(input_layout="STs", output_layout="TS"))], "out_first_lib", cfake.CppCompilerConfig())
```

And you need to transpose the numpy array to the shape `[features, stocks//8, time, 8]`: we split the axis of stocks into two axes `[stocks//8, 8]`. This step makes the memory layout of the numpy array match the SIMD length of AVX2, so that KunQuant can process the data in parallel within a single SIMD instruction. Notes:
* the number `8` here is the `blocking_num` of the compiled code. It is decided by the SIMD lanes of the data type and the instruction set (AVX2 or AVX512). By default, the example code of `Alpha101` generates the `float` dtype with AVX2. The register size of AVX2 is 256 bits, so the SIMD lanes of `float` should be 8.

```python
# [features, stocks, time] => [features, stocks//8, 8, time] => [features, stocks//8, time, 8]
transposed = collected.reshape((collected.shape[0], -1, 8, collected.shape[2])).transpose((0, 1, 3, 2))
transposed = np.ascontiguousarray(transposed)
```

### Specifying data types

KunQuant supports the `float` and `double` data types. The data type can be selected by the `dtype` parameter of `KunCompilerConfig(...)`.

If AVX512 is `ON` (it is `OFF` by default), the `blocking_len` for `dtype='float'` can be 8 or 16, and for `dtype='double'` it can be 4 or 8. If AVX512 is `OFF`, the `blocking_len` for `dtype='float'` must be 8, and for `dtype='double'` it must be 4.
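
For example, a sketch of a double-precision module configuration without AVX512, following the rules above:

```python
# double + AVX2: a 256-bit register holds 4 doubles, so blocking_len must be 4.
module_cfg = KunCompilerConfig(dtype="double", blocking_len=4,
                               input_layout="TS", output_layout="TS")
lib = cfake.compileit([("my_library_name", f, module_cfg)], "my_library_name", cfake.CppCompilerConfig())
```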

More reading on operators provided by KunQuant: See [Operators.md](./Operators.md)

## Performance tuning
