Commit

fix

Menooker committed Dec 11, 2024
1 parent 82a6786 commit 06cda6f
Showing 3 changed files with 97 additions and 157 deletions.
2 changes: 1 addition & 1 deletion CAPI.md
@@ -4,7 +4,7 @@ You can call generated factor code in your C code via our C-style APIs. For othe

## Build Necessary dependencies

You need to build the target `KunRuntime` for the core runtime library of KunQuant. And you may need to build your factor library like `Alpha101`. In your C-language program (or any other language), you need to link to `libKunRuntime.so` (on Linux; other OSes may have different names like `KunRuntime.dll` or `libKunRuntime.dylib`). You don't need to directly link to the factor library (like `libAlpha101.so`).
You need to build the target `KunRuntime` for the core runtime library of KunQuant (or you can find it in `KunQuant/runner/` in the KunQuant install directory). And you may need to build your factor as a shared library. See `Save the compilation result as a shared library` in [Customize.md](./Customize.md). In your C-language program (or any other language), you need to link to `libKunRuntime.so` (on Linux; other OSes may have different names like `KunRuntime.dll` or `libKunRuntime.dylib`). You don't need to directly link to the factor library (like `libAlpha101.so`).

## C language example

140 changes: 79 additions & 61 deletions Customize.md
@@ -2,31 +2,18 @@

This document describes how you can build your own factors.

## Generating C++ source code for financial expressions
## Generating C++ source code and shared library for financial expressions

You can invoke KunQuant as a Python library to generate high-performance C++ source code for your own factors. KunQuant also provides predefined factors of Alpha101 in the Python module `KunQuant.predefined.Alpha101`.

First, you need to make sure the parent directory path is already in PYTHONPATH to let Python correctly find the KunQuant package.

On Linux:

```bash
export PYTHONPATH=$PYTHONPATH:/PATH/TO/KunQuant/
```

On Windows PowerShell:

```powershell
$env:PYTHONPATH+=";x:\PATH\TO\KunQuant\"
```
First, you need to install KunQuant. See [Readme.md](./Readme.md).

Then in Python code, import the needed classes and functions.

```python
from KunQuant.Op import *
from KunQuant.Stage import *
from KunQuant.ops import *
from KunQuant.Driver import compileit
```

An expression in KunQuant is composed of operators (`ops`). An Op is an operation on the data, or a source of the data. Ops fall into several typical categories, such as
@@ -75,79 +62,110 @@ f = Function(builder.ops)

A function can be viewed as a collection of Ops. A single function may contain several factors.
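
For instance, here is a minimal sketch of a single function holding two factors, using only the `Builder`, `Input`, `Output` and windowed operators that appear in the full example later in this document:

```python
from KunQuant.Op import *
from KunQuant.Stage import *
from KunQuant.ops import *

builder = Builder()
with builder:
    close = Input("close")                          # a data-source Op
    Output(WindowedAvg(close, 10), "avg_close")     # first factor: 10-day rolling average
    Output(WindowedStddev(close, 10), "std_close")  # second factor: 10-day rolling stddev
f = Function(builder.ops)                           # one Function collecting both factors
```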

Then generate the C++ source with the `compileit` function!
Then generate the C++ source and build the library with the `compileit` function!

```python
src = compileit(f, "my_library_name", output_layout="TS", options={"opt_reduce": True, "fast_log": True})
print(src) # C++ source code will be printed
from KunQuant.jit import cfake
from KunQuant.Driver import KunCompilerConfig
lib = cfake.compileit([("my_library_name", f, KunCompilerConfig(input_layout="TS", output_layout="TS"))], "my_library_name", cfake.CppCompilerConfig())
modu = lib.getModule("my_library_name")
```

You can see the C++ source code as a Python string. You may want to write it to a file to let the C++ compiler turn it into executable code. We will next discuss how we can add a factor library to the `cmake` system and let it help you compile your factors (like above) into executable binary code.
The `lib` variable has the type `KunQuant.runner.KunRunner.Library`. It is a container of multiple `modules` (in the above example, only one module is in the library). The variable `modu` has the type `KunQuant.runner.KunRunner.Module`. It is the entry point of a factor library.

Note that `"my_library_name"` corresponds to `my_library_name` in the line `cfake.compileit(...)` in our Python script.

## Adding and building a factor library

Let's continue from the above example of a factor library of three factors: average close, stddev of close and alpha001. We have already compiled it into the C++ source code string `src`. Now we write the string into a file. The path of the file is provided by the arguments of the Python script. The full script will be

```python
import sys
import os
from KunQuant.Op import *
from KunQuant.Stage import *
from KunQuant.ops import *
from KunQuant.Driver import compileit
from KunQuant.predefined.Alpha101 import alpha001, AllData

builder = Builder()
with builder:
    inp1 = Input("close")
    v1 = WindowedAvg(inp1, 10)
    v2 = WindowedStddev(inp1, 10)
    out1 = Output(v1, "avg_close")
    out2 = Output(v2, "std_close")
    all_data = AllData(low=Input("low"), high=Input("high"), close=inp1, open=Input("open"), amount=Input("amount"), volume=Input("volume"))
    Output(alpha001(all_data), "alpha001")
f = Function(builder.ops)
src = compileit(f, "my_library_name", output_layout="TS", options={"opt_reduce": True, "fast_log": True})
with open(sys.argv[1] + "/MyFactors.cpp", 'w') as f:
    f.write(src)
```

Create a directory `MyLib` at `projects/` of the `KunQuant` directory. Save the above Python script at `projects/MyLib/generate.py`.

Create a text file at `projects/MyLib/list.txt`. The file should list the `.cpp` files to be generated by `generate.py`. In our example, only one file will be generated, so `list.txt` should contain only one line:

```
MyFactors.cpp
```

Now let cmake re-scan the project files and register our factor library. Change the current directory to the cmake build directory and run:

```bash
cd /PATH/TO/build/
cmake ..
```

Compile the factor library:

```bash
cmake --build . --target MyLib
```

There should be `libMyLib.so` or `MyLib.dll` in the `projects/` directory of the **build** directory of cmake (in our example, `KunQuant/build/`).

You can load the absolute path of the library via `KunRunner` like we did in [Readme](./Readme.md):

```python
import KunRunner as kr
lib = kr.Library.load("./projects/libMyLib.so")
modu = lib.getModule("my_library_name")
```

Note that `MyLib` corresponds to the directory name in `projects/`, and `"my_library_name"` corresponds to `src = compileit(f, "my_library_name", ...)` in our Python script.

You can check the scripts in `projects/` for more examples of using KunQuant to convert expressions to C++ source code files.

More reading on operators provided by KunQuant: see [Operators.md](./Operators.md).

## Save the compilation result as a shared library

Like the example above, by default the compiled factor library is stored in a temporary directory and will be automatically cleaned up. You can choose to keep the compilation result files (the C++ source code, object files and the shared library) if
* your factors do not change and you want to save compilation time by caching the factor library,
* or you want to use the compilation result on another machine or in another programming language (like C/Go/Rust).

In the above example, you can run

```python
cfake.compileit([("my_library_name", f, KunCompilerConfig(input_layout="TS", output_layout="TS"))], "your_lib_name", cfake.CppCompilerConfig(), tempdir="/path/to/a/dir", keep_files=True, load=False)
```

This will create a directory `/path/to/a/dir/your_lib_name`, and the generated C++ file will be at `your_lib_name.cpp` and the shared library file at `your_lib_name.{so,dll}` in that directory.

In another process, you can load the library and get the module via

```python
from KunQuant.runner import KunRunner as kr
lib = kr.Library.load("/path/to/a/dir/your_lib_name/your_lib_name.so")
modu = lib.getModule("my_library_name")
```

And use the `modu` object just like in the example in [Readme](./Readme.md).
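
As a quick reference, the sketch below shows what that usage might look like. The `createMultiThreadExecutor` and `kr.runGraph` calls (and their argument order) are assumptions taken from the Readme example and may differ in your version; the module here was compiled with the `TS` input/output layout and the `float` dtype.

```python
import numpy as np

# Hypothetical sizes; the input names match the fields used when building the function.
num_time, num_stocks = 100, 8
inputs = {name: np.random.rand(num_time, num_stocks).astype("float32")
          for name in ("open", "close", "high", "low", "volume", "amount")}

executor = kr.createMultiThreadExecutor(4)              # assumed API from the Readme example
out = kr.runGraph(executor, modu, inputs, 0, num_time)  # assumed signature; returns factor name -> array
print(out["alpha001"].shape)                            # expected (num_time, num_stocks) for "TS" output
```
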
## Compiler options

The key function of KunQuant is `cfake.compileit`. Its signature is

```python
def compileit(func: List[Tuple[str, Function, KunCompilerConfig]], libname: str, compiler_config: CppCompilerConfig, tempdir: str | None = None, keep_files: bool = False, load: bool = True) -> KunQuant.runner.KunRunner.Library | str
```

This function compiles a list of tuples `(module_name, function, config)`. By default, KunQuant uses multi-threading to compile this list of modules in parallel. The compiled modules (as C++ object files) are linked into a shared library named by `libname`. If the parameter `load` is true, the function returns the loaded library of the compilation result. Otherwise, it returns the path of the library.
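
For instance, a sketch of compiling two modules into one shared library (`f1` and `f2` stand for hypothetical `Function` objects):

```python
# Hypothetical sketch: two modules linked into one shared library named "combined_lib".
lib = cfake.compileit(
    [("alpha101", f1, KunCompilerConfig(input_layout="TS", output_layout="TS")),
     ("my_factors", f2, KunCompilerConfig(input_layout="TS", output_layout="TS"))],
    "combined_lib",
    cfake.CppCompilerConfig())
mod_a = lib.getModule("alpha101")    # each module is looked up by its module name
mod_b = lib.getModule("my_factors")
```
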
Each module has a `KunCompilerConfig` holding configurations like the memory layout, the data type and the SIMD length (discussed below):

```python
@dataclass
class KunCompilerConfig:
    partition_factor: int = 3
    dtype: str = "float"
    blocking_len: int = None
    input_layout: str = "STs"
    output_layout: str = "STs"
    allow_unaligned: Union[bool, None] = None
    options: dict = field(default_factory=dict)
```

The `CppCompilerConfig` controls how KunQuant calls the C++ compiler. To choose a non-default compiler, you can pass `CppCompilerConfig(compiler="/path/to/your/C++/compiler")` to `cfake.compileit`. You can also enable or disable AVX512 via this config class.

## Specifying memory layouts and data types and enabling AVX512

### Enabling AVX512 and choosing blocking_len

This project turns off AVX512 by default, since this instruction set is not yet widely adopted. If you are sure your CPU has AVX512, you can turn it on by passing `machine = cfake.X64CPUFlags(avx512=True)` when creating `cfake.CppCompilerConfig(machine=...)`. This will enable AVX512 features when compiling the KunQuant-generated code. Some speed-up over the `AVX2` mode is expected.

In your customized project, you need to specify the `blocking_len` parameter in `KunCompilerConfig` to enable AVX512. Please note that `blocking_len` will affect the `STs` format (see the section below). For example, if your data type is `float`, the `blocking_len` should be 16 to enable AVX512.

There are some other CPU instruction sets that are optional for KunQuant. You can turn on `AVX512DQ` and `AVX512VL` to accelerate some parts of the KunQuant-generated code. To enable them, add `avx512dq=True` and `avx512vl=True` in `cfake.X64CPUFlags(...)` respectively.

To see if your CPU supports AVX512 (and `AVX512DQ` and `AVX512VL`), you can run the command `lscpu` on Linux and check the output.

Enabling AVX512 will slightly improve the performance if it is supported by the CPU. Experiments show only a ~1% performance gain with 16 threads of AVX512 on Icelake, testing double-precision Alpha101 with 128 stocks and a time length of 12000. A single thread running the same task shows a 5% performance gain with AVX512.
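
Putting the flags above together, a sketch of enabling AVX512 for a `float` module might look like the following (it reuses the `f` function from the earlier example and only shows the flags mentioned in this document):

```python
# Assumes the CPU supports AVX512 (check the output of `lscpu` on Linux).
machine = cfake.X64CPUFlags(avx512=True, avx512dq=True, avx512vl=True)
compiler_cfg = cfake.CppCompilerConfig(machine=machine)
# blocking_len=16 matches the 16 float lanes of a 512-bit register.
module_cfg = KunCompilerConfig(dtype="float", blocking_len=16,
                               input_layout="TS", output_layout="TS")
lib = cfake.compileit([("my_library_name", f, module_cfg)], "my_library_name", compiler_cfg)
```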

### Memory layouts

The developers can choose the memory layout when compiling KunQuant factor libraries. The memory layout describes how the input/output matrix is organized. Currently, KunQuant supports `TS`, `STs` and `STREAM` as the memory layout. In the `TS` layout, the input and output data is a plain `[num_time, num_stocks]` 2D matrix. In the `STs` layout with `blocking_len = 8`, the data should be transformed to `[num_stocks//8, num_time, 8]` for better performance. The `STREAM` layout is for the streaming mode. You can choose the input and output layouts independently in `KunCompilerConfig`, for example via the parameters `KunCompilerConfig(..., input_layout="TS", output_layout="STs")`. By default, the input layout is `STs` and the output layout is `TS`.
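
To make the shapes concrete, here is a small sketch of the arrays implied by the `TS` and `STs` layouts for the `float` dtype with `blocking_len = 8` (the sizes are made up for illustration):

```python
import numpy as np

num_time, num_stocks, blocking = 100, 16, 8
ts_data = np.zeros((num_time, num_stocks), dtype="float32")                          # "TS":  [num_time, num_stocks]
sts_data = np.zeros((num_stocks // blocking, num_time, blocking), dtype="float32")   # "STs": [num_stocks//8, num_time, 8]
```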

For the alpha101 example above, to use `STs` for input, replace the compilation code with

```python
lib = cfake.compileit([("alpha101", f, KunCompilerConfig(input_layout="STs", output_layout="TS"))], "out_first_lib", cfake.CppCompilerConfig())
```

And you need to transpose the numpy array to the shape `[features, stocks//8, time, 8]`: we split the axis of stocks into two axes `[stocks//8, 8]`. This step makes the memory layout of the numpy array match the SIMD length of AVX2, so that KunQuant can process the data in parallel within a single SIMD instruction. Notes:
* the number `8` here is the `blocking_num` of the compiled code. It is decided by the SIMD lanes of the data type and the instruction set (AVX2 or AVX512). By default, the example code of `Alpha101` generates the `float` dtype with AVX2. The register size of AVX2 is 256 bits, so the SIMD lanes of `float` should be 8.

```python
# [features, stocks, time] => [features, stocks//8, 8, time] => [features, stocks//8, time, 8]
transposed = collected.reshape((collected.shape[0], -1, 8, collected.shape[2])).transpose((0, 1, 3, 2))
transposed = np.ascontiguousarray(transposed)
```

### Specifying data types

KunQuant supports the `float` and `double` data types. The data type can be selected by the `dtype` parameter of `KunCompilerConfig(...)`.

If AVX512 is `ON` (it is `OFF` by default), the `blocking_len` for `dtype='float'` can be 8 or 16, and for `dtype='double'` it can be 4 or 8. If AVX512 is `OFF`, the `blocking_len` for `dtype='float'` must be 8, and for `dtype='double'` it must be 4.
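
For example, a sketch of a double-precision module configuration without AVX512, following the rules above:

```python
# double + AVX2: a 256-bit register holds 4 doubles, so blocking_len must be 4.
module_cfg = KunCompilerConfig(dtype="double", blocking_len=4,
                               input_layout="TS", output_layout="TS")
lib = cfake.compileit([("my_library_name", f, module_cfg)], "my_library_name", cfake.CppCompilerConfig())
```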

More reading on operators provided by KunQuant: See [Operators.md](./Operators.md)

## Performance tuning
