DOC: Adding Cylon under ecosystem/out of core (pandas-dev#41402)
chathurawidanage authored May 12, 2021
1 parent 76792f1 commit ea05559
Showing 1 changed file with 29 additions and 0 deletions.
29 changes: 29 additions & 0 deletions doc/source/ecosystem.rst
@@ -405,6 +405,35 @@ Blaze provides a standard API for doing computations with various
in-memory and on-disk backends: NumPy, pandas, SQLAlchemy, MongoDB, PyTables,
PySpark.

`Cylon <https://cylondata.org/>`__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Cylon is a fast, scalable, distributed-memory parallel runtime with a
pandas-like Python DataFrame API. "Core Cylon" is implemented in C++ and uses
the Apache Arrow format to represent data in memory. The Cylon DataFrame API
implements most of the core pandas operators, such as merge, filter, join,
concat, group-by, and drop_duplicates. These operators are designed to work
across thousands of cores to scale applications. Cylon interoperates with
pandas DataFrames by reading data from pandas or converting data to pandas,
so users can selectively scale parts of their pandas DataFrame applications.

.. code:: python

    from pycylon import read_csv, DataFrame, CylonEnv
    from pycylon.net import MPIConfig

    # Initialize Cylon distributed environment
    config: MPIConfig = MPIConfig()
    env: CylonEnv = CylonEnv(config=config, distributed=True)

    df1: DataFrame = read_csv('/tmp/csv1.csv')
    df2: DataFrame = read_csv('/tmp/csv2.csv')

    # Using 1000s of cores across the cluster to compute the join
    df3: DataFrame = df1.join(other=df2, on=[0], algorithm="hash", env=env)

    print(df3)

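The following is a minimal sketch of the pandas interoperability described
above. It assumes that pycylon's ``DataFrame`` constructor accepts a pandas
DataFrame and that a ``to_pandas()`` method is available for the reverse
conversion; consult the Cylon documentation for the exact API.

.. code:: python

    import pandas as pd

    from pycylon import DataFrame

    # Round trip between pandas and Cylon (sketch; assumes the pycylon
    # DataFrame constructor accepts a pandas DataFrame and that
    # to_pandas() converts back to pandas).
    pdf = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
    cdf = DataFrame(pdf)      # pandas -> Cylon
    result = cdf.to_pandas()  # Cylon -> pandas
    print(result)
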
`Dask <https://dask.readthedocs.io/en/latest/>`__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

