DOC: Adding Cylon under ecosystem/out of core (pandas-dev#41402)
chathurawidanage authored May 12, 2021
1 parent 76792f1 commit ea05559
Showing 1 changed file with 29 additions and 0 deletions.
29 changes: 29 additions & 0 deletions doc/source/ecosystem.rst
@@ -405,6 +405,35 @@ Blaze provides a standard API for doing computations with various
in-memory and on-disk backends: NumPy, pandas, SQLAlchemy, MongoDB, PyTables,
PySpark.

`Cylon <https://cylondata.org/>`__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Cylon is a fast, scalable, distributed-memory parallel runtime with a
pandas-like Python DataFrame API. "Core Cylon" is implemented in C++ and uses
the Apache Arrow format to represent data in memory. The Cylon DataFrame API
implements most of the core pandas operators, such as merge, filter, join,
concat, group-by, and drop_duplicates. These operators are designed to work
across thousands of cores to scale applications. Cylon interoperates with
pandas DataFrames by reading data from pandas or converting data to pandas,
so users can selectively scale parts of their pandas DataFrame applications.

.. code:: python

    from pycylon import read_csv, DataFrame, CylonEnv
    from pycylon.net import MPIConfig

    # Initialize Cylon distributed environment
    config: MPIConfig = MPIConfig()
    env: CylonEnv = CylonEnv(config=config, distributed=True)

    df1: DataFrame = read_csv('/tmp/csv1.csv')
    df2: DataFrame = read_csv('/tmp/csv2.csv')

    # Using 1000s of cores across the cluster to compute the join
    df3: DataFrame = df1.join(other=df2, on=[0], algorithm="hash", env=env)

    print(df3)

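The following is a minimal sketch of the pandas interoperability described
above. It assumes that pycylon's ``DataFrame`` constructor accepts a pandas
DataFrame and that a ``to_pandas()`` method is available for the reverse
conversion; consult the Cylon documentation for the exact API.

.. code:: python

    import pandas as pd

    from pycylon import DataFrame

    # Round trip between pandas and Cylon (sketch; assumes the pycylon
    # DataFrame constructor accepts a pandas DataFrame and that
    # to_pandas() converts back to pandas).
    pdf = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
    cdf = DataFrame(pdf)      # pandas -> Cylon
    result = cdf.to_pandas()  # Cylon -> pandas
    print(result)
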
`Dask <https://dask.readthedocs.io/en/latest/>`__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

