forked from rapidsai/raft

RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.

mhaseeb123/raft

RAFT: RAPIDS Analytics Framework Toolkit

RAFT is a SciPy-like library for scientific computing, containing CUDA-accelerated building blocks for rapidly composing analytics in the RAPIDS ecosystem. These building blocks include infrastructure as well as mathematical computational primitives, which accelerate the development of algorithms for data science applications.

By taking a primitives-based approach to algorithm development, RAFT:

  1. shortens algorithm construction time,
  2. reduces the maintenance burden by maximizing reuse across projects, and
  3. centralizes the core computations, allowing future optimizations to benefit all algorithms that use them.

RAFT provides a header-only C++ API (with optional shared libraries to accelerate build time) that covers the following general categories:

Category              Description / Examples
Data Formats          sparse & dense, conversions, and data generation
Data Generation       sparse, spatial, machine learning datasets
Dense Linear Algebra  matrix arithmetic, norms, factorization
Spatial               pairwise distances, nearest neighbors, neighborhood graph construction
Sparse Operations     linear algebra, slicing, symmetrization, norms, spectral embedding, msf
Basic Clustering      spectral clustering, hierarchical clustering, k-means
Optimizers            eigenvalue decomposition, least squares, and Lanczos
Statistics            sampling, moments, metrics
Distributed Tools     multi-node multi-GPU infrastructure

RAFT also provides a Python API for building multi-node multi-GPU algorithms in the Dask ecosystem. We are continuing to improve the coverage of the Python API to expose the building blocks from the categories above.

Getting started

RAPIDS Memory Manager (RMM)

RAFT relies heavily on RMM which, like other projects in the RAPIDS ecosystem, eases the burden of configuring different allocation strategies globally across the libraries that use it. RMM also provides RAII wrappers around device arrays that handle the allocation and cleanup.
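To illustrate what an RAII wrapper does, here is a minimal host-side sketch in standard C++. The class name scoped_buffer is hypothetical and is not part of RMM. It stands in for a device array by using std::malloc and std::free, and it only demonstrates the ownership pattern (allocate in the constructor, release in the destructor, transfer ownership on move), not RMM's actual implementation.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical host-side stand-in for an RAII array wrapper (not an RMM type).
// Storage is left uninitialized, so T must be trivially copyable.
template <typename T>
class scoped_buffer {
  static_assert(std::is_trivially_copyable<T>::value,
                "scoped_buffer only supports trivially copyable types");

 public:
  // Acquire the memory in the constructor.
  explicit scoped_buffer(std::size_t n)
    : data_(static_cast<T*>(std::malloc(n * sizeof(T)))), size_(n)
  {
    if (n != 0 && data_ == nullptr) { throw std::bad_alloc{}; }
  }

  // Release the memory in the destructor, so cleanup happens automatically
  // when the object goes out of scope (including during exception unwinding).
  ~scoped_buffer() { std::free(data_); }

  // Copying is disabled so two objects never own the same allocation.
  scoped_buffer(const scoped_buffer&)            = delete;
  scoped_buffer& operator=(const scoped_buffer&) = delete;

  // Moving transfers ownership and leaves the source empty.
  scoped_buffer(scoped_buffer&& other) noexcept
    : data_(std::exchange(other.data_, nullptr)), size_(std::exchange(other.size_, 0))
  {
  }

  scoped_buffer& operator=(scoped_buffer&& other) noexcept
  {
    if (this != &other) {
      std::free(data_);
      data_ = std::exchange(other.data_, nullptr);
      size_ = std::exchange(other.size_, 0);
    }
    return *this;
  }

  T* data() noexcept { return data_; }
  const T* data() const noexcept { return data_; }
  std::size_t size() const noexcept { return size_; }

 private:
  T* data_;
  std::size_t size_;
};
```

With a wrapper like this, a function can allocate a buffer as a local variable and return early or throw without an explicit free call, which is the burden the RAII wrappers are meant to remove.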

C++ Example

Most of the primitives in RAFT accept a raft::handle_t object for the management of resources which are expensive to create, such as CUDA streams, stream pools, and handles to other CUDA libraries like cuBLAS and cuSOLVER.

The example below demonstrates creating a RAFT handle and using it with RMM's device_uvector to allocate memory on the device and compute pairwise Euclidean distances:

#include <raft/handle.hpp>
#include <raft/distance/distance.hpp>

#include <rmm/device_uvector.hpp>

// The handle manages the CUDA stream and library handles used below.
raft::handle_t handle;

int n_samples = ...;
int n_features = ...;

// Device buffers: the feature matrix (n_samples x n_features) and the
// pairwise distance matrix (n_samples x n_samples).
rmm::device_uvector<float> input(n_samples * n_features, handle.get_stream());
rmm::device_uvector<float> output(n_samples * n_samples, handle.get_stream());

// ... Populate feature matrix ...

auto metric = raft::distance::DistanceType::L2SqrtExpanded;
rmm::device_uvector<char> workspace(0, handle.get_stream());

// Pass the same matrix twice to compute distances between all pairs of samples.
raft::distance::pairwise_distance(handle, input.data(), input.data(),
                                  output.data(),
                                  n_samples, n_samples, n_features,
                                  workspace.data(), metric);
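For checking results on small inputs, a host-side reference can be handy. The sketch below is plain standard C++ and does not call RAFT. The function name pairwise_l2_sqrt_expanded is hypothetical, and it assumes row-major inputs and that L2SqrtExpanded means the square root of the expanded squared distance, ||x||^2 + ||y||^2 - 2 x.y. Treat it as an illustration of the computation rather than a description of RAFT's kernels.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (not a RAFT function).
// x is an m x k row-major matrix, y is an n x k row-major matrix.
// Returns the m x n row-major matrix of Euclidean distances.
std::vector<float> pairwise_l2_sqrt_expanded(const std::vector<float>& x,
                                             const std::vector<float>& y,
                                             std::size_t m,
                                             std::size_t n,
                                             std::size_t k)
{
  // Squared L2 norm of every row of a row-major matrix.
  auto row_sq_norms = [k](const std::vector<float>& a, std::size_t rows) {
    std::vector<float> norms(rows, 0.0f);
    for (std::size_t i = 0; i < rows; ++i) {
      for (std::size_t f = 0; f < k; ++f) {
        norms[i] += a[i * k + f] * a[i * k + f];
      }
    }
    return norms;
  };

  const std::vector<float> x_norms = row_sq_norms(x, m);
  const std::vector<float> y_norms = row_sq_norms(y, n);

  std::vector<float> dist(m * n, 0.0f);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      float dot = 0.0f;
      for (std::size_t f = 0; f < k; ++f) {
        dot += x[i * k + f] * y[j * k + f];
      }
      // Expanded form: ||x||^2 + ||y||^2 - 2 * <x, y>.
      // Clamp at zero because rounding can make the result slightly negative.
      const float sq = std::max(x_norms[i] + y_norms[j] - 2.0f * dot, 0.0f);
      dist[i * n + j] = std::sqrt(sq);
    }
  }
  return dist;
}
```

Passing the same matrix as both x and y mirrors the call above, where input.data() is supplied twice, and yields a symmetric matrix with zeros on the diagonal.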

Build/Install RAFT

Refer to the Build instructions for details on building and including the RAFT library in downstream projects.

Folder Structure and Contents

The folder structure mirrors other RAPIDS repos (cuDF, cuML, cuGraph...), with the following folders:

  • ci: Scripts for running CI in PRs
  • conda: Conda recipes and development conda environments
  • cpp: All C++ source code
    • include: The C++ API is fully contained here
    • src: Compiled template specializations for the shared libraries
  • docs: Source code and scripts for building library documentation
  • python: All Python source code

Contributing

If you are interested in contributing to the RAFT project, please read our Contributing guidelines. Refer to the Developer Guide for details on the developer guidelines, workflows, and principles.
