Wallaroo is a fast, elastic data processing engine that rapidly takes you from prototype to production by eliminating infrastructure complexity.
- What is Wallaroo?
- Status
- Getting Started
- Documentation
- Getting Help
- How to Contribute
- License
- Frequently Asked Questions
- Additional Links
- About this Repository
Wallaroo is a fast and elastic data processing engine that rapidly takes you from prototype to production.
When we set out to build Wallaroo, we had several high-level goals in mind:
- Create a dependable and resilient distributed computing framework
- Take care of the complexities of distributed computing "plumbing," allowing developers to focus on their business logic
- Provide high-performance & low-latency data processing
- Be portable (i.e., run on-prem or any cloud)
- Manage in-memory state for the application
- Allow applications to scale as needed, even when they are live and up-and-running
You can learn more about Wallaroo from our "Hello Wallaroo!" blog post.
We've done a 15-minute video of our engineering presentation that has helped people understand what Wallaroo is. If you watch it, you will get:
- An overview of the problem we are solving with our Scale-Independent API
- A short intro to the Python API
- A demonstration of our Autoscale functionality (for stateful applications)
- A look at the power of Scale-Independent APIs in action
- State Management
- Scale running stateful applications with zero downtime
- Resilience in the face of failures
- Exactly-once message processing
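To make the idea of scale-independent, stateful processing more concrete, here is a minimal sketch in plain Python. It does not use the Wallaroo API; the names `partition_for` and `count_words` are hypothetical and exist only for this illustration. The point it tries to show is that the business logic (counting words) stays the same whether state lives on one worker or is partitioned by key across several.

```python
from collections import defaultdict
from zlib import crc32


def partition_for(key, num_workers):
    """Pick a worker for a key using a stable hash (hypothetical routing rule)."""
    return crc32(key.encode("utf-8")) % num_workers


def count_words(lines, num_workers):
    """Count words while keeping a separate state dictionary per simulated worker."""
    worker_state = [defaultdict(int) for _ in range(num_workers)]
    for line in lines:
        for word in line.lower().split():
            # Route each word to the worker that owns its key.
            state = worker_state[partition_for(word, num_workers)]
            state[word] += 1  # the business logic itself never changes
    # Merge per-worker state for inspection; a key lives on exactly one worker.
    merged = {}
    for state in worker_state:
        merged.update(state)
    return merged


lines = ["the quick brown fox", "the lazy dog", "The fox"]
one_worker = count_words(lines, num_workers=1)
three_workers = count_words(lines, num_workers=3)
assert one_worker == three_workers  # same answer at either scale
print(one_worker["the"], one_worker["fox"])  # prints: 3 2
```

In a real deployment the routing, state placement, and rebalancing would be handled by the engine rather than by application code; this sketch only simulates that split inside a single process.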
Existing language bindings:
- Python 2.7
- C++
- Go
- Pony
Planned Language bindings:
- Python 3
- JavaScript
Please see status for language binding support details. Wallaroo is open source software with an expanding software community. Please see the How to Contribute section if you wish to help support your favorite data analysis language.
- Linux
- macOS
Wallaroo is open source software with an expanding software community. Please see the How to Contribute section if you wish to help support your favorite operating system.
Wallaroo applications are user hosted. Wallaroo is equally at home "in the cloud" or "on-premises."
We have no "as a service" offering at this time.
Interested in where we are taking Wallaroo? Check out our roadmap.
- Pony
Wallaroo is primarily written in Pony. As such, Pony is the first language to receive support for any given feature. We don't expect the Pony API to get much usage outside of Wallaroo Labs, and we aren't maintaining any documentation for it beyond a few examples. You are welcome to use the Pony API, but you will mostly be on your own documentation-wise.
- Python 2.7
Along with Go, Python 2.7 is our primary focus. As we add features to Wallaroo, we will be adding corresponding Python APIs and documentation.
- Python 3
We are currently working with a client who needs Python 3 bindings. We plan to introduce Python 3 bindings in early 2018.
- C++
C++ is currently unsupported, and apps created with the C++ API will not build unless you check out the last-working-C++-commit tag. If you are interested in using Wallaroo with C++, you should contact us. We're happy to work with you.
C++ was our first non-Pony API. Since that time we have learned a lot about writing Wallaroo language bindings. We plan on revisiting the C++ API in the future to improve its ergonomics. New functionality added to Wallaroo is not currently being implemented in the C++ API.
- Go
Along with Python 2.7, Go is our primary focus. As we add features to Wallaroo, we will be adding corresponding Go APIs and documentation. The currently available version of the Go API is our first pass. We're quite interested in getting your feedback and improving it.
- JavaScript
JavaScript support is currently in the planning stages, with a release targeted for 2018.
We have numerous issues open to improve existing Wallaroo functionality. For a high-level overview, please see our current limitations document.
Are you the sort who just wants to get going? Dive right into our documentation then! It will get you up and running with Wallaroo.
Our primary documentation is hosted by GitBook at http://docs.wallaroolabs.com. You can find additional information on our community site.
We're an open source project and welcome contributions. Trying to figure out how to get started? Drop us a line on IRC or the developer mailing list, and we can get you started.
Be sure to check out our contributors guide before you get started.
Wallaroo is an open source project. All of the source code is available to you. Most of the Wallaroo code base is available under the Apache License, version 2. However, not all of the Wallaroo source code is Apache 2 licensed. Parts of Wallaroo are licensed under the Wallaroo Community License Agreement. Source files in this repository have a header indicating which license they are under. Currently, all files that are licensed under the Wallaroo Community License Agreement are in the lib/wallaroo/ent directory.
The core stream processing engine and state management facilities are all licensed under the Apache License, version 2. Autoscaling, exactly-once message processing, and resiliency features are licensed under the Wallaroo Community License Agreement.
The Wallaroo Community License is based on Apache version 2. However, you should read it for yourself. Here we provide a summary of the main points of the Wallaroo Community License Agreement.
- You can run all Wallaroo code in a non-production environment without restriction.
- You can run all Wallaroo code in a production environment for free on up to 3 servers or 24 CPUs.
- If you want to run Wallaroo Enterprise version features in production above 3 servers or 24 CPUs, you have to obtain a license.
- You can modify and redistribute any Wallaroo code.
- Anyone who uses your modified or redistributed code is bound by the same license and needs to obtain a Wallaroo Enterprise license to run on more than 3 servers or 24 CPUs in a production environment.
Please contact us if you have any questions about licensing or Wallaroo Enterprise.
A 15-minute overview of key Wallaroo features. Includes actual code and a demonstration of our stateful autoscaling functionality.
Our open source announcement.
An introduction to Wallaroo.
A look inside Wallaroo's excellent performance.
The company behind Wallaroo.
Wallaroo documentation.
Wallaroo Labs blog.
- QCon NY 2016: How did I get here? Building Confidence in a Distributed Stream Processor
- CodeMesh 2016: How did I get here? Building Confidence in a Distributed Stream Processor
Our VP of Engineering Sean T. Allen talks about one of the techniques we use to test Wallaroo.
Wallaroo currently exists as a mono-repo. All the source that makes Wallaroo go is in this repo. Let's take a quick walk through what you'll find in each top-level directory:
- book
Markdown source used to build http://docs.wallaroolabs.com. The site gets built from the latest commit to the release branch.
- cpp_api
Code for writing Wallaroo applications using C++. This is currently unsupported.
- examples
Wallaroo example applications in a variety of languages. Currently, only the Python API examples are supported. See the Status section for details.
- giles
TCP utility applications that can stream data over TCP to Wallaroo applications and receive TCP streams from said applications.
- go_api
Code for writing Wallaroo applications using Go.
- lib
The Pony source code that makes up Wallaroo.
- machida
Python runner application. Machida embeds a Python interpreter inside a native Wallaroo binary and allows you to run applications using the Wallaroo Python API.
- monitoring_hub
Source for the Wallaroo metrics UI.
- orchestration
Tools we use to create machines in AWS and other environments.
- testing
Tools we have written that are used to test Wallaroo.
- utils
End-user utilities designed to make it easier to do various Wallaroo tasks, such as cleanly shutting down a cluster.
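As a rough illustration of what the tools in the giles directory deal with (streaming messages over TCP to and from an application), the sketch below frames and unframes messages on a byte stream. This README does not specify the wire format, so the 4-byte big-endian length prefix is an assumption made for the example, and `frame`, `read_exact`, and `read_frames` are hypothetical helper names rather than part of giles or any Wallaroo API.

```python
import io
import struct

# Assumed framing for this sketch: 4-byte big-endian unsigned length, then payload.
HEADER = struct.Struct(">I")


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length so a receiver can split the stream."""
    return HEADER.pack(len(payload)) + payload


def read_exact(stream, size):
    """Read exactly `size` bytes, raising if the stream ends early."""
    data = b""
    while len(data) < size:
        chunk = stream.read(size - len(data))
        if not chunk:
            raise EOFError("stream ended mid-frame")
        data += chunk
    return data


def read_frames(stream):
    """Yield each payload from a stream of length-prefixed frames."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of stream on a frame boundary
        if len(header) < HEADER.size:
            header += read_exact(stream, HEADER.size - len(header))
        (length,) = HEADER.unpack(header)
        yield read_exact(stream, length)


messages = [b"hello", b"", b"wallaroo"]
wire = b"".join(frame(m) for m in messages)
assert list(read_frames(io.BytesIO(wire))) == messages
```

The same `read_frames` generator should work on a connected TCP socket by passing `sock.makefile("rb")` as the stream, since it only relies on a file-like `read` method.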