GitHub - Dirpyth/wallaroo: Distributed Stream Processing

Build and scale real-time applications as easily as writing a script

A fast stream-processing framework. Wallaroo makes it easy to react to data in real time. By eliminating infrastructure complexity, it simplifies the path from prototype to production.

What is Wallaroo?

When we set out to build Wallaroo, we had several high-level goals in mind:

Create a dependable and resilient distributed computing framework
Take care of the complexities of distributed computing "plumbing," allowing developers to focus on their business logic
Provide high-performance & low-latency data processing
Be portable and deploy easily (i.e., run on-prem or in any cloud)
Manage in-memory state for the application
Allow applications to scale as needed, even while they are live

You can learn more about Wallaroo from our "Hello Wallaroo!" blog post and the Wallaroo overview video.

What makes Wallaroo unique

Wallaroo is a little different from most stream processing tools. While most require the JVM, Wallaroo can be deployed as a separate binary. This means no more jar files. Wallaroo also isn't locked to Kafka as a source; you can use any source you like. Application logic can be written in Python 2, Python 3, or Pony.

Getting Started

Wallaroo can be installed via Docker, Vagrant, or (on Linux) our handy Wallaroo Up command.

As easy as:

docker pull wallaroo-labs-docker-wallaroolabs.bintray.io/release/wallaroo:latest

Check out our installation options page to learn more.

Usage

Once you've installed Wallaroo, take a look at some of our examples. A great place to start is our word_count or market spread example in Python.

"""
This is a complete example application that receives lines of text and counts each word.
"""
import string
import struct
import wallaroo

def application_setup(args):
    # Read the source and sink addresses from the arguments passed to the application.
    in_name, in_host, in_port = wallaroo.tcp_parse_input_addrs(args)[0]
    out_host, out_port = wallaroo.tcp_parse_output_addrs(args)[0]

    # Source: lines of text arriving over TCP, decoded by decode_line.
    lines = wallaroo.source("Split and Count",
                            wallaroo.TCPSourceConfig(in_name, in_host, in_port,
                                                     decode_line))
    # Pipeline: split lines into words, partition by word, count, then emit.
    pipeline = (lines
        .to(split)
        .key_by(extract_word)
        .to(count_word)
        .to_sink(wallaroo.TCPSinkConfig(out_host, out_port,
                                        encode_word_count)))

    return wallaroo.build_application("Word Count Application", pipeline)

@wallaroo.computation_multi(name="split into words")
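def split(data):
    # Characters stripped from the ends of each line and each word.
    punctuation = " !\"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~"

    words = []

    for line in data.split("\n"):
        clean_line = line.lower().strip(punctuation)
        for word in clean_line.split(" "):
            clean_word = word.strip(punctuation)
            # Skip empty strings left by repeated spaces or bare punctuation.
            if clean_word:
                words.append(clean_word)

    return words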

class WordTotal(object):
    count = 0

@wallaroo.state_computation(name="count word", state=WordTotal)
def count_word(word, word_total):
    word_total.count = word_total.count + 1
    return WordCount(word, word_total.count)

class WordCount(object):
    def __init__(self, word, count):
        self.word = word
        self.count = count

@wallaroo.key_extractor
def extract_word(word):
    return word

@wallaroo.decoder(header_length=4, length_fmt=">I")
def decode_line(bs):
    return bs.decode("utf-8")

@wallaroo.encoder
def encode_word_count(word_count):
    output = word_count.word + " => " + str(word_count.count) + "\n"
    return output.encode("utf-8")
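
To get a feel for what the pipeline computes before running it on Wallaroo, the same logic can be exercised in a single process. The sketch below is not part of Wallaroo and does not import it. It assumes, based on the decoder's `header_length=4` and `length_fmt=">I"`, that each incoming message is prefixed with its length as a 4-byte big-endian integer, and it uses a plain dictionary as a stand-in for the per-word `WordTotal` state. The helper names `frame`, `unframe`, `split_words`, and `run_locally` are hypothetical and exist only for this illustration (Python 3).

```python
import struct
from collections import defaultdict

PUNCTUATION = " !\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"


def frame(text):
    """Prefix a UTF-8 payload with its length as a 4-byte big-endian integer."""
    payload = text.encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


def unframe(stream):
    """Yield each payload from a byte string of back-to-back framed messages."""
    offset = 0
    while offset < len(stream):
        (length,) = struct.unpack_from(">I", stream, offset)
        offset += 4
        yield stream[offset:offset + length]
        offset += length


def split_words(data):
    """Cleaning rules from the `split` computation; empty strings are skipped."""
    words = []
    for line in data.split("\n"):
        clean_line = line.lower().strip(PUNCTUATION)
        for word in clean_line.split(" "):
            clean_word = word.strip(PUNCTUATION)
            if clean_word:
                words.append(clean_word)
    return words


def run_locally(stream):
    """Decode, split, key by word, count, and encode, all in one process."""
    totals = defaultdict(int)  # stand-in for the per-key WordTotal state
    outputs = []
    for payload in unframe(stream):
        for word in split_words(payload.decode("utf-8")):
            totals[word] += 1
            line = word + " => " + str(totals[word]) + "\n"
            outputs.append(line.encode("utf-8"))
    return outputs


# Two framed messages, as a sender might write them to the TCP source.
stream = frame("Hello, world!") + frame("hello again")
results = run_locally(stream)
for encoded in results:
    print(encoded.decode("utf-8").rstrip("\n"))
```

Because the count is tracked per word, the second "hello" yields a count of 2 while "world" and "again" each stay at 1. In the real application, `key_by(extract_word)` is what routes each word to its own state; the dictionary here only imitates that on one machine.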

Documentation

Are you the sort who just wants to get going? Dive right into our documentation then! It will get you up and running with Wallaroo.

More information is also on our blog. There you can find more insight into what we are working on and industry use-cases.

Wallaroo currently exists as a mono-repo: all of the Wallaroo source is located in this repo. See application structure for more information.

Need Help?

Trying to figure out how to get started?

Check out the FAQ
Drop us a line:

Contributing

We welcome contributions. Please see our Contribution Guide.

For your pull request to be accepted, you will need to accept our Contributor License Agreement.

License

Wallaroo is licensed under the Apache License, Version 2.0.

