michelcaradec/minimapreduce


minimapreduce

A simple framework for MapReduce processing on a single machine.

Usage

From the command line:

cd sample
cat corpus.txt | python map.py | python shuffle.py | python reduce.py
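
To make the three stages of this pipeline concrete, here is a minimal in-process sketch of the same data flow. It does not use the framework. The word-count job, the tab separator `SEP`, and the three `*_stage` helpers are assumptions made for illustration. The sketch also assumes that the shuffle stage only needs to bring identical keys next to each other, which it does here by sorting.

```python
from itertools import groupby

SEP = "\t"  # assumed separator; the framework's real one is common.COLUMN_SEPARATOR


def map_stage(lines):
    """Emit one 'word<SEP>1' line per word (hypothetical word-count job)."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}{SEP}1"


def shuffle_stage(lines):
    """Sort by key so that lines sharing a key become adjacent."""
    return sorted(lines, key=lambda line: line.split(SEP, 1)[0])


def reduce_stage(lines):
    """Sum the values of each run of identical keys."""
    keyed = (line.split(SEP, 1) for line in lines)
    for key, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{key}{SEP}{sum(int(value) for _, value in group)}"


corpus = ["the quick fox", "the lazy dog"]
result = list(reduce_stage(shuffle_stage(map_stage(corpus))))
print("\n".join(result))
```

Each stage reads lines and produces lines, which is what lets the real scripts be chained with shell pipes.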

Mapper

def mapper(stream, map_function, emit=emit_console)

Arguments

| Argument | Description |
|---|---|
| stream | Iterable input stream. |
| map_function | Function called to map each input stream entry. |
| emit | Function called to output the map_function result. Defaults to emit_console, which writes to the console. |

map_function Function

def map_function(line, emit)

Arguments

| Argument | Description |
|---|---|
| line | Line to process. |
| emit | Function called to output result. |
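
As an illustration of how a map function plugs into the mapper, the sketch below uses stand-in versions of `mapper` and `emit_console` that only mirror the documented signatures. Their bodies are assumptions, not the framework's code. `count_words` is a hypothetical map function, and the tab used by the stand-in `emit_console` is assumed as well.

```python
import io


def emit_console(key, value=None):
    # Stand-in for the default emitter: print the key, plus the value when given.
    print(key if value is None else f"{key}\t{value}")


def mapper(stream, map_function, emit=emit_console):
    # Stand-in with the documented signature: pass every entry to map_function.
    for line in stream:
        map_function(line, emit)


def count_words(line, emit):
    # Hypothetical map function: emit one (word, 1) pair per word.
    for word in line.split():
        emit(word.lower(), 1)


# Collect the pairs in a list instead of printing them.
pairs = []
mapper(
    io.StringIO("Hello world\nhello again\n"),
    count_words,
    emit=lambda key, value=None: pairs.append((key, value)),
)
print(pairs)
```

Passing a custom `emit` as done here is a convenient way to test a map function without capturing console output.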

emit Function

def emit(key, value=None)

Arguments

| Argument | Description |
|---|---|
| key | Key. |
| value | Value associated with key. |

The emitted key and value must be separated by common.COLUMN_SEPARATOR. The helper common.format_key_value can be used to do so.
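
A custom emit function might look like the sketch below. `COLUMN_SEPARATOR` and `format_key_value` are local stand-ins here: the tab value and the helper's signature are assumptions, so check the `common` module for the real definitions. `emit_to` is a hypothetical factory introduced only for this example.

```python
import sys

COLUMN_SEPARATOR = "\t"  # assumption; see common.COLUMN_SEPARATOR for the real value


def format_key_value(key, value=None):
    # Stand-in for common.format_key_value; the real signature may differ.
    if value is None:
        return str(key)
    return f"{key}{COLUMN_SEPARATOR}{value}"


def emit_to(out):
    """Build an emit(key, value=None) function that writes lines to `out`."""
    def emit(key, value=None):
        out.write(format_key_value(key, value) + "\n")
    return emit


emit = emit_to(sys.stdout)
emit("hello", 1)   # key and value separated by COLUMN_SEPARATOR
emit("marker")     # key only, no separator
```

Testing `value is None` rather than truthiness keeps legitimate values such as `0` or an empty string in the output.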

Reducer

def reducer(stream, reduce_function, emit=emit_console)

Arguments

| Argument | Description |
|---|---|
| stream | Input stream. Must be iterable. |
| reduce_function | Function called to reduce each input stream entry. |
| emit | Function called to output reduce_function result. |

reduce_function Function

def reduce_function(key, values)

Arguments

| Argument | Description |
|---|---|
| key | Key. |
| values | Array of values associated with key. |
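
The reducer side can be sketched the same way. The `reducer` and `emit_console` below are stand-ins that only mirror the documented signatures. The sketch assumes that input lines arrive with identical keys adjacent (as after a shuffle), that key and value are separated by a tab, and that the reducer emits the key together with whatever the reduce function returns. None of this is confirmed by the framework's source. `sum_counts` is a hypothetical reduce function.

```python
from itertools import groupby

SEP = "\t"  # assumed separator; the framework's real one is common.COLUMN_SEPARATOR


def emit_console(key, value=None):
    # Stand-in for the default emitter.
    print(key if value is None else f"{key}{SEP}{value}")


def reducer(stream, reduce_function, emit=emit_console):
    # Stand-in: split each line into (key, value), group adjacent equal keys,
    # and emit the reduce function's return value for each group.
    pairs = (line.rstrip("\n").split(SEP, 1) for line in stream)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        values = [kv[1] for kv in group]
        emit(key, reduce_function(key, values))


def sum_counts(key, values):
    # Hypothetical reduce function: values arrive as strings, so convert first.
    return sum(int(value) for value in values)


totals = []
reducer(
    ["a\t1\n", "a\t1\n", "b\t1\n"],
    sum_counts,
    emit=lambda key, value=None: totals.append((key, value)),
)
print(totals)
```

Because `groupby` only merges adjacent items, unsorted input would produce several partial results for the same key, which is why the shuffle step precedes the reducer in the pipeline.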
