Data Overflow Mock Problem

Data overflow contest mock problem.

Location Aggregation

We have a TSV (tab-separated values) file containing a user_id and a location_id on each line. The goal of this task is to aggregate the user visits into an output TSV file in which each line contains a user_id followed by that user's location_ids, without duplicates.

Note: user_id and location_id are integers. user_id identifies a user and location_id identifies a location.

Input File(s)

USER_ID LOCATION_ID
1234    1
1234    2
1245    6
1293    7
1234    4
1245    5
1293    4
2345    1
1234    1

Output File

1234    1,2,4
1245    6,5
1293    7,4
2345    1
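
In the sample above, each user's locations appear in first-seen order and the repeated visit of user 1234 to location 1 is dropped. A minimal sketch of that aggregation follows. It assumes the header row is present as in the sample and that all users fit in memory; whether the tests require first-seen order is not stated. The name `aggregate_lines` is hypothetical and is not the signature of the repository's location_aggregation function.

```python
def aggregate_lines(lines):
    """Group location_ids by user_id, keeping first-seen order without duplicates."""
    visits = {}  # user_id -> dict used as an insertion-ordered set of location_ids
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank lines and the USER_ID/LOCATION_ID header
        user_id, location_id = parts
        visits.setdefault(user_id, {})[location_id] = None
    # One output line per user: user_id, a tab, then comma-joined location_ids.
    return [f"{user}\t{','.join(locs)}" for user, locs in visits.items()]
```

A plain dict is used in place of a set because dicts preserve insertion order, which is what reproduces `6,5` rather than `5,6` for user 1245.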

How will your code be tested?

The code will be tested against test cases.

For performance, the code is tested with input files of 1 million, 10 million, and 100 million records.
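
To try these sizes locally, a synthetic input file can be generated. The sketch below is only an illustration: the function name `write_synthetic_input`, its parameters, and the value ranges are assumptions, and the real test files may have a different distribution of users and locations.

```python
import random


def write_synthetic_input(path, records, users=1000, locations=100, seed=0):
    """Write a header plus `records` random user_id/location_id rows to `path`."""
    rng = random.Random(seed)  # fixed seed so repeated runs produce the same file
    with open(path, "w") as out:
        out.write("USER_ID\tLOCATION_ID\n")
        for _ in range(records):
            out.write(f"{rng.randint(1, users)}\t{rng.randint(1, locations)}\n")
```

For example, `write_synthetic_input("input_1m.tsv", 1_000_000)` writes a one-million-record file that can then be passed to wrapper.py with the -i option.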

Hardware Requirement:

1GB RAM, 2 core CPU
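
With 1 GB of RAM, holding every user in one in-memory dictionary may not be possible for the largest file. One option, sketched here under assumptions, is to partition records into temporary bucket files by user_id and then aggregate one bucket at a time. The function name `aggregate_partitioned` and the bucket count are hypothetical, and whether this fits the memory limit depends on the data. Note that users are written in bucket order, not in first-seen order.

```python
import os
import tempfile


def aggregate_partitioned(input_paths, output_path, buckets=64):
    """Aggregate in two passes so only one bucket's users are in memory at a time."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"bucket_{i}.tsv") for i in range(buckets)]
        files = [open(p, "w") for p in paths]
        try:
            # Pass 1: route every record to a bucket chosen by user_id.
            for input_path in input_paths:
                with open(input_path) as src:
                    for line in src:
                        parts = line.split()
                        if len(parts) != 2 or not parts[0].isdigit():
                            continue  # skip header and blank lines
                        bucket = files[int(parts[0]) % buckets]
                        bucket.write(f"{parts[0]}\t{parts[1]}\n")
        finally:
            for f in files:
                f.close()
        # Pass 2: all rows of a given user are in one bucket, so each
        # bucket can be aggregated independently of the others.
        with open(output_path, "w") as out:
            for path in paths:
                visits = {}  # user_id -> insertion-ordered set of location_ids
                with open(path) as bucket:
                    for line in bucket:
                        user_id, location_id = line.split()
                        visits.setdefault(user_id, {})[location_id] = None
                for user_id, locs in visits.items():
                    out.write(f"{user_id}\t{','.join(locs)}\n")
```

Within each user, locations still appear in first-seen order because pass 1 preserves the order of records inside each bucket.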

How to get started with the repository?

Log in to GitHub and visit the repository.
Fork the repository by clicking the Fork button.
Clone the forked repository to your local machine.
Start writing your code by updating the location_aggregation function in code/script.py. Feel free to add to or modify the code.
If your code uses additional libraries, list them in requirements.txt.
Run the basic test cases with:

python3 wrapper.py test

This tests your code with basic test cases.
To run your code with the given sample input file(s), run:

python3 wrapper.py run -i {input_file_1} {input_file_2} -o output_file.tsv

Once you are happy with the code, commit it.
Submit your GitHub repository link along with the commit ID on our website.

vvijayan1/dataoverflow-mockproblem
