This Data Challenge has 2 tasks:
- to determine the type of an Iris plant given data on sepal and petal lengths and widths of the flowers
- to identify a digit/number given optical data from handwritten digits
to the highest ranked model submission, based on the Git timestamps
to the next top 10 ranked submissions, based on Git timestamps
Where we will hack on a real business’ data and stand a chance to win even more exciting prizes and learn from experienced mentors [this is for those based in or willing to travel to Nairobi]
The Data Challenge is judged based on the following criteria:
- A correct fork, branch and pull request
- Using the GitHub Pull Request timestamp where order of submissions is applicable
- Using solution quality/accuracy and explanation to rank submissions where applicable
- Do not share any code that you cannot open source on the Git Repository, as it is open source and african.ai will not be liable for any breach of intellectual property (if any) once the code is shared on the platform.
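Since solution quality/accuracy is one of the ranking criteria, it can help to estimate accuracy yourself before submitting. The sketch below assumes accuracy means the fraction of records labelled correctly; the exact metric used by the judges is not stated here, and the function name `accuracy` is only illustrative. The test labels are not provided, so this would be run on a part of the train data that you hold back from training.

```python
def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up labels for illustration: 3 of the 4 predictions are correct.
score = accuracy(["a", "b", "c", "a"], ["a", "b", "a", "a"])
print(score)  # 0.75
```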
1. Fork the code challenge repository provided.
2. Make a topic branch. In your GitHub fork, keep the master branch clean. When you create a branch, it will essentially be a copy of master.
Pull all changes to make sure your repository is up to date
$ cd challenge1_HowCrispCanYouClassify
$ git pull origin master
Create a new branch with git checkout -b [your_phone_number_email], e.g.
$ git checkout -b [email protected] master
See all branches created
$ git branch
* [email protected]
master
Push the new branch to GitHub
$ git push origin -u [email protected]
3. Remember to only make changes to the fork!
The folder named data contains 4 csv files.
- iris_train
- iris_test
- digits_train
- digits_test
The folder named submissions contains 2 csv files.
- digits_sample_submission
- iris_sample_submission
The train datasets contain labelled records, i.e. their classes are known. In each case:
- use the train datasets to train a satisfactory classification model
- use the model to classify the records in the test datasets
- ensure the format of your submission files is similar to the sample submission files in the submissions folder
- once satisfied with the model and the predictions, name the file containing labelled test data iris_test_labelled or digit_test_labelled and include it in the submissions folder. Submission files should include only 2 columns: the id and the predicted labels
- add to the base of the existing README file a brief explanation of your solution, outlining the algorithm you chose, why you chose it, and how it compared to any others you may have tried
4. Commit the changes to your fork.
5. Make a pull request to the challenge1_HowCrispCanYouClassify repo.
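The train, classify and submit steps described above can be sketched as follows. This is a minimal, dependency-free illustration and not a recommended solution: it uses a hand-written nearest-neighbour classifier, where in practice a library such as scikit-learn would be the more usual choice. The function names, the `id`/`label` header names, the `.csv` extension, the label spellings and the handful of records are all assumptions made for the example; check the sample submission files for the real layout.

```python
import csv
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Return the majority label among the k training rows closest to x."""
    nearest = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def write_submission(path, ids, labels):
    """Write a two-column CSV: record id and predicted label."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "label"])  # hypothetical header names
        writer.writerows(zip(ids, labels))


# Illustrative records standing in for the iris train and test files.
# Columns are assumed to be sepal length, sepal width, petal length, petal width.
train_X = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [6.4, 3.2, 4.5, 1.5],
    [6.9, 3.1, 4.9, 1.5],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
]
# The label encoding in the real files may differ.
train_y = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

test_ids = [1, 2]
test_X = [[5.0, 3.4, 1.5, 0.2], [6.5, 3.0, 5.8, 2.2]]

# Classify each test record, then write the id and predicted label columns.
predictions = [knn_predict(train_X, train_y, x, k=1) for x in test_X]
write_submission("iris_test_labelled.csv", test_ids, predictions)
```

The output file is written to the current directory here; for an actual submission it would go in the submissions folder.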
The Iris Dataset details:
- 150 instances/records
  - train - 110 records
  - test - 40 records
- 4 attributes/features
- 3 classes
The Handwritten Digits Dataset details:
- 5620 instances/records
  - train - 4000 records
  - test - 1620 records
- 64 attributes/features
- 10 classes
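The record and feature counts listed above can double as a quick sanity check after loading a file. The sketch below is only an illustration: it assumes each csv file has one header row, a leading id column and, for train files, a trailing label column. That layout is a guess, and `check_shape` is a hypothetical helper, so adjust it to the actual files.

```python
import csv
import io


def check_shape(csv_text, n_records, n_features, has_label):
    """Return True if the CSV text has the expected row and column counts.

    Assumes one header row, a leading id column and, when has_label is
    True, a trailing label column.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    records = rows[1:]  # skip the header row
    expected_cols = 1 + n_features + (1 if has_label else 0)
    return len(records) == n_records and all(
        len(record) == expected_cols for record in records
    )


# Made-up two-record file with two features, for illustration only.
sample = "id,f1,f2,label\n1,0.1,0.2,a\n2,0.3,0.4,b\n"
print(check_shape(sample, n_records=2, n_features=2, has_label=True))  # True
print(check_shape(sample, n_records=3, n_features=2, has_label=True))  # False
```

For the iris train file, the call would use `n_records=110, n_features=4, has_label=True`; for the digits test file, `n_records=1620, n_features=64, has_label=False`.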
You can use the following resources to get acquainted with some classification problems:
- Each Data Hacker can participate in as many challenges as they wish
- Branches per Data Challenge need to be a unique combination of a genuine phone number and email address
[email protected]
- Multiple submissions are allowed for as long as the challenge is still open. Once the challenge is closed, the last submitted changes will be the evaluated solution
- african.ai reserves the right to announce the winners
- african.ai reserves the right to reward the winners based on african.ai's criteria
- Do not share any code that you cannot open source on the Git Repository as it is public and african.ai will not be liable for any breach of intellectual property (if any) once shared on the platform.
- Data Challenges are time bound - the time restriction is specified on each challenge
- Additional rules MAY be provided on the code challenge and will vary for each challenge
- You are free to use all manner of tools
- Successive interviews for projects MAY be run to satisfy participating african.ai partners