This is an experimental automatic content tagger for GOV.UK pages based on the Ankusa gem, using the naive Bayes algorithm.
It attempts to determine correct tags for a page by learning from other, manually tagged pages.
To run the script locally, run ./bin/tag.rb file_name
in your
command line.
The file you pass to the script should be in CSV format with three columns - URL, tag and content. For an example, see the sample_content.csv file.
Just run rspec
in the command line (which will work once the
tests are written).
See the LICENSE file.