vowpal_wabbit/demo/dna at master · wyxy2005/vowpal_wabbit

History

Name		Name	Last commit message	Last commit date
parent directory ..
.gitignore		.gitignore
Makefile		Makefile
README		README
do-dna-multicore-train		do-dna-multicore-train
do-dnahogwild-multicore-train		do-dnahogwild-multicore-train
do-dnahogwildnn-multicore-train		do-dnahogwildnn-multicore-train
do-dnann-multicore-train		do-dnann-multicore-train
do-dnasmash-multicore-train		do-dnasmash-multicore-train
do-perf		do-perf
do-test		do-test
map		map
quaddna2vw.cpp		quaddna2vw.cpp

README

This is the Splice Site recognition dataset from the 2008 Pascal Large
Scale Learning Challenge (http://largescale.ml.tu-berlin.de/summary/).

=== WARNINGS ===

  * you need a beefy machine to comfortably run these demos 
      4 cores or more
      SSD or lots of RAM because the demo is I/O bound
  * the demos are intolerably slow under cygwin (pipes do not work well)

=== INSTRUCTIONS ===

  * make dna.perf
      downloads the dna dataset, trains a 4-gram logistic regression
      in parallel on 4 cores and computes test set statistics
        disk space requirements: about 3 gigabytes
        training time requirements: about 7 cores for about 15 minutes
          ultimately I/O bound: bzcat is the limiting factor
        memory requirements: less than 512 megabytes
      results in APR of 0.512

  * make dnann.perf
      same as dna.perf, but with additionally 1 neural network hidden node 
      slower (by circa 60 seconds) but better
      results in APR of 0.532

  * make dnasmash.perf
      as above but builds a better model
      uses 10 iteratons of parallel sgd with 4 neural network nodes
        disk space requirements: about 10 gigabytes
          additional 7 gigabytes of vw cache on top of original data
        training time requirements:
          16 minutes to build cache over first (ever) pass
          subsequently, 6 minute per pass if you have SSD or enough RAM cache
            10 passes = 60 minutes (x 6 cores)
      results in APR of 0.545

  * make dnahogwild.perf
      same as dna.perf, but trained via lock-free multicore sgd ("hogwild") 
        rather than parallel sgd + averaging
      nondeterministic, but a typical result is APR of 0.516

  * make dnahogwildnn.perf
      same as dnann.perf, but trained via lock-free multicore sgd ("hogwild")
        rather than parallel sgd + averaging
      nondeterministic, but a typical result is APR of 0.536

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dna

dna

README

Files

dna

Directory actions

More options

Directory actions

More options

Latest commit

History

dna

Folders and files

parent directory

README