MICROSOFT PROVIDES THE DATASETS ON AN "AS IS" BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE, RESULTING FROM YOUR USE OF THE DATASETS.
The datasets are provided under the original terms that Microsoft received such datasets. See below for more information about each dataset.
This dataset is provided under CC0. Redistributing the dataset "wikipedia-detox-250-line-data.tsv" with attribution:
Wulczyn, Ellery; Thain, Nithum; Dixon, Lucas (2016): Wikipedia Detox. figshare.
With modifications by taking a sample of rows and reducing rough language.
Original source: https://doi.org/10.6084/m9.figshare.4054689
Original readme: https://meta.wikimedia.org/wiki/Research:Detox
This dataset is provided under http://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits.
References: C. Kaynak (1995) Methods of Combining Multiple Classifiers and Their Applications to Handwritten Digit Recognition, MSc Thesis, Institute of Graduate Studies in Science and Engineering, Bogazici University. E. Alpaydin, C. Kaynak (1998) Cascading Classifiers, Kybernetika.
Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository [https://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Redistributing the dataset "breast-cancer.txt" with attribution:
O. L. Mangasarian and W. H. Wolberg: "Cancer diagnosis via linear programming", SIAM News, Volume 23, Number 5, September 1990, pp 1 & 18.
Original source: http://ftp.cs.wisc.edu:80/math-prog/cpo-dataset/machine-learn/cancer/cancer1/datacum
Original readme: http://ftp.cs.wisc.edu/math-prog/cpo-dataset/machine-learn/cancer/cancer1/data.doc
MNIST data originally from NIST and modified by Chris Burges, Corinna Cortes, and Yann LeCun. http://yann.lecun.com/exdb/mnist/
More information: https://en.wikipedia.org/wiki/MNIST_database
Redistributing the datasets "taxi-fare-test.csv" and "taxi-fare-train.csv" with attribution:
Original source: https://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
The dataset is provided under terms provided by City of New York: https://opendata.cityofnewyork.us/overview/#termsofuse.
This dataset is originally from "Introducing LETOR 4.0 Datasets". The dataset is under a CC BY 4.0 license.
@article{DBLP:journals/corr/QinL13,
author = {Tao Qin and
Tie{-}Yan Liu},
title = {Introducing {LETOR} 4.0 Datasets},
journal = {CoRR},
volume = {abs/1306.2597},
year = {2013},
url = {https://arxiv.org/abs/1306.2597},
timestamp = {Mon, 01 Jul 2013 20:31:25 +0200},
biburl = {https://dblp.uni-trier.de/rec/bib/journals/corr/QinL13},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Redistributing the dataset "housing.txt" with attribution:
Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978.
More information: https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.names
This dataset is from the R documentation: [New York Air Quality Measurements](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/airquality.html). The data were obtained from the New York State Department of Conservation (ozone data) and the National Weather Service (meteorological data). References: Chambers, J. M., Cleveland, W. S., Kleiner, B. and Tukey, P. A. (1983) Graphical Methods for Data Analysis. Belmont, CA: Wadsworth.
The dataset is distributed under GPLv2.
This dataset is from the R documentation: [Infertility after Spontaneous and Induced Abortion](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/infert.html). Original source: Trichopoulos et al (1976) Br. J. of Obst. and Gynaec. 83, 645–650.
The dataset is distributed under GPLv2.
"Banana and cross section" by fir0002 is licensed under CC BY-NC.
"Hot dog with mustard" by Renee Comet is in the public domain. This image was released by the National Cancer Institute.
"Bright red tomato and cross section02" by fir0002 is licensed under CC BY-NC.