Commit d1e05cd

update code example (use FileInputStream to read data)
1 parent e250789 commit d1e05cd

1 file changed

+13
-9
lines changed

README.md

+13-9
@@ -106,11 +106,11 @@ Sentence Splitter and Tokenizer, Bigram Statistical Test, Phrase Extractor, Keyw
106106

107107
Model Serialization
108108
===================
109-
You may notice that none of models supports Java Serializable interface. It is because the exact format is hard to keep stable,
109+
You may notice that most models supports Java `Serializable` interface (all classifiers do support `Serializable` interface). It is because the exact format is hard to keep stable,
110110
class changes can easily make your serialized data unreadable, reading/writing the data in non-Java code is almost impossible.
111-
Currently, we suggest <a href="http://xstream.codehaus.org">XStream</a> to serialize the trained models.
111+
Currently, we suggest [XStream](http://xstream.codehaus.org) to serialize the trained models.
112112
XStream is a simple library to serialize objects to XML and back again. XStream is easy to use and doesn't require mappings
113-
(actually requires no modifications to objects). <a href="http://code.google.com/p/protostuff/">Protostuff</a> is a
113+
(actually requires no modifications to objects). [Protostuff](http://code.google.com/p/protostuff/) is a
114114
nice alternative that supports forward-backward compatibility (schema evolution) and validation.
115115
Beyond XML, Protostuff supports many other formats such as JSON, YAML, protobuf, etc. For some predicitive models,
116116
we look forward to supporting PMML (Predictive Model Markup Language), an XML-based file format developed by the Data Mining Group.
@@ -236,11 +236,15 @@ Most Smile algorithms take simple double[] as input. So you can use your favorit
236236
```java
237237
ArffParser arffParser = new ArffParser();
238238
arffParser.setResponseIndex(4);
239-
AttributeDataset weather = arffParser.parse(this.getClass().getResourceAsStream("/smile/data/weka/weather.nominal.arff"));
239+
AttributeDataset weather = arffParser.parse(new FileInputStream("data/weka/weather.nominal.arff"));
240240
double[][] x = weather.toArray(new double[weather.size()][]);
241241
int[] y = weather.toArray(new int[weather.size()]);
242242
```
243-
Note that the data file weather.nominal.arff is not included in the release. To try out the example, please just download any arff file from Internet. In the second line, we use setResponseIndex to set the column index (starting at 0) of dependent/response variable. In supervised learning, we need a response variable for each sample to train the model. Basically, it is the _y_ in the mathematical model. For classification, it is the class label. For regression, it is of real value. Without setting it, the data assume no response variable. In that case, the data can be used for testing or unsupervised learning.
243+
Note that the data file weather.nominal.arff is in Smile distribution package.
244+
After unpack the package, there are a lot of testing data in the directory of
245+
`$smile/data`, where `$smile` is the the root of Smile package.
246+
247+
In the second line, we use setResponseIndex to set the column index (starting at 0) of dependent/response variable. In supervised learning, we need a response variable for each sample to train the model. Basically, it is the _y_ in the mathematical model. For classification, it is the class label. For regression, it is of real value. Without setting it, the data assume no response variable. In that case, the data can be used for testing or unsupervised learning.
244248

245249
The parse method can take a URI, File, path string, or InputStream as input argument. And it returns an AttributeDataset object, which is a dataset of a number of attributes. All attribute values are stored as double even if the attribute may be nominal, ordinal, string, or date. The first call of toArray taking a double[][] argument fills the array with all the parsed data and returns it, of which each row is a sample/object. The second call of toArray taking an int array fills it with the class labels of the samples and then returns it.
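
Because a relative path given to `FileInputStream` is resolved against the current working directory, the examples only find their data when run from the right place. The following is a minimal sketch, not part of Smile, that resolves a data file against an explicit root (standing in for `$smile`) and fails early with a clear message. The class name `DataFileSketch` and the helper `open` are hypothetical names introduced for illustration.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DataFileSketch {
    /**
     * Opens a data file located under the given root directory.
     * Hypothetical helper: the root plays the role of $smile.
     */
    static InputStream open(Path root, String relative) throws IOException {
        Path file = root.resolve(relative);
        if (!Files.isRegularFile(file)) {
            // Report the fully resolved location to make path mistakes visible.
            throw new FileNotFoundException("No data file at " + file.toAbsolutePath());
        }
        return new FileInputStream(file.toFile());
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the root of the unpacked package.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // try-with-resources closes the stream even if reading fails.
        try (InputStream in = open(root, "data/weka/weather.nominal.arff")) {
            // The stream would be handed to a parser at this point.
            System.out.println("Opened data file, first byte: " + in.read());
        }
    }
}
```

Note that on Unix-like systems `Path.resolve` returns its argument unchanged when that argument is absolute, so a path with a leading slash such as `/data/usps/zip.train` would bypass the root entirely.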
246250

@@ -250,7 +254,7 @@ Similar to ArffParser, we can also use the DelimitedTextParser class to parse pl
250254
```java
251255
DelimitedTextParser parser = new DelimitedTextParser();
252256
parser.setResponseIndex(new NominalAttribute("class"), 0);
253-
AttributeDataset usps = parser.parse("USPS Train", this.getClass().getResourceAsStream("/smile/data/usps/zip.train"));
257+
AttributeDataset usps = parser.parse("USPS Train", new FileInputStream("data/usps/zip.train"));
254258
```
255259
where the setResponseIndex also take an extra parameter about the attribute of response variable. Because this is a classification problem, we set it a NominalAttribute with name "class". In case of regression, we should use NumericAttribute instead.
256260

@@ -262,8 +266,8 @@ Smile implements a variety of classification and regression algorithms. In what
262266
DelimitedTextParser parser = new DelimitedTextParser();
263267
parser.setResponseIndex(new NominalAttribute("class"), 0);
264268
try {
265-
AttributeDataset train = parser.parse("USPS Train", this.getClass().getResourceAsStream("/smile/data/usps/zip.train"));
266-
AttributeDataset test = parser.parse("USPS Test", this.getClass().getResourceAsStream("/smile/data/usps/zip.test"));
269+
AttributeDataset train = parser.parse("USPS Train", new FileInputStream("/data/usps/zip.train"));
270+
AttributeDataset test = parser.parse("USPS Test", new FileInputStream("/data/usps/zip.test"));
267271

268272
double[][] x = train.toArray(new double[train.size()][]);
269273
int[] y = train.toArray(new int[train.size()]);
@@ -306,7 +310,7 @@ As aforementioned, tree base methods need the type information of attributes. In
306310
```java
307311
ArffParser arffParser = new ArffParser();
308312
arffParser.setResponseIndex(4);
309-
AttributeDataset weather = arffParser.parse(this.getClass().getResourceAsStream("/smile/data/weka/weather.nominal.arff"));
313+
AttributeDataset weather = arffParser.parse(new FileInputStream("/data/weka/weather.nominal.arff"));
310314
double[][] x = weather.toArray(new double[weather.size()][]);
311315
int[] y = weather.toArray(new int[weather.size()]);
312316
