WEKA 3.1.6
15 November 1999
Java Programs for Machine Learning
Copyright (C) 1998, 1999 Eibe Frank, Leonard Trigg, Mark Hall
email: wekasupport@cs.waikato.ac.nz
At Tufts:
Put the following two lines in the .cshrc file in your home directory:
setenv WEKAHOME /u/f/ablumer/public_html/weka
setenv CLASSPATH /loc/lang/jdk1.1.6/jdk/lib/classes.zip:/u/f/ablumer/public_html/weka/weka.jar:.:..
The first time, just after you finish editing the file, you will need to type "source .cshrc", but after this these lines will be executed whenever you log in to a new shell.
From the original file:
For people familiar with their command-line interface
If you are using Java 2 (JDK 1.2 or equivalent), or Java 1.1 with Swing 1.1.1 or later installed, you should be able to just double-click on the weka.jar icon, or from a command line (assuming you are in the directory containing weka.jar) type
or if you are using Windows use
This will start a small GUI from which you can select the SimpleCLI interface or the more sophisticated Explorer and Experimenter interfaces (see below).
If you are using some other Java virtual machine, you need to start SimpleCLI from within weka.jar. For JDK 1.1 users, something like the following should work:
or if you are using Windows use
In the following, file names assume a Unix command line with environment variables. For other command lines (including SimpleCLI), substitute the name of the directory where weka.jar lives wherever you see $WEKAHOME. If your platform uses a path separator other than /, make the appropriate substitutions as well.
Try:
(At Tufts, you will need to make sure you "cd $WEKAHOME" before this.)
This prints out a decision tree classifier for the iris dataset and ten-fold cross-validation estimates of its performance. If you don't pass any options to the classifier, WEKA will list all the available options. Try:
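As background to the ten-fold cross-validation estimates mentioned above: the data are split into ten folds, each fold in turn serves as the test set while the other nine are used for training, and the results over the ten test folds are combined into one estimate. The Java sketch below shows only the splitting step. The class and method names are hypothetical, it does not stratify by class, and it is not WEKA's own code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of k-fold cross-validation index splitting (not WEKA's implementation).
class CrossValidationSketch {

    /** Shuffles the indices 0..n-1 with the given seed and deals them into k folds round-robin. */
    static List<List<Integer>> folds(int n, int k, long seed) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("need 2 <= k <= n");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            result.get(i % k).add(indices.get(i));
        }
        return result;
    }

    /** Training indices for one test fold: every index that is not in that fold. */
    static List<Integer> trainingIndices(List<List<Integer>> folds, int testFold) {
        List<Integer> train = new ArrayList<>();
        for (int f = 0; f < folds.size(); f++) {
            if (f != testFold) {
                train.addAll(folds.get(f));
            }
        }
        return train;
    }
}
```

For a dataset with 150 instances, such as iris, ten folds of 15 instances each are produced, and each model is trained on the remaining 135.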
The options are divided into "general" options that apply to most classification schemes in WEKA, and scheme-specific options that only apply to the current scheme, in this case J48. WEKA has a common interface to all classification methods: any class that implements a classifier can be used in the same way as J48 is used above. WEKA recognizes a class as a classifier if it extends the Classifier or DistributionClassifier class in weka.classifiers. Almost all classes in weka.classifiers fall into this category. Try, for example:
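Separately from the command to try, the idea of a common classifier interface can be illustrated with a small, self-contained Java sketch. The abstract class SketchClassifier and its method signatures are hypothetical simplifications, not the actual weka.classifiers.Classifier API. The point is that code written once against the base class (here, accuracy) works unchanged for any subclass, such as this majority-class baseline, which is similar in spirit to WEKA's ZeroR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified base class; NOT the real weka.classifiers.Classifier API.
abstract class SketchClassifier {

    /** Learns a model from feature vectors and integer class labels. */
    abstract void buildClassifier(double[][] features, int[] labels);

    /** Predicts a class label for one feature vector. */
    abstract int classifyInstance(double[] features);

    /** Works for every subclass: the fraction of instances predicted correctly. */
    double accuracy(double[][] features, int[] labels) {
        int correct = 0;
        for (int i = 0; i < features.length; i++) {
            if (classifyInstance(features[i]) == labels[i]) {
                correct++;
            }
        }
        return (double) correct / features.length;
    }
}

/** Majority-class baseline: ignores the features and always predicts the most frequent label. */
class MajorityClassSketch extends SketchClassifier {
    private int majority;

    @Override
    void buildClassifier(double[][] features, int[] labels) {
        Map<Integer, Integer> counts = new HashMap<>();
        int bestCount = -1;
        for (int label : labels) {
            int c = counts.merge(label, 1, Integer::sum);
            if (c > bestCount) { // ties go to the label that reached the count first
                bestCount = c;
                majority = label;
            }
        }
    }

    @Override
    int classifyInstance(double[] features) {
        return majority;
    }
}
```

Trained on the labels {0, 1, 1, 2, 1}, the baseline always predicts 1 and scores an accuracy of 0.6 on that same data.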
Here is a list of the most important classifiers currently implemented in weka.classifiers:
Besides classification schemes, WEKA contains other useful tools. Association rules, for example, can be extracted using the Apriori algorithm. Try:
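As background on how Apriori works: it finds frequent itemsets level by level, and uses the fact that every subset of a frequent itemset must itself be frequent to prune candidates before counting them. The Java sketch below shows only that frequent-itemset stage, with hypothetical names and minimum support given as an absolute transaction count. Generating rules and their confidences from the itemsets is omitted, and this is not WEKA's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of Apriori's level-wise frequent-itemset search (not WEKA's code).
class AprioriSketch {

    /** Returns every itemset contained in at least minSupport transactions, with its count. */
    static Map<TreeSet<String>, Integer> frequentItemsets(List<Set<String>> transactions, int minSupport) {
        Map<TreeSet<String>, Integer> result = new HashMap<>();

        // Level 1 candidates: every single item that appears in some transaction.
        Set<TreeSet<String>> candidates = new HashSet<>();
        for (Set<String> t : transactions) {
            for (String item : t) {
                TreeSet<String> single = new TreeSet<>();
                single.add(item);
                candidates.add(single);
            }
        }

        while (!candidates.isEmpty()) {
            // Support count: number of transactions containing every item of the candidate.
            Map<TreeSet<String>, Integer> frequent = new HashMap<>();
            for (TreeSet<String> candidate : candidates) {
                int count = 0;
                for (Set<String> t : transactions) {
                    if (t.containsAll(candidate)) {
                        count++;
                    }
                }
                if (count >= minSupport) {
                    frequent.put(candidate, count);
                }
            }
            result.putAll(frequent);

            // Join frequent k-itemsets into (k+1)-itemsets, keeping only candidates
            // whose k-subsets are all frequent (the Apriori pruning step).
            Set<TreeSet<String>> next = new HashSet<>();
            List<TreeSet<String>> level = new ArrayList<>(frequent.keySet());
            for (int i = 0; i < level.size(); i++) {
                for (int j = i + 1; j < level.size(); j++) {
                    TreeSet<String> union = new TreeSet<>(level.get(i));
                    union.addAll(level.get(j));
                    if (union.size() == level.get(i).size() + 1 && allSubsetsFrequent(union, frequent)) {
                        next.add(union);
                    }
                }
            }
            candidates = next;
        }
        return result;
    }

    private static boolean allSubsetsFrequent(TreeSet<String> itemset, Map<TreeSet<String>, Integer> frequent) {
        for (String item : itemset) {
            TreeSet<String> subset = new TreeSet<>(itemset);
            subset.remove(item);
            if (!frequent.containsKey(subset)) {
                return false;
            }
        }
        return true;
    }
}
```

For the five transactions {a,b,c}, {a,b}, {a,c}, {b,c}, {a,b,c} and a minimum support of 3, it returns the three single items and the three pairs, but not {a,b,c}, which occurs only twice.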
Datasets in WEKA must be formatted according to the ARFF format. Examples of ARFF files can be found in $WEKAHOME/data. What follows is a short description of the file format.
A dataset has to start with a declaration of its name:
@relation name
followed by a list of all the attributes in the dataset (including the class attribute). These declarations have the form
@attribute attribute_name specification
If an attribute is nominal, specification contains a list of the possible attribute values in curly brackets:
@attribute nominal_attribute {first_value, second_value, third_value}
If an attribute is numeric, the specification is replaced by the keyword numeric (integer values are treated as real numbers in WEKA):
@attribute numeric_attribute numeric
In addition to these two types of attributes, there is also a string attribute type. It makes it possible to store a comment or ID field for each instance in a dataset:
@attribute string_attribute string
After the attribute declarations, the actual data is introduced by a
@data
tag, which is followed by a list of all the instances. The instances are listed in comma-separated format, with a question mark representing a missing value. Comments are lines starting with %.
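To make the format concrete, the following Java sketch reads the subset of ARFF described above: the @relation and @attribute declarations, the @data section, % comments, and ? for missing values. The class and field names are hypothetical, and the sketch deliberately ignores details such as quoted values containing spaces or commas, so it is not a substitute for WEKA's own ARFF reader.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal reader for the ARFF subset described above (not WEKA's reader).
class ArffSketch {
    String relation;
    final List<String> attributeNames = new ArrayList<>();
    final List<String> attributeSpecs = new ArrayList<>();
    // One String[] per instance; a null entry marks a missing value ("?").
    final List<String[]> rows = new ArrayList<>();

    static ArffSketch parse(String text) {
        ArffSketch arff = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\r?\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and comments
            }
            String lower = line.toLowerCase(); // keywords matched case-insensitively
            if (inData) {
                String[] values = line.split(",");
                for (int i = 0; i < values.length; i++) {
                    String v = values[i].trim();
                    values[i] = v.equals("?") ? null : v;
                }
                arff.rows.add(values);
            } else if (lower.startsWith("@relation")) {
                arff.relation = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                // Split into the attribute name and the rest of the line (its specification).
                String[] parts = line.substring("@attribute".length()).trim().split("\\s+", 2);
                if (parts.length < 2) {
                    throw new IllegalArgumentException("bad attribute line: " + line);
                }
                arff.attributeNames.add(parts[0]);
                arff.attributeSpecs.add(parts[1].trim());
            } else if (lower.startsWith("@data")) {
                inData = true;
            }
        }
        return arff;
    }
}
```

For example, a data line rainy,?,yes is stored as the row {"rainy", null, "yes"}, while a declaration such as @attribute temperature numeric yields the name temperature and the specification numeric.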
There is now support for running experiments that evaluate classifiers on repeated randomizations of datasets, over multiple datasets (and you can do much more than this besides). The classes for this reside in the weka.experiment package. The basic architecture is that a ResultProducer (which generates results on some randomization of a dataset) sends results to a ResultListener (which is responsible for stating whether it already has the result, and otherwise storing results).
Example ResultListeners include:
So, you might have a DatabaseResultListener, that is sent results from an AveragingResultProducer, which produces averages over the n results produced for each run of an n-fold CrossValidationResultProducer, which in turn is doing nominal classification through a ClassifierSplitEvaluator, which uses OneR for prediction. Whew. But you can combine these things together to do pretty much whatever you want. You might want to write a LearningRateResultProducer that splits a dataset into increasing numbers of training instances.
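The producer/listener split can be sketched in a few lines of Java. The interfaces below are hypothetical simplifications (the real weka.experiment interfaces have different methods and signatures), and the evaluator function is a stand-in for actually training and testing a classifier. The sketch shows two ideas from the description above: a listener that reports whether it already has a result, so that work can be skipped, and an averaging producer that wraps a per-fold producer and forwards only the mean.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical, simplified interfaces; NOT the real weka.experiment API.
interface SketchResultListener {
    boolean isResultRequired(String key); // false if this result is already stored
    void acceptResult(String key, double value);
}

interface SketchResultProducer {
    void doRun(int run, SketchResultListener listener);
}

/** Stores each result once; a stand-in for a file- or database-backed listener. */
class MemoryResultListener implements SketchResultListener {
    final Map<String, Double> stored = new HashMap<>();

    public boolean isResultRequired(String key) {
        return !stored.containsKey(key);
    }

    public void acceptResult(String key, double value) {
        stored.put(key, value);
    }
}

/** Emits one result per fold; the evaluator (run, fold) -> score stands in for a real classifier. */
class FoldResultProducer implements SketchResultProducer {
    private final String dataset;
    private final int folds;
    private final BiFunction<Integer, Integer, Double> evaluator;

    FoldResultProducer(String dataset, int folds, BiFunction<Integer, Integer, Double> evaluator) {
        this.dataset = dataset;
        this.folds = folds;
        this.evaluator = evaluator;
    }

    public void doRun(int run, SketchResultListener listener) {
        for (int fold = 0; fold < folds; fold++) {
            String key = dataset + ":" + run + ":" + fold;
            if (listener.isResultRequired(key)) {
                listener.acceptResult(key, evaluator.apply(run, fold));
            }
        }
    }
}

/** Collects the inner producer's per-fold results and forwards only their average. */
class AveragingProducerSketch implements SketchResultProducer {
    private final String dataset;
    private final SketchResultProducer inner;

    AveragingProducerSketch(String dataset, SketchResultProducer inner) {
        this.dataset = dataset;
        this.inner = inner;
    }

    public void doRun(int run, SketchResultListener listener) {
        String key = dataset + ":" + run;
        if (!listener.isResultRequired(key)) {
            return; // result already stored: skip the expensive inner work
        }
        List<Double> values = new ArrayList<>();
        inner.doRun(run, new SketchResultListener() {
            public boolean isResultRequired(String k) { return true; }
            public void acceptResult(String k, double v) { values.add(v); }
        });
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        listener.acceptResult(key, values.isEmpty() ? Double.NaN : sum / values.size());
    }
}
```

Running the same averaging producer twice against the same listener evaluates the folds only once, because the second run finds the averaged result already stored.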
In terms of database connectivity, we use InstantDB, a free database implemented entirely in Java. It is available from:
http://www.instantdb.co.uk/index.htm
From there you will also be able to find an RmiJdbc bridge, which is useful for running a server that just listens for experiment results from other machines. When using classes that access a database, you will probably want to create a properties file that specifies which JDBC drivers to use and where to find the database. This file should be called "DatabaseUtils.props" and reside in your home directory or the current directory. An example is provided in weka/experiment; it is used unless it is overridden by one in your home directory or the current directory (in that order).
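The override order can be sketched with java.util.Properties. This assumes one plausible reading of "in that order": the bundled example supplies defaults, a file in the home directory is read next, and a file in the current directory is read last, so its keys take precedence. The class name is hypothetical and this is not WEKA's own loading code.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical sketch of layered property loading (not WEKA's loader).
class LayeredPropsSketch {

    /** Merges layers in order; a key in a later layer overrides the same key in an earlier one. */
    static Properties merge(Properties... layers) {
        Properties merged = new Properties();
        for (Properties layer : layers) {
            merged.putAll(layer);
        }
        return merged;
    }

    /** Loads a properties file if it exists; otherwise returns an empty Properties object. */
    static Properties loadIfPresent(File file) throws IOException {
        Properties p = new Properties();
        if (file.isFile()) {
            try (InputStream in = new FileInputStream(file)) {
                p.load(in);
            }
        }
        return p;
    }

    /** Defaults, then the home directory, then the current directory (assumed order). */
    static Properties load(Properties defaults, String fileName) throws IOException {
        File home = new File(System.getProperty("user.home"), fileName);
        File current = new File(System.getProperty("user.dir"), fileName);
        return merge(defaults, loadIfPresent(home), loadIfPresent(current));
    }
}
```

With this layering, a current-directory file needs to contain only the keys it changes; every other setting falls through from the home-directory file or the defaults.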
To run a simple experiment from the command line, try:
java weka.experiment.Experiment -r -T datasets/UCI/iris.arff \
-D weka.experiment.InstancesResultListener \
-P weka.experiment.RandomSplitResultProducer -- \
-W weka.experiment.ClassifierSplitEvaluator -- \
-W weka.classifiers.OneR
(Try "java weka.experiment.Experiment -h" to find out what these options mean)
If you have your results as a set of instances, you can perform paired t-tests using weka.experiment.PairedTTester (use the -h option to find out what options it needs).
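For reference, the textbook paired t statistic compares two schemes on the same runs through the per-run differences d_i = a_i - b_i: t = mean(d) / (s_d / sqrt(n)), with n - 1 degrees of freedom. The Java sketch below computes it. The class name is hypothetical, and WEKA's PairedTTester may compute its statistic differently.

```java
// Hypothetical sketch of the textbook paired t statistic (PairedTTester may differ).
class PairedTSketch {

    /** Returns t = mean(d) / (sd(d) / sqrt(n)) where d[i] = a[i] - b[i]. */
    static double tStatistic(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("need two equal-length samples with n >= 2");
        }
        int n = a.length;
        double mean = 0;
        for (int i = 0; i < n; i++) {
            mean += a[i] - b[i];
        }
        mean /= n;
        double sumSq = 0;
        for (int i = 0; i < n; i++) {
            double dev = (a[i] - b[i]) - mean;
            sumSq += dev * dev;
        }
        double sd = Math.sqrt(sumSq / (n - 1)); // sample standard deviation of the differences
        return mean / (sd / Math.sqrt(n));
    }
}
```

For a = {3, 5, 7, 9} and b = {1, 2, 3, 4}, the differences are 2, 3, 4, 5 and t is about 5.42 with 3 degrees of freedom. If all differences are identical, the standard deviation is zero and the result is infinite or NaN, which a real tester has to handle.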
This is all much easier from the Experiment Environment GUI :-)
To start the Explorer:
To start the experiment editor:
These _really_ need more documentation, but that'll do to get you started :)
A tutorial on how to use WEKA is in $WEKAHOME/Tutorial.pdf. However, not everything in WEKA is covered in the tutorial; for a complete list, see the online documentation. In particular, Tutorial.pdf is a draft from the forthcoming book (see our web page), and so only describes features in the stable 3.0 release.
The source code for WEKA is in $WEKAHOME/weka-src.jar. To expand it, use the jar utility that comes with every Java distribution, for example: jar xf weka-src.jar
If you have implemented a learning scheme, filter, application, visualization tool, etc., using the WEKA classes, and you think it should be included in WEKA, send us the code, and we can put it in the next WEKA distribution. If you find any bugs, send a fix to wekasupport@cs.waikato.ac.nz. If that's too hard, just send us a bug report.
WEKA is distributed under the GNU General Public License. Please read the file COPYING. This page was modified from the WEKA README using PageSpinner on a Macintosh.