
Class weka.classifiers.Evaluation

java.lang.Object
   |
   +----weka.classifiers.Evaluation

public class Evaluation
extends Object
implements Summarizable
Class for evaluating machine learning models.

-------------------------------------------------------------------

General options when evaluating a learning scheme from the command-line:

-t filename
Name of the file with the training data. (required)

-T filename
Name of the file with the test data. If missing, a cross-validation is performed.

-c index
Index of the class attribute (1, 2, ...; default: last).

-x number
The number of folds for the cross-validation (default: 10).

-s seed
Random number seed for the cross-validation (default: 1).

-m filename
The name of a file containing a cost matrix.

-l filename
Loads classifier from the given file.

-d filename
Saves classifier built from the training data into the given file.

-v
Outputs no statistics for the training data.

-o
Outputs statistics only, not the classifier.

-i
Outputs information-retrieval statistics for two-class problems.

-k
Outputs information-theoretic statistics.

-p
Outputs predictions for test instances (and nothing else).

-r
Outputs cumulative margin distribution (and nothing else).

-g
Only for classifiers that implement "Graphable." Outputs the graph representation of the classifier (and nothing else).

-------------------------------------------------------------------

Example usage as the main method of a classifier (here called FunkyClassifier):

 public static void main(String [] args) {
   try {
     Classifier scheme = new FunkyClassifier();
     System.out.println(Evaluation.evaluateModel(scheme, args));
   } catch (Exception e) {
     System.err.println(e.getMessage());
   }
 }
 

------------------------------------------------------------------

Example usage from within an application (the classifier must be built before it is evaluated):

 Instances trainInstances = ... // instances got from somewhere
 Instances testInstances = ...  // instances got from somewhere
 Classifier scheme = ...        // scheme got from somewhere
 scheme.buildClassifier(trainInstances);
 Evaluation evaluation = new Evaluation(trainInstances);
 evaluation.evaluateModel(scheme, testInstances);
 System.out.println(evaluation.toSummaryString());
 

Version:
$Revision: 1.19 $
Author:
Eibe Frank (eibe@cs.waikato.ac.nz), Len Trigg (trigg@cs.waikato.ac.nz)

Constructor Index

 o Evaluation(Instances)
Initializes all the counters for the evaluation.
 o Evaluation(Instances, CostMatrix, Random)
Initializes all the counters for the evaluation and also takes a cost matrix as parameter.

Method Index

 o confusionMatrix()
Returns a copy of the confusion matrix.
 o correct()
Gets the number of instances correctly classified (that is, for which a correct prediction was made).
 o correlationCoefficient()
Returns the correlation coefficient if the class is numeric.
 o crossValidateModel(Classifier, Instances, int)
Performs a (stratified if class is nominal) cross-validation for a classifier on a set of instances.
 o crossValidateModel(String, Instances, int, String[])
Performs a (stratified if class is nominal) cross-validation for a classifier on a set of instances.
 o equals(Object)
Tests whether the current evaluation object is equal to another evaluation object.
 o errorRate()
Returns the estimated error rate or the root mean squared error (if the class is numeric).
 o evaluateModel(Classifier, Instances)
Evaluates the classifier on a given set of instances.
 o evaluateModel(Classifier, String[])
Evaluates a classifier with the options given in an array of strings.
 o evaluateModel(String, String[])
Evaluates a classifier with the options given in an array of strings.
 o evaluateModelOnce(Classifier, Instance)
Evaluates the classifier on a single instance.
 o evaluateModelOnce(double, Instance)
Evaluates the supplied prediction on a single instance.
 o falsePositives(int)
Calculates the false positive rate with respect to a particular class.
 o incorrect()
Gets the number of instances incorrectly classified (that is, for which an incorrect prediction was made).
 o KBInformation()
Returns the total Kononenko & Bratko Information score in bits.
 o KBMeanInformation()
Returns the Kononenko & Bratko Information score in bits per instance.
 o KBRelativeInformation()
Returns the Kononenko & Bratko Relative Information score.
 o main(String[])
A test method for this class.
 o meanAbsoluteError()
Returns the mean absolute error.
 o meanPriorAbsoluteError()
Returns the mean absolute error of the prior.
 o numInstances()
Gets the number of test instances that had a known class value (actually the sum of the weights of test instances with known class value).
 o pctCorrect()
Gets the percentage of instances correctly classified (that is, for which a correct prediction was made).
 o pctIncorrect()
Gets the percentage of instances incorrectly classified (that is, for which an incorrect prediction was made).
 o pctUnclassified()
Gets the percentage of instances not classified (that is, for which no prediction was made by the classifier).
 o priorEntropy()
Calculates the entropy of the prior distribution.
 o relativeAbsoluteError()
Returns the relative absolute error.
 o rootMeanPriorSquaredError()
Returns the root mean prior squared error.
 o rootMeanSquaredError()
Returns the root mean squared error.
 o rootRelativeSquaredError()
Returns the root relative squared error if the class is numeric.
 o setPriors(Instances)
Sets the class prior probabilities.
 o SFEntropyGain()
Returns the total SF, which is the null model entropy minus the scheme entropy.
 o SFMeanEntropyGain()
Returns the SF per instance, which is the null model entropy minus the scheme entropy, per instance.
 o SFMeanPriorEntropy()
Returns the entropy per instance for the null model.
 o SFMeanSchemeEntropy()
Returns the entropy per instance for the scheme.
 o SFPriorEntropy()
Returns the total entropy for the null model.
 o SFSchemeEntropy()
Returns the total entropy for the scheme.
 o toClassDetailsString()
 o toClassDetailsString(String)
Outputs a breakdown of the accuracy (true positive and false positive rates) for each class.
 o toCumulativeMarginDistributionString()
Outputs the cumulative margin distribution as a string suitable as input for gnuplot or a similar plotting package.
 o toInformationRetrievalStatisticsString()
Calls toInformationRetrievalStatisticsString(String) with a default title.
 o toInformationRetrievalStatisticsString(String)
Outputs information retrieval statistics (precision, recall, f-measure) for two-class problems.
 o toMatrixString()
Calls toMatrixString(String) with a default title.
 o toMatrixString(String)
Outputs the performance statistics as a classification confusion matrix.
 o toSummaryString()
Calls toSummaryString(String, boolean) with no title and without complexity statistics.
 o toSummaryString(boolean)
Calls toSummaryString(String, boolean) with a default title.
 o toSummaryString(String, boolean)
Outputs the performance statistics in summary form.
 o truePositives(int)
Calculates the true positive rate with respect to a particular class.
 o unclassified()
Gets the number of instances not classified (that is, for which no prediction was made by the classifier).
 o updatePriors(Instance)
Updates the class prior probabilities (when training incrementally).

Constructors

 o Evaluation
 public Evaluation(Instances data) throws Exception
Initializes all the counters for the evaluation.

Parameters:
data - set of training instances, used to obtain header information and the prior class distribution
Throws: Exception
if the class is not defined
 o Evaluation
 public Evaluation(Instances data,
                   CostMatrix costMatrix,
                   Random random) throws Exception
Initializes all the counters for the evaluation and also takes a cost matrix as parameter.

Parameters:
data - set of instances, used to obtain header information
costMatrix - the cost matrix; if null, default costs are used
random - a random number generator for cost matrix-based resampling; if null, no resampling is performed
Throws: Exception
if cost matrix is not compatible with data, the class is not defined or the class is numeric

Methods

 o confusionMatrix
 public double[][] confusionMatrix()
Returns a copy of the confusion matrix.

Returns:
a copy of the confusion matrix as a two-dimensional array
 o crossValidateModel
 public void crossValidateModel(Classifier classifier,
                                Instances data,
                                int numFolds) throws Exception
Performs a (stratified if class is nominal) cross-validation for a classifier on a set of instances.

Parameters:
classifier - the classifier with any options set.
data - the data on which the cross-validation is to be performed
numFolds - the number of folds for the cross-validation
Throws: Exception
if a classifier could not be generated successfully or the class is not defined
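Stratification keeps the class proportions of every fold close to those of the full dataset. The Java sketch below shows one common way to assign instances to folds for a nominal class:

- shuffle the instance indices with a seeded `Random`;
- group the indices by class;
- deal them out to the folds round-robin.

The class `StratifiedFolds` and its method are hypothetical illustrations of the general technique, not WEKA's own code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper illustrating stratified k-fold assignment (not WEKA's implementation). */
final class StratifiedFolds {

    private StratifiedFolds() { }

    /**
     * Returns fold[i] in [0, numFolds) for each instance i.
     *
     * @param classLabels nominal class index of each instance (0, 1, ...)
     * @param numClasses  number of class values
     * @param numFolds    number of folds (e.g. 10, the -x default)
     * @param seed        random seed (e.g. 1, the -s default)
     */
    static int[] assignFolds(int[] classLabels, int numClasses, int numFolds, long seed) {
        // Shuffle the instance indices so the order within each class is random.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < classLabels.length; i++) {
            order.add(i);
        }
        Collections.shuffle(order, new Random(seed));

        // Group the shuffled indices by class value.
        List<List<Integer>> byClass = new ArrayList<>();
        for (int c = 0; c < numClasses; c++) {
            byClass.add(new ArrayList<>());
        }
        for (int idx : order) {
            byClass.get(classLabels[idx]).add(idx);
        }

        // Deal the instances class by class, round-robin over the folds,
        // so every fold receives about the same share of each class.
        int[] fold = new int[classLabels.length];
        int next = 0;
        for (List<Integer> group : byClass) {
            for (int idx : group) {
                fold[idx] = next;
                next = (next + 1) % numFolds;
            }
        }
        return fold;
    }
}
```

Each fold then serves once as the test set while the remaining folds form the training set. Because the round-robin position carries over from one class to the next, the total fold sizes also differ by at most one instance.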
 o crossValidateModel
 public void crossValidateModel(String classifierString,
                                Instances data,
                                int numFolds,
                                String options[]) throws Exception
Performs a (stratified if class is nominal) cross-validation for a classifier on a set of instances.

Parameters:
classifierString - a string naming the class of the classifier
data - the data on which the cross-validation is to be performed
numFolds - the number of folds for the cross-validation
options - the options to the classifier. Any options accepted by the classifier will be removed from this array.
Throws: Exception
if a classifier could not be generated successfully or the class is not defined
 o evaluateModel
 public static String evaluateModel(String classifierString,
                                    String options[]) throws Exception
Evaluates a classifier with the options given in an array of strings.

Valid options are:

-t filename
Name of the file with the training data. (required)

-T filename
Name of the file with the test data. If missing, a cross-validation is performed.

-c index
Index of the class attribute (1, 2, ...; default: last).

-x number
The number of folds for the cross-validation (default: 10).

-s seed
Random number seed for the cross-validation (default: 1).

-m filename
The name of a file containing a cost matrix.

-l filename
Loads classifier from the given file.

-d filename
Saves classifier built from the training data into the given file.

-v
Outputs no statistics for the training data.

-o
Outputs statistics only, not the classifier.

-i
Outputs information-retrieval statistics for two-class problems.

-k
Outputs information-theoretic statistics.

-p
Outputs predictions for test instances (and nothing else).

-r
Outputs cumulative margin distribution (and nothing else).

-g
Only for classifiers that implement "Graphable." Outputs the graph representation of the classifier (and nothing else).

Parameters:
classifierString - the class name of the machine learning classifier, as a string
options - the array of strings containing the options
Returns:
a string describing the results
Throws: Exception
if model could not be evaluated successfully
 o main
 public static void main(String args[])
A test method for this class. Simply extracts the first command-line argument as a classifier class name and calls evaluateModel(String, String[]).

Parameters:
args - an array of command line arguments, the first of which must be the class name of a classifier.
 o evaluateModel
 public static String evaluateModel(Classifier classifier,
                                    String options[]) throws Exception
Evaluates a classifier with the options given in an array of strings.

Valid options are:

-t name of training file
Name of the file with the training data. (required)

-T name of test file
Name of the file with the test data. If missing, a cross-validation is performed.

-c class index
Index of the class attribute (1, 2, ...; default: last).

-x number of folds
The number of folds for the cross-validation (default: 10).

-s random number seed
Random number seed for the cross-validation (default: 1).

-m file with cost matrix
The name of a file containing a cost matrix.

-l name of model input file
Loads classifier from the given file.

-d name of model output file
Saves classifier built from the training data into the given file.

-v
Outputs no statistics for the training data.

-o
Outputs statistics only, not the classifier.

-i
Outputs information-retrieval statistics for two-class problems.

-k
Outputs information-theoretic statistics.

-p
Outputs predictions for test instances (and nothing else).

-r
Outputs cumulative margin distribution (and nothing else).

-g
Only for classifiers that implement "Graphable." Outputs the graph representation of the classifier (and nothing else).

Parameters:
classifier - machine learning classifier
options - the array of strings containing the options
Returns:
a string describing the results
Throws: Exception
if model could not be evaluated successfully
 o evaluateModel
 public void evaluateModel(Classifier classifier,
                           Instances data) throws Exception
Evaluates the classifier on a given set of instances.

Parameters:
classifier - machine learning classifier
data - set of test instances for evaluation
Throws: Exception
if model could not be evaluated successfully
 o evaluateModelOnce
 public double evaluateModelOnce(Classifier classifier,
                                 Instance instance) throws Exception
Evaluates the classifier on a single instance.

Parameters:
classifier - machine learning classifier
instance - the test instance to be classified
Returns:
the prediction made by the classifier
Throws: Exception
if model could not be evaluated successfully or the data contains string attributes
 o evaluateModelOnce
 public void evaluateModelOnce(double prediction,
                               Instance instance) throws Exception
Evaluates the supplied prediction on a single instance.

Parameters:
prediction - the supplied prediction
instance - the test instance to be classified
Throws: Exception
if model could not be evaluated successfully
 o numInstances
 public final double numInstances()
Gets the number of test instances that had a known class value (actually the sum of the weights of test instances with known class value).

Returns:
the number of test instances with known class
 o incorrect
 public final double incorrect()
Gets the number of instances incorrectly classified (that is, for which an incorrect prediction was made). This is actually the sum of the weights of these instances.

Returns:
the number of incorrectly classified instances
 o pctIncorrect
 public final double pctIncorrect()
Gets the percentage of instances incorrectly classified (that is, for which an incorrect prediction was made).

Returns:
the percent of incorrectly classified instances (between 0 and 100)
 o correct
 public final double correct()
Gets the number of instances correctly classified (that is, for which a correct prediction was made). This is actually the sum of the weights of these instances.

Returns:
the number of correctly classified instances
 o pctCorrect
 public final double pctCorrect()
Gets the percentage of instances correctly classified (that is, for which a correct prediction was made).

Returns:
the percent of correctly classified instances (between 0 and 100)
 o unclassified
 public final double unclassified()
Gets the number of instances not classified (that is, for which no prediction was made by the classifier). This is actually the sum of the weights of these instances.

Returns:
the number of unclassified instances
 o pctUnclassified
 public final double pctUnclassified()
Gets the percentage of instances not classified (that is, for which no prediction was made by the classifier).

Returns:
the percent of unclassified instances (between 0 and 100)
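Because these counts are sums of instance weights, they can be accumulated one prediction at a time. The minimal sketch below makes two assumptions:

- a prediction is encoded as a class index, or as `-1` when the classifier makes none;
- the correct, incorrect and unclassified weights together make up numInstances().

The class `WeightedCounts` is hypothetical.

```java
/** Hypothetical accumulator mirroring correct(), incorrect(), unclassified() and the pct*() methods. */
final class WeightedCounts {
    private double correct;
    private double incorrect;
    private double unclassified;

    /** Records one test instance with a known class value; predicted == -1 means "no prediction". */
    void add(int actual, int predicted, double weight) {
        if (predicted < 0) {
            unclassified += weight;
        } else if (predicted == actual) {
            correct += weight;
        } else {
            incorrect += weight;
        }
    }

    /** Sum of the weights of all recorded instances (assumed to include unclassified ones). */
    double numInstances() {
        return correct + incorrect + unclassified;
    }

    double correct() { return correct; }

    double incorrect() { return incorrect; }

    double unclassified() { return unclassified; }

    // Percentages between 0 and 100, relative to the total weight.
    double pctCorrect() { return 100.0 * correct / numInstances(); }

    double pctIncorrect() { return 100.0 * incorrect / numInstances(); }

    double pctUnclassified() { return 100.0 * unclassified / numInstances(); }
}
```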
 o errorRate
 public final double errorRate()
Returns the estimated error rate or the root mean squared error (if the class is numeric). If a cost matrix was given, this error rate involves weights from the cost matrix.

Returns:
the estimated error rate (between 0 and 1), or the root mean squared error if the class is numeric
 o correlationCoefficient
 public final double correlationCoefficient() throws Exception
Returns the correlation coefficient if the class is numeric.

Returns:
the correlation coefficient
Throws: Exception
if class is not numeric
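For a numeric class, the correlation coefficient is commonly the Pearson correlation between actual and predicted values. The sketch below computes it without instance weights. The class `Correlation` is hypothetical. Returning 0 for constant inputs is an arbitrary convention chosen here, not documented behavior.

```java
/** Hypothetical helper: Pearson correlation between actual and predicted numeric class values. */
final class Correlation {
    private Correlation() { }

    static double pearson(double[] actual, double[] predicted) {
        int n = actual.length;
        if (n == 0 || predicted.length != n) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double meanA = 0;
        double meanP = 0;
        for (int i = 0; i < n; i++) {
            meanA += actual[i];
            meanP += predicted[i];
        }
        meanA /= n;
        meanP /= n;

        // Sums of cross-products and squares; the common 1/n factor cancels in the ratio.
        double cov = 0;
        double varA = 0;
        double varP = 0;
        for (int i = 0; i < n; i++) {
            double da = actual[i] - meanA;
            double dp = predicted[i] - meanP;
            cov += da * dp;
            varA += da * da;
            varP += dp * dp;
        }
        if (varA == 0 || varP == 0) {
            return 0.0; // undefined for constant inputs; 0 is an arbitrary convention here
        }
        return cov / Math.sqrt(varA * varP);
    }
}
```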
 o meanAbsoluteError
 public final double meanAbsoluteError()
Returns the mean absolute error. Refers to the error of the predicted values for numeric classes, and the error of the predicted probability distribution for nominal classes.

Returns:
the mean absolute error
 o meanPriorAbsoluteError
 public final double meanPriorAbsoluteError()
Returns the mean absolute error of the prior.

Returns:
the mean absolute error of the prior
 o relativeAbsoluteError
 public final double relativeAbsoluteError() throws Exception
Returns the relative absolute error.

Returns:
the relative absolute error
Throws: Exception
if it can't be computed
 o rootMeanSquaredError
 public final double rootMeanSquaredError()
Returns the root mean squared error.

Returns:
the root mean squared error
 o rootMeanPriorSquaredError
 public final double rootMeanPriorSquaredError()
Returns the root mean prior squared error.

Returns:
the root mean prior squared error
 o rootRelativeSquaredError
 public final double rootRelativeSquaredError()
Returns the root relative squared error if the class is numeric.

Returns:
the root relative squared error
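For a numeric class, these measures compare the scheme's errors with those of a null model. The minimal sketch below assumes that the prior (null model) always predicts one constant value, such as the mean class value of the training data. The source does not say how the prior predictions are formed, and the class `NumericErrors` is hypothetical. Nominal classes, where the errors refer to predicted probability distributions, are not covered.

```java
import java.util.Arrays;

/** Hypothetical helper for numeric-class error measures, using a constant "prior" prediction. */
final class NumericErrors {
    private NumericErrors() { }

    static double meanAbsoluteError(double[] actual, double[] predicted) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(predicted[i] - actual[i]);
        }
        return sum / actual.length;
    }

    static double rootMeanSquaredError(double[] actual, double[] predicted) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            double d = predicted[i] - actual[i];
            sum += d * d;
        }
        return Math.sqrt(sum / actual.length);
    }

    /** Predictions of a null model that always outputs priorValue (assumed, e.g. the training mean). */
    static double[] constant(int n, double priorValue) {
        double[] p = new double[n];
        Arrays.fill(p, priorValue);
        return p;
    }

    /** MAE of the scheme divided by MAE of the prior; multiply by 100 for a percentage. */
    static double relativeAbsoluteError(double[] actual, double[] predicted, double priorValue) {
        return meanAbsoluteError(actual, predicted)
                / meanAbsoluteError(actual, constant(actual.length, priorValue));
    }

    /** RMSE of the scheme divided by RMSE of the prior; multiply by 100 for a percentage. */
    static double rootRelativeSquaredError(double[] actual, double[] predicted, double priorValue) {
        return rootMeanSquaredError(actual, predicted)
                / rootMeanSquaredError(actual, constant(actual.length, priorValue));
    }
}
```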
 o priorEntropy
 public final double priorEntropy() throws Exception
Calculates the entropy of the prior distribution.

Returns:
the entropy of the prior distribution
Throws: Exception
if the class is not nominal
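The entropy of the prior distribution, in bits, is H = -Σ p_c log2 p_c over the class values c. The short sketch below computes it from (weighted) class counts. The class `PriorEntropy` is hypothetical.

```java
/** Hypothetical helper: entropy, in bits, of a class distribution given as (weighted) counts. */
final class PriorEntropy {
    private PriorEntropy() { }

    static double entropyBits(double[] classCounts) {
        double total = 0;
        for (double c : classCounts) {
            total += c;
        }
        double h = 0;
        for (double c : classCounts) {
            if (c > 0) { // classes with zero count contribute nothing (0 * log 0 is taken as 0)
                double p = c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }
}
```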
 o KBInformation
 public final double KBInformation() throws Exception
Returns the total Kononenko & Bratko Information score in bits.

Returns:
the K&B information score
Throws: Exception
if the class is not nominal
 o KBMeanInformation
 public final double KBMeanInformation() throws Exception
Returns the Kononenko & Bratko Information score in bits per instance.

Returns:
the K&B information score per instance
Throws: Exception
if the class is not nominal
 o KBRelativeInformation
 public final double KBRelativeInformation() throws Exception
Returns the Kononenko & Bratko Relative Information score.

Returns:
the K&B relative information score
Throws: Exception
if the class is not nominal
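The Kononenko & Bratko score rewards a classifier for moving probability towards an instance's actual class, relative to the prior, and penalizes it for moving probability away. Let P be the prior probability and P' the predicted probability of the actual class. The sketch follows the commonly stated definition:

- log2 P' - log2 P when P' ≥ P;
- log2(1 - P) - log2(1 - P') otherwise.

The class `KBScore` is hypothetical and assumes 0 < P < 1. The source does not show how the relative score is normalized, so that variant is omitted.

```java
/** Hypothetical helper: Kononenko & Bratko information score, in bits. */
final class KBScore {
    private KBScore() { }

    private static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    /** Score for one instance, given prior and predicted probability of its actual class (0 < prior < 1). */
    static double score(double prior, double predicted) {
        if (predicted >= prior) {
            // Probability moved towards the actual class: non-negative gain.
            return log2(predicted) - log2(prior);
        }
        // Probability moved away from the actual class: negative score.
        return log2(1 - prior) - log2(1 - predicted);
    }

    /** Total weighted score over a set of test instances, analogous to KBInformation(). */
    static double total(double[] priors, double[] predicted, double[] weights) {
        double sum = 0;
        for (int i = 0; i < priors.length; i++) {
            sum += weights[i] * score(priors[i], predicted[i]);
        }
        return sum;
    }

    /** Score per unit of instance weight, analogous to KBMeanInformation(). */
    static double mean(double[] priors, double[] predicted, double[] weights) {
        double totalWeight = 0;
        for (double w : weights) {
            totalWeight += w;
        }
        return total(priors, predicted, weights) / totalWeight;
    }
}
```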
 o SFPriorEntropy
 public final double SFPriorEntropy()
Returns the total entropy for the null model.

Returns:
the total null model entropy
 o SFMeanPriorEntropy
 public final double SFMeanPriorEntropy()
Returns the entropy per instance for the null model.

Returns:
the null model entropy per instance
 o SFSchemeEntropy
 public final double SFSchemeEntropy()
Returns the total entropy for the scheme.

Returns:
the total scheme entropy
 o SFMeanSchemeEntropy
 public final double SFMeanSchemeEntropy()
Returns the entropy per instance for the scheme.

Returns:
the scheme entropy per instance
 o SFEntropyGain
 public final double SFEntropyGain()
Returns the total SF, which is the null model entropy minus the scheme entropy.

Returns:
the total SF
 o SFMeanEntropyGain
 public final double SFMeanEntropyGain()
Returns the SF per instance, which is the null model entropy minus the scheme entropy, per instance.

Returns:
the SF per instance
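The SF measures can be read as sums of -log2 of the probability assigned to each instance's actual class, computed once for the null model and once for the scheme. That reading is an assumption, since the source only names the quantities. In the sketch below, the probability floor `MIN_PROB` is an arbitrary guard against log2(0), and the class `SFEntropy` is hypothetical.

```java
/** Hypothetical helper for the SF* measures, read as summed -log2 probabilities of the actual class. */
final class SFEntropy {
    private static final double MIN_PROB = 1e-10; // floor to avoid log2(0); an arbitrary choice

    private SFEntropy() { }

    private static double bits(double p) {
        return -Math.log(Math.max(p, MIN_PROB)) / Math.log(2);
    }

    /** Total entropy: weighted sum over instances of -log2(probability given to the actual class). */
    static double total(double[] probOfActual, double[] weights) {
        double sum = 0;
        for (int i = 0; i < probOfActual.length; i++) {
            sum += weights[i] * bits(probOfActual[i]);
        }
        return sum;
    }

    /** Total entropy gain: null model entropy minus scheme entropy. */
    static double entropyGain(double[] priorProbOfActual, double[] schemeProbOfActual, double[] weights) {
        return total(priorProbOfActual, weights) - total(schemeProbOfActual, weights);
    }

    /** Per-instance entropy gain: the total gain divided by the total instance weight. */
    static double meanEntropyGain(double[] priorProbOfActual, double[] schemeProbOfActual, double[] weights) {
        double totalWeight = 0;
        for (double w : weights) {
            totalWeight += w;
        }
        return entropyGain(priorProbOfActual, schemeProbOfActual, weights) / totalWeight;
    }
}
```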
 o toCumulativeMarginDistributionString
 public String toCumulativeMarginDistributionString() throws Exception
Outputs the cumulative margin distribution as a string suitable as input for gnuplot or a similar plotting package.

Returns:
the cumulative margin distribution
Throws: Exception
if the class attribute is not nominal
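For a nominal class, an instance's margin is usually defined as the probability assigned to its actual class minus the largest probability assigned to any other class, so it lies between -1 and 1. The sketch below computes margins and prints their cumulative distribution as two whitespace-separated columns, a layout gnuplot can plot directly. The class `Margins` and its output layout are hypothetical and need not match the format this class actually produces.

```java
import java.util.Arrays;

/** Hypothetical helper: classification margins and their cumulative distribution. */
final class Margins {
    private Margins() { }

    /** Probability of the actual class minus the largest probability of any other class. */
    static double margin(double[] distribution, int actual) {
        double bestOther = 0;
        for (int c = 0; c < distribution.length; c++) {
            if (c != actual) {
                bestOther = Math.max(bestOther, distribution[c]);
            }
        }
        return distribution[actual] - bestOther;
    }

    /** One line per distinct margin m: "m fraction", where fraction = share of margins <= m. */
    static String cumulativeDistribution(double[] margins) {
        double[] sorted = margins.clone();
        Arrays.sort(sorted);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < sorted.length; i++) {
            // For tied margins, emit only the last one so its fraction counts all ties.
            if (i + 1 < sorted.length && sorted[i + 1] == sorted[i]) {
                continue;
            }
            double fraction = (i + 1) / (double) sorted.length;
            sb.append(sorted[i]).append(' ').append(fraction).append('\n');
        }
        return sb.toString();
    }
}
```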
 o toInformationRetrievalStatisticsString
 public String toInformationRetrievalStatisticsString() throws Exception
Calls toInformationRetrievalStatisticsString(String) with a default title.

Throws: Exception
if the dataset is not a two-class dataset.
 o toInformationRetrievalStatisticsString
 public String toInformationRetrievalStatisticsString(String title) throws Exception
Outputs information retrieval statistics (precision, recall, f-measure) for two-class problems.

Parameters:
title - the title for the statistics
Returns:
the summary as a String
Throws: Exception
if the dataset is not a two-class dataset.
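Precision, recall and F-measure follow the standard definitions:

- precision = TP / (TP + FP);
- recall = TP / (TP + FN);
- the F-measure is the harmonic mean of precision and recall.

The sketch below assumes a 2x2 confusion matrix whose rows are actual classes and whose columns are predicted classes. The class `RetrievalStats` is hypothetical, and the source does not state which class is treated as positive.

```java
/** Hypothetical helper: precision, recall and F-measure for one class of a two-class problem. */
final class RetrievalStats {
    private RetrievalStats() { }

    /** confusion[actual][predicted]; positive is the index (0 or 1) of the class treated as relevant. */
    static double precision(double[][] confusion, int positive) {
        int negative = 1 - positive;
        double tp = confusion[positive][positive];
        double fp = confusion[negative][positive]; // negatives predicted as positive
        return tp / (tp + fp);
    }

    static double recall(double[][] confusion, int positive) {
        int negative = 1 - positive;
        double tp = confusion[positive][positive];
        double fn = confusion[positive][negative]; // positives predicted as negative
        return tp / (tp + fn);
    }

    /** Harmonic mean of precision and recall. */
    static double fMeasure(double[][] confusion, int positive) {
        double p = precision(confusion, positive);
        double r = recall(confusion, positive);
        return 2 * p * r / (p + r);
    }
}
```

If a denominator is zero, the result is NaN, which a caller may want to report separately.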
 o toSummaryString
 public String toSummaryString()
Calls toSummaryString(String, boolean) with no title and without complexity statistics.

Returns:
a summary description of the classifier evaluation
 o toSummaryString
 public String toSummaryString(boolean printComplexityStatistics)
Calls toSummaryString(String, boolean) with a default title.

Parameters:
printComplexityStatistics - if true, complexity statistics are returned as well
 o toSummaryString
 public String toSummaryString(String title,
                               boolean printComplexityStatistics)
Outputs the performance statistics in summary form. Lists the number (and percentage) of instances classified correctly, classified incorrectly, and left unclassified. Also outputs the total number of instances classified and the number of instances (if any) that had no class value.

Parameters:
title - the title for the statistics
printComplexityStatistics - if true, complexity statistics are returned as well
Returns:
the summary as a String
 o toMatrixString
 public String toMatrixString() throws Exception
Calls toMatrixString(String) with a default title.

Returns:
the confusion matrix as a string
Throws: Exception
if the class is numeric
 o toMatrixString
 public String toMatrixString(String title) throws Exception
Outputs the performance statistics as a classification confusion matrix. For each class value, shows the distribution of predicted class values.

Parameters:
title - the title for the confusion matrix
Returns:
the confusion matrix as a String
Throws: Exception
if the class is numeric
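A confusion matrix can be accumulated from (actual, predicted, weight) triples and printed with the predicted classes across the top and the actual class at the end of each row. The minimal sketch below does this. The class `ConfusionMatrixSketch` and its column layout are hypothetical, not this class's exact output.

```java
import java.util.Locale;

/** Hypothetical helper: accumulate and print a weighted confusion matrix. */
final class ConfusionMatrixSketch {
    private final double[][] counts;
    private final String[] labels;

    ConfusionMatrixSketch(String[] labels) {
        this.labels = labels.clone();
        this.counts = new double[labels.length][labels.length];
    }

    /** Adds one prediction; rows are actual classes, columns are predicted classes. */
    void add(int actual, int predicted, double weight) {
        counts[actual][predicted] += weight;
    }

    double get(int actual, int predicted) {
        return counts[actual][predicted];
    }

    /** Renders the matrix: a title line, a header of predicted labels, then one row per actual class. */
    String toMatrixString(String title) {
        StringBuilder sb = new StringBuilder(title).append('\n');
        for (String label : labels) {
            sb.append(String.format(Locale.ROOT, "%6s", label));
        }
        sb.append('\n');
        for (int a = 0; a < labels.length; a++) {
            for (int p = 0; p < labels.length; p++) {
                sb.append(String.format(Locale.ROOT, "%6.0f", counts[a][p]));
            }
            sb.append("   ").append(labels[a]).append('\n');
        }
        return sb.toString();
    }
}
```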
 o toClassDetailsString
 public String toClassDetailsString() throws Exception
 o toClassDetailsString
 public String toClassDetailsString(String title) throws Exception
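Outputs a breakdown of the accuracy for each class. For example, for the following confusion matrix (rows are actual classes, columns are predicted classes)
 A B C
 5 1 0  A
 2 7 1  B
 1 1 9  C

it prints:
   TP    FP  Class
 0.83  0.14   A
 0.70  0.12   B
 0.82  0.06   C

These true positive and false positive rates should be useful for ROC curves.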

 o truePositives
 public double truePositives(int classIndex)
Calculates the true positive rate with respect to a particular class. This is defined as

 correctly classified positives
 ------------------------------
       total positives
 

Parameters:
classIndex - the index of the class to consider as "positive"
Returns:
the true positive rate
 o falsePositives
 public double falsePositives(int classIndex)
Calculates the false positive rate with respect to a particular class. This is defined as

 incorrectly classified negatives
 --------------------------------
        total negatives
 

Parameters:
classIndex - the index of the class to consider as "positive"
Returns:
the false positive rate
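Both rates can be read directly off a confusion matrix whose rows are actual classes and whose columns are predicted classes. The sketch below is a hypothetical helper (`ClassRates`), not this class's implementation. Applied to the example matrix given for toClassDetailsString, it yields these rates:

- class A: TP 5/6 ≈ 0.83, FP 3/21 ≈ 0.14;
- class B: TP 0.70, FP 2/17 ≈ 0.12;
- class C: TP 9/11 ≈ 0.82, FP 1/16 ≈ 0.06.

```java
/** Hypothetical helper: per-class true and false positive rates from a confusion matrix. */
final class ClassRates {
    private ClassRates() { }

    /** confusion[actual][predicted]; correctly classified positives / total positives. */
    static double truePositiveRate(double[][] confusion, int classIndex) {
        double correct = confusion[classIndex][classIndex];
        double totalPositives = 0;
        for (double v : confusion[classIndex]) {
            totalPositives += v;
        }
        return correct / totalPositives;
    }

    /** Negatives incorrectly classified as classIndex / total negatives. */
    static double falsePositiveRate(double[][] confusion, int classIndex) {
        double incorrect = 0;
        double totalNegatives = 0;
        for (int actual = 0; actual < confusion.length; actual++) {
            if (actual == classIndex) {
                continue; // skip the positive class row
            }
            incorrect += confusion[actual][classIndex];
            for (double v : confusion[actual]) {
                totalNegatives += v;
            }
        }
        return incorrect / totalNegatives;
    }
}
```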
 o setPriors
 public void setPriors(Instances train) throws Exception
Sets the class prior probabilities.

Parameters:
train - the training instances used to determine the prior probabilities
Throws: Exception
if the class attribute of the instances is not set
 o updatePriors
 public void updatePriors(Instance instance) throws Exception
Updates the class prior probabilities (when training incrementally).

Parameters:
instance - the new training instance seen
Throws: Exception
if the class of the instance is not set
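Class priors can be kept as (weighted) class counts that are updated as each training instance arrives, which is what incremental training needs. The minimal sketch below does this. The class `ClassPriors` is hypothetical. The source does not say whether the counts start at zero or at a pseudo-count, so the starting count is a parameter.

```java
import java.util.Arrays;

/** Hypothetical helper: class priors from (weighted) class counts, updatable one instance at a time. */
final class ClassPriors {
    private final double[] counts;
    private double total;

    /** initialCount is a pseudo-count per class: 0 for plain frequencies, 1 for a Laplace-style start. */
    ClassPriors(int numClasses, double initialCount) {
        counts = new double[numClasses];
        Arrays.fill(counts, initialCount);
        total = initialCount * numClasses;
    }

    /** Analogue of updatePriors(Instance): adds one training instance of the given class. */
    void update(int classValue, double weight) {
        counts[classValue] += weight;
        total += weight;
    }

    /** Current prior probability of the given class value. */
    double prior(int classValue) {
        return counts[classValue] / total;
    }
}
```

A setPriors-style initialization would construct the counts once and then call `update` for every training instance.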
 o equals
 public boolean equals(Object obj)
Tests whether the current evaluation object is equal to another evaluation object.

Parameters:
obj - the object to compare against
Returns:
true if the two objects are equal
Overrides:
equals in class Object
