
Class weka.core.Instances

java.lang.Object
   |
   +----weka.core.Instances

public class Instances
extends Object
implements Serializable
Class for handling an ordered set of weighted instances.

Typical usage (code from the main() method of this class):

...
// Read all the instances in the file
reader = new FileReader(filename);
instances = new Instances(reader);

// Make the last attribute be the class
instances.setClassIndex(instances.numAttributes() - 1);

// Print header and instances.
System.out.println("\nDataset:\n");
System.out.println(instances);

...

All methods that change a set of instances are safe, i.e., a change to one set of instances does not affect any other set of instances. All methods that change a dataset's attribute information clone the dataset before it is changed.

Version:
$Revision: 1.13 $
Author:
Eibe Frank (eibe@cs.waikato.ac.nz), Len Trigg (trigg@cs.waikato.ac.nz)

Constructor Index

 o Instances(Instances)
Constructor copying all instances and references to the header information from the given set of instances.
 o Instances(Instances, int)
Constructor creating an empty set of instances.
 o Instances(Instances, int, int)
Creates a new set of instances by copying a subset of another set.
 o Instances(Reader)
Reads an ARFF file from a reader, and assigns a weight of one to each instance.
 o Instances(Reader, int)
Reads the header of an ARFF file from a reader and reserves space for the given number of instances.
 o Instances(String, FastVector, int)
Creates an empty set of instances.

Method Index

 o add(Instance)
Adds one instance to the end of the set.
 o attribute(int)
Returns an attribute.
 o attribute(String)
Returns an attribute given its name.
 o attributeStats(int)
Calculates summary statistics on the values that appear in this set of instances for a specified attribute.
 o checkForStringAttributes()
Checks for string attributes in the dataset.
 o checkInstance(Instance)
Checks if the given instance is compatible with this dataset.
 o classAttribute()
Returns the class attribute.
 o classIndex()
Returns the class attribute's index.
 o compactify()
Compactifies the set of instances.
 o delete(int)
Removes an instance at the given position from the set.
 o deleteAttributeAt(int)
Deletes an attribute at the given position (0 to numAttributes() - 1).
 o deleteStringAttributes()
Deletes all string attributes in the dataset.
 o deleteWithMissing(Attribute)
Removes all instances with missing values for a particular attribute from the dataset.
 o deleteWithMissing(int)
Removes all instances with missing values for a particular attribute from the dataset.
 o deleteWithMissingClass()
Removes all instances with a missing class value from the dataset.
 o enumerateAttributes()
Returns an enumeration of all the attributes.
 o enumerateInstances()
Returns an enumeration of all instances in the dataset.
 o equalHeaders(Instances)
Checks if two headers are equivalent.
 o firstInstance()
Returns the first instance in the set.
 o insertAttributeAt(Attribute, int)
Inserts an attribute at the given position (0 to numAttributes()) and sets all values to be missing.
 o instance(int)
Returns the instance at the given position.
 o lastInstance()
Returns the last instance in the set.
 o main(String[])
Main method for this class -- just prints a summary of a set of instances.
 o meanOrMode(Attribute)
Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value.
 o meanOrMode(int)
Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value.
 o mergeInstances(Instances, Instances)
Merges two sets of Instances together.
 o numAttributes()
Returns the number of attributes.
 o numClasses()
Returns the number of class labels.
 o numDistinctValues(Attribute)
Returns the number of distinct values of a given attribute.
 o numDistinctValues(int)
Returns the number of distinct values of a given attribute.
 o numInstances()
Returns the number of instances in the dataset.
 o randomize(Random)
Shuffles the instances in the set so that they are ordered randomly.
 o readInstance(Reader)
Reads a single instance from the reader and appends it to the dataset.
 o relationName()
Returns the relation's name.
 o renameAttribute(Attribute, String)
Renames an attribute.
 o renameAttribute(int, String)
Renames an attribute.
 o renameAttributeValue(Attribute, String, String)
Renames a value of a nominal (or string) attribute.
 o renameAttributeValue(int, int, String)
Renames a value of a nominal (or string) attribute.
 o resample(Random)
Creates a new dataset of the same size using random sampling with replacement.
 o resampleWithWeights(Random, double[])
Creates a new dataset of the same size using random sampling with replacement according to the given weight vector.
 o setClass(Attribute)
Sets the class attribute.
 o setClassIndex(int)
Sets the class index of the set.
 o setRelationName(String)
Sets the relation's name.
 o sort(Attribute)
Sorts the instances based on an attribute.
 o sort(int)
Sorts the instances based on an attribute.
 o stratify(int)
Stratifies a set of instances according to its class values if the class attribute is nominal (so that afterwards a stratified cross-validation can be performed).
 o sumOfWeights()
Computes the sum of all the instances' weights.
 o test(String[])
Method for testing this class.
 o testCV(int, int)
Creates the test set for one fold of a cross-validation on the dataset.
 o toString()
Returns the dataset as a string in ARFF format.
 o toSummaryString()
Generates a string summarizing the set of instances.
 o trainCV(int, int)
Creates the training set for one fold of a cross-validation on the dataset.
 o variance(Attribute)
Computes the variance for a numeric attribute.
 o variance(int)
Computes the variance for a numeric attribute.

Constructors

 o Instances
 public Instances(Reader reader) throws Exception
Reads an ARFF file from a reader, and assigns a weight of one to each instance. Lets the index of the class attribute be undefined (negative).

Parameters:
reader - the reader
Throws: Exception
if the ARFF file is not read successfully
 o Instances
 public Instances(Reader reader,
                  int capacity) throws Exception
Reads the header of an ARFF file from a reader and reserves space for the given number of instances. Lets the class index be undefined (negative).

Parameters:
reader - the reader
capacity - the capacity
Throws: Exception
if the header is not read successfully or the capacity is negative
 o Instances
 public Instances(Instances dataset)
Constructor copying all instances and references to the header information from the given set of instances.

Parameters:
dataset - the set to be copied
 o Instances
 public Instances(Instances dataset,
                  int capacity)
Constructor creating an empty set of instances. Copies references to the header information from the given set of instances. Sets the capacity of the set of instances to 0 if it is negative.

Parameters:
dataset - the instances from which the header information is to be taken
capacity - the capacity of the new dataset
 o Instances
 public Instances(Instances source,
                  int first,
                  int toCopy) throws Exception
Creates a new set of instances by copying a subset of another set.

Parameters:
source - the set of instances from which a subset is to be created
first - the index of the first instance to be copied
toCopy - the number of instances to be copied
Throws: Exception
if first and toCopy are out of range
 o Instances
 public Instances(String name,
                  FastVector attInfo,
                  int capacity)
Creates an empty set of instances. Uses the given attribute information. Sets the capacity of the set of instances to 0 if it is negative. The given attribute information must not be changed after this constructor has been used.

Parameters:
name - the name of the relation
attInfo - the attribute information
capacity - the capacity of the set

Methods

 o add
 public final void add(Instance instance)
Adds one instance to the end of the set. Makes a shallow copy of the instance before it is added. Increases the capacity of the dataset if it is not large enough. Does not check whether the instance is compatible with the dataset.

Parameters:
instance - the instance to be added
 o attribute
 public final Attribute attribute(int index)
Returns an attribute.

Parameters:
index - the attribute's index
Returns:
the attribute at the given position
 o attribute
 public final Attribute attribute(String name)
Returns an attribute given its name. If there is more than one attribute with the same name, it returns the first one. Returns null if the attribute can't be found.

Parameters:
name - the attribute's name
Returns:
the attribute with the given name, null if the attribute can't be found
 o checkForStringAttributes
 public boolean checkForStringAttributes()
Checks for string attributes in the dataset.

Returns:
true if string attributes are present, false otherwise
 o checkInstance
 public final boolean checkInstance(Instance instance)
Checks if the given instance is compatible with this dataset. Only looks at the size of the instance and the ranges of the values for nominal and string attributes.

Parameters:
instance - the instance to be checked
Returns:
true if the instance is compatible with the dataset
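The compatibility test described above (size plus value ranges for nominal and string attributes) can be sketched outside Weka as follows. This is a minimal illustration under stated assumptions, not Weka's implementation: an instance is modelled as a double[] of values, Double.NaN marks a missing value, nominal and string values are label indices, and the hypothetical array numValues holds each attribute's label count, with a negative entry marking a numeric attribute.

```java
// Minimal sketch (not Weka code). Hypothetical encoding: numValues[a] < 0 marks a
// numeric attribute; otherwise it is the number of labels of nominal/string attribute a.
final class CheckInstanceSketch {

  static boolean checkInstance(double[] instance, int[] numValues) {
    // Size check: exactly one value per attribute.
    if (instance.length != numValues.length) {
      return false;
    }
    for (int a = 0; a < instance.length; a++) {
      double v = instance[a];
      boolean isNumeric = numValues[a] < 0;
      // Range check for nominal/string values; missing values (NaN) are accepted.
      if (!isNumeric && !Double.isNaN(v) && (v < 0 || v >= numValues[a])) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    int[] header = {-1, 3}; // one numeric attribute, one nominal attribute with 3 labels
    System.out.println(checkInstance(new double[] {0.5, 2}, header)); // true
    System.out.println(checkInstance(new double[] {0.5, 3}, header)); // false
  }
}
```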
 o classAttribute
 public final Attribute classAttribute() throws Exception
Returns the class attribute.

Returns:
the class attribute
Throws: Exception
if the class is not set
 o classIndex
 public final int classIndex()
Returns the class attribute's index. Returns a negative number if it is undefined.

Returns:
the class index as an integer
 o compactify
 public final void compactify()
Compactifies the set of instances. Decreases the capacity of the set so that it matches the number of instances in the set.

 o delete
 public final void delete(int index)
Removes an instance at the given position from the set.

Parameters:
index - the instance's position
 o deleteAttributeAt
 public void deleteAttributeAt(int position) throws Exception
Deletes an attribute at the given position (0 to numAttributes() - 1). A deep copy of the attribute information is performed before the attribute is deleted.

Parameters:
position - the attribute's position
Throws: Exception
if the given index is out of range or the class attribute is being deleted
 o deleteStringAttributes
 public void deleteStringAttributes() throws Exception
Deletes all string attributes in the dataset. A deep copy of the attribute information is performed before an attribute is deleted.

Throws: Exception
if a string attribute could not be deleted successfully
 o deleteWithMissing
 public final void deleteWithMissing(int attIndex)
Removes all instances with missing values for a particular attribute from the dataset.

Parameters:
attIndex - the attribute's index
 o deleteWithMissing
 public final void deleteWithMissing(Attribute att)
Removes all instances with missing values for a particular attribute from the dataset.

Parameters:
att - the attribute
 o deleteWithMissingClass
 public final void deleteWithMissingClass() throws Exception
Removes all instances with a missing class value from the dataset.

Throws: Exception
if class is not set
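Removing every instance that is missing a value for one attribute amounts to a filter over the instances. The sketch below is an illustration, not Weka's code: it assumes each instance is a double[] and that Double.NaN marks a missing value. Under that model, deleteWithMissingClass() corresponds to applying the same filter with the class index. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch (not Weka code): rows are double[] instances, Double.NaN = missing.
final class DeleteWithMissingSketch {

  /** Removes, in place, every row whose value for attIndex is missing. */
  static void deleteWithMissing(List<double[]> rows, int attIndex) {
    rows.removeIf(row -> Double.isNaN(row[attIndex]));
  }

  public static void main(String[] args) {
    List<double[]> rows = new ArrayList<>();
    rows.add(new double[] {1.0, 0.0});
    rows.add(new double[] {Double.NaN, 1.0});
    rows.add(new double[] {3.0, Double.NaN});
    deleteWithMissing(rows, 0); // only the second row is missing attribute 0
    System.out.println(rows.size()); // 2
  }
}
```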
 o enumerateAttributes
 public Enumeration enumerateAttributes()
Returns an enumeration of all the attributes.

Returns:
enumeration of all the attributes.
 o enumerateInstances
 public final Enumeration enumerateInstances()
Returns an enumeration of all instances in the dataset.

Returns:
enumeration of all instances in the dataset
 o equalHeaders
 public final boolean equalHeaders(Instances dataset)
Checks if two headers are equivalent.

Parameters:
dataset - another dataset
Returns:
true if the header of the given dataset is equivalent to this header
 o firstInstance
 public final Instance firstInstance()
Returns the first instance in the set.

Returns:
the first instance in the set
 o insertAttributeAt
 public void insertAttributeAt(Attribute att,
                               int position) throws Exception
Inserts an attribute at the given position (0 to numAttributes()) and sets all values to be missing. Shallow copies the attribute before it is inserted, and performs a deep copy of the existing attribute information.

Parameters:
att - the attribute to be inserted
position - the attribute's position
Throws: Exception
if the given index is out of range
 o instance
 public final Instance instance(int index)
Returns the instance at the given position.

Parameters:
index - the instance's index
Returns:
the instance at the given position
 o lastInstance
 public final Instance lastInstance()
Returns the last instance in the set.

Returns:
the last instance in the set
 o meanOrMode
 public final double meanOrMode(int attIndex)
Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value. Returns 0 if the attribute is neither nominal nor numeric, or if all of its values are missing.

Parameters:
attIndex - the attribute's index
Returns:
the mean or the mode
 o meanOrMode
 public final double meanOrMode(Attribute att)
Returns the mean (mode) for a numeric (nominal) attribute as a floating-point value. Returns 0 if the attribute is neither nominal nor numeric, or if all of its values are missing.

Parameters:
att - the attribute
Returns:
the mean or the mode
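For a nominal attribute the "floating-point value" returned is a label index, so the mode comes back as, e.g., 1.0 for the second label. The sketch below illustrates both cases under stated assumptions. It is not Weka's implementation: one attribute column is a double[], Double.NaN marks a missing value, and nominal labels are stored as indices. Weighting each value by its instance weight, and breaking ties in favour of the lowest label index, are assumptions, since the description above does not say how either is handled.

```java
// Minimal sketch (not Weka code): one attribute column as double[], Double.NaN = missing,
// nominal labels stored as indices 0.0, 1.0, ...; weights[i] is instance i's weight.
final class MeanOrModeSketch {

  /** Weighted mean of the non-missing values; 0 if every value is missing. */
  static double mean(double[] values, double[] weights) {
    double sum = 0, totalWeight = 0;
    for (int i = 0; i < values.length; i++) {
      if (!Double.isNaN(values[i])) {
        sum += weights[i] * values[i];
        totalWeight += weights[i];
      }
    }
    return totalWeight > 0 ? sum / totalWeight : 0;
  }

  /** Index of the label with the largest total weight, as a double (ties: lowest index). */
  static double mode(double[] values, double[] weights, int numLabels) {
    double[] counts = new double[numLabels];
    for (int i = 0; i < values.length; i++) {
      if (!Double.isNaN(values[i])) {
        counts[(int) values[i]] += weights[i];
      }
    }
    int best = 0; // stays 0 when every value is missing
    for (int k = 1; k < numLabels; k++) {
      if (counts[k] > counts[best]) {
        best = k;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    double[] ones = {1, 1, 1, 1};
    System.out.println(mean(new double[] {1, 2, 3, Double.NaN}, ones)); // 2.0
    System.out.println(mode(new double[] {0, 1, 1, 2}, ones, 3)); // 1.0
  }
}
```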
 o numAttributes
 public final int numAttributes()
Returns the number of attributes.

Returns:
the number of attributes as an integer
 o numClasses
 public final int numClasses() throws Exception
Returns the number of class labels.

Returns:
the number of class labels as an integer if the class attribute is nominal, 1 otherwise.
Throws: Exception
if the class is not set
 o numDistinctValues
 public final int numDistinctValues(int attIndex)
Returns the number of distinct values of a given attribute. Returns the number of instances if the attribute is a string attribute. The value 'missing' is not counted.

Parameters:
attIndex - the attribute's index
Returns:
the number of distinct values of a given attribute
 o numDistinctValues
 public final int numDistinctValues(Attribute att)
Returns the number of distinct values of a given attribute. Returns the number of instances if the attribute is a string attribute. The value 'missing' is not counted.

Parameters:
att - the attribute
Returns:
the number of distinct values of a given attribute
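For a numeric or nominal attribute, counting distinct values while ignoring missing ones can be done by sorting a copy of the column and counting the positions where the value changes. The sketch below assumes a double[] column with Double.NaN as the missing marker. It does not model the string-attribute case, where the description above says the number of instances is returned. The names are hypothetical, and Weka's own code may count differently.

```java
import java.util.Arrays;

// Minimal sketch (not Weka code): one numeric or nominal column, Double.NaN = missing.
final class NumDistinctValuesSketch {

  /** Counts distinct non-missing values by sorting a copy and counting value changes. */
  static int numDistinctValues(double[] column) {
    double[] sorted = Arrays.stream(column).filter(v -> !Double.isNaN(v)).sorted().toArray();
    int distinct = 0;
    for (int i = 0; i < sorted.length; i++) {
      if (i == 0 || sorted[i] != sorted[i - 1]) {
        distinct++;
      }
    }
    return distinct;
  }

  public static void main(String[] args) {
    System.out.println(numDistinctValues(new double[] {3, 1, 3, Double.NaN, 2})); // 3
  }
}
```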
 o numInstances
 public final int numInstances()
Returns the number of instances in the dataset.

Returns:
the number of instances in the dataset as an integer
 o randomize
 public final void randomize(Random random)
Shuffles the instances in the set so that they are ordered randomly.

Parameters:
random - a random number generator
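A standard way to put a set into uniformly random order with a caller-supplied java.util.Random is the Fisher-Yates shuffle, sketched below. This is an illustrative assumption, not a copy of Weka's code, so the same seed need not reproduce the order that randomize(Random) produces. What the sketch does show is why passing in the generator matters: a fixed seed makes the shuffle, and any cross-validation built on it, repeatable.

```java
import java.util.Arrays;
import java.util.Random;

// Minimal sketch (not Weka code) of an in-place Fisher-Yates shuffle.
final class RandomizeSketch {

  static <T> void randomize(T[] items, Random random) {
    for (int j = items.length - 1; j > 0; j--) {
      int k = random.nextInt(j + 1); // uniform position in 0..j
      T tmp = items[j];
      items[j] = items[k];
      items[k] = tmp;
    }
  }

  public static void main(String[] args) {
    Integer[] ids = {1, 2, 3, 4, 5};
    randomize(ids, new Random(42)); // a fixed seed makes the order reproducible
    System.out.println(Arrays.toString(ids));
  }
}
```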
 o readInstance
 public final boolean readInstance(Reader reader) throws IOException
Reads a single instance from the reader and appends it to the dataset. Automatically expands the dataset if it is not large enough to hold the instance. This method does not check for carriage return at the end of the line.

Parameters:
reader - the reader
Returns:
false if end of file has been reached
Throws: IOException
if the information is not read successfully
 o relationName
 public final String relationName()
Returns the relation's name.

Returns:
the relation's name as a string
 o renameAttribute
 public final void renameAttribute(int att,
                                   String name)
Renames an attribute. This change only affects this dataset.

Parameters:
att - the attribute's index
name - the new name
 o renameAttribute
 public final void renameAttribute(Attribute att,
                                   String name)
Renames an attribute. This change only affects this dataset.

Parameters:
att - the attribute
name - the new name
 o renameAttributeValue
 public final void renameAttributeValue(int att,
                                        int val,
                                        String name) throws Exception
Renames a value of a nominal (or string) attribute. This change only affects this dataset.

Parameters:
att - the attribute's index
val - the value's index
name - the new name
Throws: Exception
if renaming fails
 o renameAttributeValue
 public final void renameAttributeValue(Attribute att,
                                        String val,
                                        String name) throws Exception
Renames a value of a nominal (or string) attribute. This change only affects this dataset.

Parameters:
att - the attribute
val - the value
name - the new name
Throws: Exception
if renaming fails
 o resample
 public final Instances resample(Random random)
Creates a new dataset of the same size using random sampling with replacement.

Parameters:
random - a random number generator
Returns:
the new dataset
 o resampleWithWeights
 public final Instances resampleWithWeights(Random random,
                                            double weights[]) throws Exception
Creates a new dataset of the same size using random sampling with replacement according to the given weight vector. The weights of the instances in the new dataset are set to one. The length of the weight vector has to be the same as the number of instances in the dataset, and all weights have to be positive.

Parameters:
random - a random number generator
weights - the weight vector
Returns:
the new dataset
Throws: Exception
if something goes wrong
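Both resampling methods draw as many instances as the dataset holds, with replacement. resample draws uniformly. resampleWithWeights draws instance i with probability proportional to weights[i]. The sketch below returns the indices of the drawn instances; a caller would copy those instances into a new dataset and set their weights to one. The weighted draw uses cumulative weights plus a binary search, which is a swapped-in technique, not necessarily the one Weka uses. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Minimal sketch (not Weka code): both methods return the indices of the chosen
// instances; a caller would copy those instances into a new dataset with weight 1.
final class ResampleSketch {

  /** Bootstrap: n draws with replacement, each instance equally likely. */
  static int[] resample(int n, Random random) {
    int[] chosen = new int[n];
    for (int s = 0; s < n; s++) {
      chosen[s] = random.nextInt(n);
    }
    return chosen;
  }

  /** n draws with replacement, instance i chosen with probability weights[i] / sum. */
  static int[] resampleWithWeights(double[] weights, Random random) {
    double[] cumulative = new double[weights.length];
    double total = 0;
    for (int i = 0; i < weights.length; i++) {
      if (!(weights[i] > 0)) {
        throw new IllegalArgumentException("all weights must be positive");
      }
      total += weights[i];
      cumulative[i] = total;
    }
    int[] chosen = new int[weights.length];
    for (int s = 0; s < chosen.length; s++) {
      double u = random.nextDouble() * total; // uniform in [0, total)
      int pos = Arrays.binarySearch(cumulative, u);
      // An exact hit on a boundary belongs to the next interval; otherwise use the insertion point.
      int index = pos >= 0 ? pos + 1 : -pos - 1;
      chosen[s] = Math.min(index, weights.length - 1); // guard against rounding at the top end
    }
    return chosen;
  }

  public static void main(String[] args) {
    Random random = new Random(1);
    System.out.println(Arrays.toString(resample(5, random)));
    System.out.println(Arrays.toString(resampleWithWeights(new double[] {1, 1, 8}, random)));
  }
}
```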
 o setClass
 public final void setClass(Attribute att)
Sets the class attribute.

Parameters:
att - attribute to be the class
 o setClassIndex
 public final void setClassIndex(int classIndex) throws Exception
Sets the class index of the set. If the class index is negative, there is assumed to be no class (i.e., it is undefined).

Parameters:
classIndex - the new class index
Throws: Exception
if the class index is too big
 o setRelationName
 public final void setRelationName(String newName)
Sets the relation's name.

Parameters:
newName - the new relation name.
 o sort
 public final void sort(int attIndex)
Sorts the instances based on an attribute. For numeric attributes, instances are sorted in ascending order. For nominal attributes, instances are sorted based on the attribute label ordering specified in the header. Instances with missing values for the attribute are placed at the end of the dataset.

Parameters:
attIndex - the attribute's index
 o sort
 public final void sort(Attribute att)
Sorts the instances based on an attribute. For numeric attributes, instances are sorted in ascending order. For nominal attributes, instances are sorted based on the attribute label ordering specified in the header. Instances with missing values for the attribute are placed at the end of the dataset.

Parameters:
att - the attribute
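If nominal values are stored as label indices, sorting them in ascending numeric order is the same as following the header's label order. Placing missing values last then falls out of how Java compares doubles. The sketch below assumes rows are double[] instances with Double.NaN as the missing marker. It relies on Comparator.comparingDouble, which uses Double.compare and therefore orders NaN after every other value. This illustrates the ordering rule described above; it is not Weka's sorting code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Minimal sketch (not Weka code): rows are double[] instances, Double.NaN = missing,
// nominal values stored as label indices so ascending order follows the header's label order.
final class SortSketch {

  static void sort(List<double[]> rows, int attIndex) {
    // comparingDouble uses Double.compare, which orders NaN after every other value,
    // so missing values end up last; List.sort is stable, so ties keep their order.
    rows.sort(Comparator.comparingDouble((double[] row) -> row[attIndex]));
  }

  public static void main(String[] args) {
    List<double[]> rows = new ArrayList<>();
    rows.add(new double[] {Double.NaN});
    rows.add(new double[] {2.0});
    rows.add(new double[] {1.0});
    sort(rows, 0);
    for (double[] row : rows) {
      System.out.println(row[0]); // 1.0, then 2.0, then NaN
    }
  }
}
```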
 o stratify
 public final void stratify(int numFolds) throws Exception
Stratifies a set of instances according to its class values if the class attribute is nominal (so that afterwards a stratified cross-validation can be performed).

Parameters:
numFolds - the number of folds in the cross-validation
Throws: Exception
if the class is not set
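One way to stratify for a cross-validation whose test folds are contiguous blocks works in two steps. First, group the instances by class. Then take every numFolds-th instance starting at position 0, then every numFolds-th starting at position 1, and so on. Each block of roughly n / numFolds instances then receives a similar class mix. The sketch below assumes this approach, which may differ in detail from Weka's code. It takes a hypothetical classOf function mapping an instance to its class index and ignores the nominal-class check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToIntFunction;

// Minimal sketch (not Weka code): classOf maps an instance to its (nominal) class index.
final class StratifySketch {

  static <T> List<T> stratify(List<T> data, ToIntFunction<T> classOf, int numFolds) {
    List<T> byClass = new ArrayList<>(data);
    byClass.sort(Comparator.comparingInt(classOf)); // stable sort groups instances by class
    List<T> result = new ArrayList<>(byClass.size());
    for (int start = 0; start < numFolds; start++) {
      // Every numFolds-th instance from this offset; together the offsets cover all positions.
      for (int j = start; j < byClass.size(); j += numFolds) {
        result.add(byClass.get(j));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> labels = Arrays.asList("a", "a", "b", "a", "b", "b");
    System.out.println(stratify(labels, s -> s.equals("a") ? 0 : 1, 2)); // [a, a, b, a, b, b]
  }
}
```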
 o sumOfWeights
 public final double sumOfWeights()
Computes the sum of all the instances' weights.

Returns:
the sum of all the instances' weights as a double
 o testCV
 public Instances testCV(int numFolds,
                         int numFold) throws Exception
Creates the test set for one fold of a cross-validation on the dataset.

Parameters:
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
Returns:
the test set as a set of weighted instances
Throws: Exception
if the dataset can't be generated successfully
 o toString
 public final String toString()
Returns the dataset as a string in ARFF format. Strings are quoted if they contain whitespace characters, or if they are a question mark.

Returns:
the dataset in ARFF format as a string
Overrides:
toString in class Object
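The quoting rule stated above (quote a string if it contains whitespace or is a lone question mark, which would otherwise read as a missing value) can be sketched as a small helper. The description says only when quoting happens, not how. Using single quotes as the delimiter and backslash-escaping an embedded single quote are therefore assumptions of this sketch, and the name quote is hypothetical.

```java
// Minimal sketch (not Weka code) of the quoting rule for ARFF output.
final class ArffQuoteSketch {

  static String quote(String s) {
    boolean needsQuotes = s.equals("?"); // a bare "?" would be read as a missing value
    for (int i = 0; i < s.length() && !needsQuotes; i++) {
      if (Character.isWhitespace(s.charAt(i))) {
        needsQuotes = true;
      }
    }
    if (!needsQuotes) {
      return s;
    }
    // Assumption: single-quote delimiters, embedded single quotes escaped with a backslash.
    return "'" + s.replace("'", "\\'") + "'";
  }

  public static void main(String[] args) {
    System.out.println(quote("sunny"));    // sunny
    System.out.println(quote("new york")); // 'new york'
    System.out.println(quote("?"));        // '?'
  }
}
```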
 o trainCV
 public Instances trainCV(int numFolds,
                          int numFold) throws Exception
Creates the training set for one fold of a cross-validation on the dataset.

Parameters:
numFolds - the number of folds in the cross-validation. Must be greater than 1.
numFold - 0 for the first fold, 1 for the second, ...
Returns:
the training set as a set of weighted instances
Throws: Exception
if the dataset can't be generated successfully
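testCV and trainCV are complementary: for a given numFold, the training set is every instance not in that fold's test set. The sketch below assumes contiguous folds in which the first (n mod numFolds) folds receive one extra instance. That layout is an assumption chosen to make the split concrete, not a statement of Weka's exact arithmetic. Combined with a prior randomize (and stratify), contiguous folds are sufficient.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal sketch (not Weka code) assuming contiguous folds: fold numFold covers a
// consecutive block, and the first (n % numFolds) folds get one extra instance.
final class CrossValidationSketch {

  private static void check(int numFolds, int numFold) {
    if (numFolds < 2) throw new IllegalArgumentException("numFolds must be greater than 1");
    if (numFold < 0 || numFold >= numFolds) throw new IllegalArgumentException("numFold out of range");
  }

  static int foldStart(int n, int numFolds, int numFold) {
    return numFold * (n / numFolds) + Math.min(numFold, n % numFolds);
  }

  static int foldSize(int n, int numFolds, int numFold) {
    return n / numFolds + (numFold < n % numFolds ? 1 : 0);
  }

  static <T> List<T> testCV(List<T> data, int numFolds, int numFold) {
    check(numFolds, numFold);
    int start = foldStart(data.size(), numFolds, numFold);
    return new ArrayList<>(data.subList(start, start + foldSize(data.size(), numFolds, numFold)));
  }

  static <T> List<T> trainCV(List<T> data, int numFolds, int numFold) {
    check(numFolds, numFold);
    int start = foldStart(data.size(), numFolds, numFold);
    int end = start + foldSize(data.size(), numFolds, numFold);
    List<T> train = new ArrayList<>(data.subList(0, start)); // everything before the test fold
    train.addAll(data.subList(end, data.size()));            // and everything after it
    return train;
  }

  public static void main(String[] args) {
    List<Integer> data = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
    System.out.println(testCV(data, 3, 1));  // [4, 5, 6]
    System.out.println(trainCV(data, 3, 1)); // [0, 1, 2, 3, 7, 8, 9]
  }
}
```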
 o variance
 public final double variance(int attIndex) throws Exception
Computes the variance for a numeric attribute.

Parameters:
attIndex - the index of the numeric attribute
Returns:
the variance if the attribute is numeric
Throws: Exception
if the attribute is not numeric
 o variance
 public final double variance(Attribute att) throws Exception
Computes the variance for a numeric attribute.

Parameters:
att - the numeric attribute
Returns:
the variance if the attribute is numeric
Throws: Exception
if the attribute is not numeric
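The description above does not say which variance estimator is used. The sketch below assumes instance weights act as frequency counts and computes the unbiased estimate sum(w * (x - mean)^2) / (W - 1), where W is the total weight of the non-missing values. It uses two passes (mean first, then squared deviations) to avoid the cancellation a one-pass sum-of-squares formula can suffer. Returning NaN when W is at most 1 is also an assumption of the sketch, not documented Weka behaviour.

```java
// Minimal sketch (not Weka code): one numeric column, Double.NaN = missing, weights[i] is
// instance i's weight. Assumes weights act as frequency counts (unbiased W - 1 estimator).
final class VarianceSketch {

  static double variance(double[] values, double[] weights) {
    double sum = 0, totalWeight = 0;
    for (int i = 0; i < values.length; i++) {
      if (!Double.isNaN(values[i])) {
        sum += weights[i] * values[i];
        totalWeight += weights[i];
      }
    }
    if (totalWeight <= 1) {
      return Double.NaN; // estimator undefined; this choice is an assumption of the sketch
    }
    double mean = sum / totalWeight;
    double squares = 0;
    for (int i = 0; i < values.length; i++) {
      if (!Double.isNaN(values[i])) {
        double d = values[i] - mean;
        squares += weights[i] * d * d;
      }
    }
    return squares / (totalWeight - 1);
  }

  public static void main(String[] args) {
    double[] values = {2, 4, 4, 4, 5, 5, 7, 9};
    double[] weights = {1, 1, 1, 1, 1, 1, 1, 1};
    System.out.println(variance(values, weights)); // 32 / 7, about 4.571
  }
}
```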
 o attributeStats
 public Instances.AttributeStats attributeStats(int index)
Calculates summary statistics on the values that appear in this set of instances for a specified attribute.

Parameters:
index - the index of the attribute to summarize.
Returns:
an AttributeStats object with its fields calculated.
 o toSummaryString
 public String toSummaryString()
Generates a string summarizing the set of instances. Gives a breakdown for each attribute indicating the number of missing/discrete/unique values and other information.

Returns:
a string summarizing the dataset
 o mergeInstances
 public static Instances mergeInstances(Instances first,
                                        Instances second) throws Exception
Merges two sets of Instances together. The resulting set will have all the attributes of the first set plus all the attributes of the second set. The number of instances in both sets must be the same.

Parameters:
first - the first set of Instances
second - the second set of Instances
Returns:
the merged set of Instances
Throws: Exception
if an error occurs
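Merging two sets with the same number of instances is a column-wise concatenation: row i of the result holds row i of the first set followed by row i of the second. The sketch below covers only the values, modelling each dataset as a hypothetical double[][] of rows. A real merge would also concatenate the two attribute headers, which the sketch leaves out.

```java
import java.util.Arrays;

// Minimal sketch (not Weka code): each dataset is a double[][] of rows; only the values
// are merged here, while a real merge would also concatenate the attribute headers.
final class MergeSketch {

  static double[][] merge(double[][] first, double[][] second) {
    if (first.length != second.length) {
      throw new IllegalArgumentException("both sets must have the same number of instances");
    }
    double[][] merged = new double[first.length][];
    for (int i = 0; i < first.length; i++) {
      merged[i] = new double[first[i].length + second[i].length];
      System.arraycopy(first[i], 0, merged[i], 0, first[i].length);
      System.arraycopy(second[i], 0, merged[i], first[i].length, second[i].length);
    }
    return merged;
  }

  public static void main(String[] args) {
    double[][] merged = merge(new double[][] {{1}, {2}}, new double[][] {{3, 4}, {5, 6}});
    System.out.println(Arrays.deepToString(merged)); // [[1.0, 3.0, 4.0], [2.0, 5.0, 6.0]]
  }
}
```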
 o test
 public static void test(String argv[])
Method for testing this class.

Parameters:
argv - should contain one element: the name of an ARFF file
 o main
 public static void main(String args[])
Main method for this class -- just prints a summary of a set of instances.

Parameters:
args - should contain one element: the name of an ARFF file
