Class weka.core.Instances
java.lang.Object
|
+----weka.core.Instances
- public class Instances
- extends Object
- implements Serializable
Class for handling an ordered set of weighted instances.
Typical usage (code from the main() method of this class):
...
// Read all the instances in the file
reader = new FileReader(filename);
instances = new Instances(reader);
// Make the last attribute be the class
instances.setClassIndex(instances.numAttributes() - 1);
// Print header and instances.
System.out.println("\nDataset:\n");
System.out.println(instances);
...
All methods that change a set of instances are safe, i.e. a change
to one set of instances does not affect any other set of
instances. All methods that change a dataset's attribute
information clone the dataset before it is changed.
- Version:
- $Revision: 1.13 $
- Author:
- Eibe Frank (eibe@cs.waikato.ac.nz), Len Trigg (trigg@cs.waikato.ac.nz)
-
Instances(Instances)
- Constructor copying all instances and references to
the header information from the given set of instances.
-
Instances(Instances, int)
- Constructor creating an empty set of instances.
-
Instances(Instances, int, int)
- Creates a new set of instances by copying a
subset of another set.
-
Instances(Reader)
- Reads an ARFF file from a reader, and assigns a weight of
one to each instance.
-
Instances(Reader, int)
- Reads the header of an ARFF file from a reader and
reserves space for the given number of instances.
-
Instances(String, FastVector, int)
- Creates an empty set of instances.
-
add(Instance)
- Adds one instance to the end of the set.
-
attribute(int)
- Returns an attribute.
-
attribute(String)
- Returns an attribute given its name.
-
attributeStats(int)
- Calculates summary statistics on the values that appear in this
set of instances for a specified attribute.
-
checkForStringAttributes()
- Checks for string attributes in the dataset.
-
checkInstance(Instance)
- Checks if the given instance is compatible
with this dataset.
-
classAttribute()
- Returns the class attribute.
-
classIndex()
- Returns the class attribute's index.
-
compactify()
- Compactifies the set of instances.
-
delete(int)
- Removes an instance at the given position from the set.
-
deleteAttributeAt(int)
- Deletes an attribute at the given position
(0 to numAttributes() - 1).
-
deleteStringAttributes()
- Deletes all string attributes in the dataset.
-
deleteWithMissing(Attribute)
- Removes all instances with missing values for a particular
attribute from the dataset.
-
deleteWithMissing(int)
- Removes all instances with missing values for a particular
attribute from the dataset.
-
deleteWithMissingClass()
- Removes all instances with a missing class value
from the dataset.
-
enumerateAttributes()
- Returns an enumeration of all the attributes.
-
enumerateInstances()
- Returns an enumeration of all instances in the dataset.
-
equalHeaders(Instances)
- Checks if two headers are equivalent.
-
firstInstance()
- Returns the first instance in the set.
-
insertAttributeAt(Attribute, int)
- Inserts an attribute at the given position (0 to
numAttributes()) and sets all values to be missing.
-
instance(int)
- Returns the instance at the given position.
-
lastInstance()
- Returns the last instance in the set.
-
main(String[])
- Main method for this class -- just prints a summary of a set
of instances.
-
meanOrMode(Attribute)
- Returns the mean (mode) for a numeric (nominal) attribute as a
floating-point value.
-
meanOrMode(int)
- Returns the mean (mode) for a numeric (nominal) attribute as
a floating-point value.
-
mergeInstances(Instances, Instances)
- Merges two sets of Instances together.
-
numAttributes()
- Returns the number of attributes.
-
numClasses()
- Returns the number of class labels.
-
numDistinctValues(Attribute)
- Returns the number of distinct values of a given attribute.
-
numDistinctValues(int)
- Returns the number of distinct values of a given attribute.
-
numInstances()
- Returns the number of instances in the dataset.
-
randomize(Random)
- Shuffles the instances in the set so that they are ordered
randomly.
-
readInstance(Reader)
- Reads a single instance from the reader and appends it
to the dataset.
-
relationName()
- Returns the relation's name.
-
renameAttribute(Attribute, String)
- Renames an attribute.
-
renameAttribute(int, String)
- Renames an attribute.
-
renameAttributeValue(Attribute, String, String)
- Renames a value of a nominal (or string) attribute.
-
renameAttributeValue(int, int, String)
- Renames a value of a nominal (or string) attribute.
-
resample(Random)
- Creates a new dataset of the same size using random sampling
with replacement.
-
resampleWithWeights(Random, double[])
- Creates a new dataset of the same size using random sampling
with replacement according to the given weight vector.
-
setClass(Attribute)
- Sets the class attribute.
-
setClassIndex(int)
- Sets the class index of the set.
-
setRelationName(String)
- Sets the relation's name.
-
sort(Attribute)
- Sorts the instances based on an attribute.
-
sort(int)
- Sorts the instances based on an attribute.
-
stratify(int)
- Stratifies a set of instances according to its class values
if the class attribute is nominal (so that afterwards a
stratified cross-validation can be performed).
-
sumOfWeights()
- Computes the sum of all the instances' weights.
-
test(String[])
- Method for testing this class.
-
testCV(int, int)
- Creates the test set for one fold of a cross-validation on
the dataset.
-
toString()
- Returns the dataset as a string in ARFF format.
-
toSummaryString()
- Generates a string summarizing the set of instances.
-
trainCV(int, int)
- Creates the training set for one fold of a cross-validation
on the dataset.
-
variance(Attribute)
- Computes the variance for a numeric attribute.
-
variance(int)
- Computes the variance for a numeric attribute.
Instances
public Instances(Reader reader) throws Exception
- Reads an ARFF file from a reader, and assigns a weight of
one to each instance. Lets the index of the class
attribute be undefined (negative).
- Parameters:
- reader - the reader
- Throws: Exception
- if the ARFF file is not read
successfully
Instances
public Instances(Reader reader,
int capacity) throws Exception
- Reads the header of an ARFF file from a reader and
reserves space for the given number of instances. Lets
the class index be undefined (negative).
- Parameters:
- reader - the reader
- capacity - the capacity
- Throws: Exception
- if the header is not read successfully
or the capacity is negative
Instances
public Instances(Instances dataset)
- Constructor copying all instances and references to
the header information from the given set of instances.
- Parameters:
- dataset - the set to be copied
Instances
public Instances(Instances dataset,
int capacity)
- Constructor creating an empty set of instances. Copies references
to the header information from the given set of instances. Sets
the capacity of the set of instances to 0 if it is negative.
- Parameters:
- dataset - the instances from which the header
information is to be taken
- capacity - the capacity of the new dataset
Instances
public Instances(Instances source,
int first,
int toCopy) throws Exception
- Creates a new set of instances by copying a
subset of another set.
- Parameters:
- source - the set of instances from which a subset
is to be created
- first - the index of the first instance to be copied
- toCopy - the number of instances to be copied
- Throws: Exception
- if first and toCopy are out of range
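The range check implied by "if first and toCopy are out of range" can be sketched on a plain list. This is only an illustration under assumptions: SubsetCopySketch and copyRange are hypothetical names, a java.util.List stands in for the dataset, and the exception type is chosen for the example rather than taken from this class.

```java
import java.util.ArrayList;
import java.util.List;

class SubsetCopySketch {
    /** Copies toCopy elements of source, starting at index first, into a new list. */
    static <T> List<T> copyRange(List<T> source, int first, int toCopy) {
        // Reject ranges that start before the list or run past its end.
        if (first < 0 || toCopy < 0 || first + toCopy > source.size()) {
            throw new IllegalArgumentException("first and toCopy are out of range");
        }
        // The new list is independent of the source list (the elements themselves are shared).
        return new ArrayList<>(source.subList(first, first + toCopy));
    }
}
```

For a four-element list, copyRange(list, 1, 2) yields the second and third elements, while copyRange(list, 3, 2) is rejected.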
Instances
public Instances(String name,
FastVector attInfo,
int capacity)
- Creates an empty set of instances. Uses the given
attribute information. Sets the capacity of the set of
instances to 0 if it is negative. The given attribute information
must not be changed after this constructor has been used.
- Parameters:
- name - the name of the relation
- attInfo - the attribute information
- capacity - the capacity of the set
add
public final void add(Instance instance)
- Adds one instance to the end of the set.
Shallow copies instance before it is added. Increases the
size of the dataset if it is not large enough. Does not
check if the instance is compatible with the dataset.
- Parameters:
- instance - the instance to be added
attribute
public final Attribute attribute(int index)
- Returns an attribute.
- Parameters:
- index - the attribute's index
- Returns:
- the attribute at the given position
attribute
public final Attribute attribute(String name)
- Returns an attribute given its name. If there is more than
one attribute with the same name, it returns the first one.
Returns null if the attribute can't be found.
- Parameters:
- name - the attribute's name
- Returns:
- the attribute with the given name, null if the
attribute can't be found
checkForStringAttributes
public boolean checkForStringAttributes()
- Checks for string attributes in the dataset.
- Returns:
- true if string attributes are present, false otherwise
checkInstance
public final boolean checkInstance(Instance instance)
- Checks if the given instance is compatible
with this dataset. Only looks at the size of
the instance and the ranges of the values for
nominal and string attributes.
- Returns:
- true if the instance is compatible with the dataset
classAttribute
public final Attribute classAttribute() throws Exception
- Returns the class attribute.
- Returns:
- the class attribute
- Throws: Exception
- if the class is not set
classIndex
public final int classIndex()
- Returns the class attribute's index. Returns negative number
if it's undefined.
- Returns:
- the class index as an integer
compactify
public final void compactify()
- Compactifies the set of instances. Decreases the capacity of
the set so that it matches the number of instances in the set.
delete
public final void delete(int index)
- Removes an instance at the given position from the set.
- Parameters:
- index - the instance's position
deleteAttributeAt
public void deleteAttributeAt(int position) throws Exception
- Deletes an attribute at the given position
(0 to numAttributes() - 1). A deep copy of the attribute
information is performed before the attribute is deleted.
- Parameters:
- position - the attribute's position
- Throws: Exception
- if the given index is out of range or the
class attribute is being deleted
deleteStringAttributes
public void deleteStringAttributes() throws Exception
- Deletes all string attributes in the dataset. A deep copy of the attribute
information is performed before an attribute is deleted.
- Throws: Exception
- if string attribute couldn't be
successfully deleted.
deleteWithMissing
public final void deleteWithMissing(int attIndex)
- Removes all instances with missing values for a particular
attribute from the dataset.
- Parameters:
- attIndex - the attribute's index
deleteWithMissing
public final void deleteWithMissing(Attribute att)
- Removes all instances with missing values for a particular
attribute from the dataset.
- Parameters:
- att - the attribute
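The effect of deleteWithMissing can be illustrated without the library. The sketch below assumes that each instance is a double[] row and that a missing value is encoded as NaN; both are assumptions made for the example, and DeleteWithMissingSketch is a hypothetical name.

```java
import java.util.List;

class DeleteWithMissingSketch {
    /** Removes, in place, every row whose value at attIndex is NaN (assumed to mean "missing"). */
    static void deleteWithMissing(List<double[]> rows, int attIndex) {
        rows.removeIf(row -> Double.isNaN(row[attIndex]));
    }
}
```

Rows that are missing a value for some other attribute are kept; only the chosen attribute is inspected.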
deleteWithMissingClass
public final void deleteWithMissingClass() throws Exception
- Removes all instances with a missing class value
from the dataset.
- Throws: Exception
- if class is not set
enumerateAttributes
public Enumeration enumerateAttributes()
- Returns an enumeration of all the attributes.
- Returns:
- enumeration of all the attributes.
enumerateInstances
public final Enumeration enumerateInstances()
- Returns an enumeration of all instances in the dataset.
- Returns:
- enumeration of all instances in the dataset
equalHeaders
public final boolean equalHeaders(Instances dataset)
- Checks if two headers are equivalent.
- Parameters:
- dataset - another dataset
- Returns:
- true if the header of the given dataset is equivalent
to this header
firstInstance
public final Instance firstInstance()
- Returns the first instance in the set.
- Returns:
- the first instance in the set
insertAttributeAt
public void insertAttributeAt(Attribute att,
int position) throws Exception
- Inserts an attribute at the given position (0 to
numAttributes()) and sets all values to be missing.
Shallow copies the attribute before it is inserted, and performs
a deep copy of the existing attribute information.
- Parameters:
- att - the attribute to be inserted
- position - the attribute's position
- Throws: Exception
- if the given index is out of range
instance
public final Instance instance(int index)
- Returns the instance at the given position.
- Parameters:
- index - the instance's index
- Returns:
- the instance at the given position
lastInstance
public final Instance lastInstance()
- Returns the last instance in the set.
- Returns:
- the last instance in the set
meanOrMode
public final double meanOrMode(int attIndex)
- Returns the mean (mode) for a numeric (nominal) attribute as
a floating-point value. Returns 0 if the attribute is neither nominal nor
numeric. If all values are missing it returns zero.
- Parameters:
- attIndex - the attribute's index
- Returns:
- the mean or the mode
meanOrMode
public final double meanOrMode(Attribute att)
- Returns the mean (mode) for a numeric (nominal) attribute as a
floating-point value. Returns 0 if the attribute is neither
nominal nor numeric. If all values are missing it returns zero.
- Parameters:
- att - the attribute
- Returns:
- the mean or the mode
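A rough sketch of what "mean (mode) as a floating-point value" could look like follows. It rests on several assumptions that are not stated above: missing values are encoded as NaN, a nominal value is stored as the index of its label, the mode is reported as that index, and instance weights are ignored. MeanOrModeSketch is a hypothetical name.

```java
class MeanOrModeSketch {
    /** Mean of the non-missing (non-NaN) values; 0 if every value is missing. */
    static double mean(double[] values) {
        double sum = 0;
        int count = 0;
        for (double v : values) {
            if (!Double.isNaN(v)) {
                sum += v;
                count++;
            }
        }
        return count == 0 ? 0 : sum / count;
    }

    /** Index of the most frequent label among the non-missing values; 0 if all are missing. */
    static double mode(double[] labelIndices, int numLabels) {
        int[] counts = new int[numLabels];
        for (double v : labelIndices) {
            if (!Double.isNaN(v)) {
                counts[(int) v]++;
            }
        }
        int best = 0;
        for (int i = 1; i < numLabels; i++) {
            if (counts[i] > counts[best]) {
                best = i;   // ties keep the lower index
            }
        }
        return best;
    }
}
```

Because the class handles weighted instances, the real computation may weight each value; the sketch treats every instance equally.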
numAttributes
public final int numAttributes()
- Returns the number of attributes.
- Returns:
- the number of attributes as an integer
numClasses
public final int numClasses() throws Exception
- Returns the number of class labels.
- Returns:
- the number of class labels as an integer if the class
attribute is nominal, 1 otherwise.
- Throws: Exception
- if the class is not set
numDistinctValues
public final int numDistinctValues(int attIndex)
- Returns the number of distinct values of a given attribute.
Returns the number of instances if the attribute is a
string attribute. The value 'missing' is not counted.
- Parameters:
- attIndex - the attribute's index
- Returns:
- the number of distinct values of a given attribute
numDistinctValues
public final int numDistinctValues(Attribute att)
- Returns the number of distinct values of a given attribute.
Returns the number of instances if the attribute is a
string attribute. The value 'missing' is not counted.
- Parameters:
- att - the attribute
- Returns:
- the number of distinct values of a given attribute
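Counting distinct values while skipping 'missing' can be sketched as below. The NaN encoding of missing values and the name DistinctValuesSketch are assumptions for illustration, and the string-attribute special case described above is not modelled.

```java
import java.util.HashSet;
import java.util.Set;

class DistinctValuesSketch {
    /** Number of distinct non-NaN values; NaN (assumed to mean "missing") is not counted. */
    static int numDistinctValues(double[] values) {
        Set<Double> seen = new HashSet<>();
        for (double v : values) {
            if (!Double.isNaN(v)) {
                seen.add(v);
            }
        }
        return seen.size();
    }
}
```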
numInstances
public final int numInstances()
- Returns the number of instances in the dataset.
- Returns:
- the number of instances in the dataset as an integer
randomize
public final void randomize(Random random)
- Shuffles the instances in the set so that they are ordered
randomly.
- Parameters:
- random - a random number generator
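One standard way to shuffle a sequence with a supplied random number generator is the Fisher-Yates shuffle, sketched below on an int array. Whether this class uses exactly this algorithm is not stated above, so treat it as an illustration; RandomizeSketch is a hypothetical name.

```java
import java.util.Random;

class RandomizeSketch {
    /** Shuffles items in place; every permutation is equally likely for an ideal generator. */
    static void shuffle(int[] items, Random random) {
        for (int i = items.length - 1; i > 0; i--) {
            int j = random.nextInt(i + 1);   // uniform index in 0..i
            int tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }
    }
}
```

Passing a Random created with a fixed seed makes the resulting order reproducible, which is useful when experiments have to be repeated.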
readInstance
public final boolean readInstance(Reader reader) throws IOException
- Reads a single instance from the reader and appends it
to the dataset. Automatically expands the dataset if it
is not large enough to hold the instance. This method does
not check for carriage return at the end of the line.
- Parameters:
- reader - the reader
- Returns:
- false if end of file has been reached
- Throws: IOException
- if the information is not read
successfully
relationName
public final String relationName()
- Returns the relation's name.
- Returns:
- the relation's name as a string
renameAttribute
public final void renameAttribute(int att,
String name)
- Renames an attribute. This change only affects this
dataset.
- Parameters:
- att - the attribute's index
- name - the new name
renameAttribute
public final void renameAttribute(Attribute att,
String name)
- Renames an attribute. This change only affects this
dataset.
- Parameters:
- att - the attribute
- name - the new name
renameAttributeValue
public final void renameAttributeValue(int att,
int val,
String name) throws Exception
- Renames a value of a nominal (or string) attribute. This
change only affects this dataset.
- Parameters:
- att - the attribute's index
- val - the value's index
- name - the new name
- Throws: Exception
- if renaming fails
renameAttributeValue
public final void renameAttributeValue(Attribute att,
String val,
String name) throws Exception
- Renames a value of a nominal (or string) attribute. This
change only affects this dataset.
- Parameters:
- att - the attribute
- val - the value
- name - the new name
- Throws: Exception
- if renaming fails
resample
public final Instances resample(Random random)
- Creates a new dataset of the same size using random sampling
with replacement.
- Parameters:
- random - a random number generator
- Returns:
- the new dataset
resampleWithWeights
public final Instances resampleWithWeights(Random random,
double weights[]) throws Exception
- Creates a new dataset of the same size using random sampling
with replacement according to the given weight vector. The
weights of the instances in the new dataset are set to one.
The length of the weight vector has to be the same as the
number of instances in the dataset, and all weights have to
be positive.
- Parameters:
- random - a random number generator
- weights - the weight vector
- Returns:
- the new dataset
- Throws: Exception
- if something goes wrong
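Sampling with replacement according to a weight vector can be sketched with cumulative sums. The code below is a minimal illustration, not this class's implementation: it returns the chosen indices instead of a new dataset, and WeightedResampleSketch, sampleIndices and the exception type are hypothetical choices. The two checks mirror the requirements stated above (one weight per instance, all weights positive).

```java
import java.util.Random;

class WeightedResampleSketch {
    /** Draws numItems indices with replacement; index i is chosen with probability weights[i] / sum. */
    static int[] sampleIndices(int numItems, double[] weights, Random random) {
        if (weights.length != numItems) {
            throw new IllegalArgumentException("weights must have one entry per instance");
        }
        double[] cumulative = new double[numItems];
        double total = 0;
        for (int i = 0; i < numItems; i++) {
            if (!(weights[i] > 0)) {   // also rejects NaN
                throw new IllegalArgumentException("all weights must be positive");
            }
            total += weights[i];
            cumulative[i] = total;
        }
        int[] chosen = new int[numItems];
        for (int k = 0; k < numItems; k++) {
            double r = random.nextDouble() * total;   // uniform in [0, total)
            int i = 0;
            // Advance to the first bucket whose upper bound exceeds r.
            while (i < numItems - 1 && r >= cumulative[i]) {
                i++;
            }
            chosen[k] = i;
        }
        return chosen;
    }
}
```

A new dataset would then be built by copying the instances at the chosen indices and, as described above, setting each copy's weight to one.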
setClass
public final void setClass(Attribute att)
- Sets the class attribute.
- Parameters:
- att - attribute to be the class
setClassIndex
public final void setClassIndex(int classIndex) throws Exception
- Sets the class index of the set.
If the class index is negative, the set is assumed to have no class
(i.e. the class is undefined).
- Parameters:
- classIndex - the new class index
- Throws: Exception
- if the class index is too big
setRelationName
public final void setRelationName(String newName)
- Sets the relation's name.
- Parameters:
- newName - the new relation name.
sort
public final void sort(int attIndex)
- Sorts the instances based on an attribute. For numeric attributes,
instances are sorted in ascending order. For nominal attributes,
instances are sorted based on the attribute label ordering
specified in the header. Instances with missing values for the
attribute are placed at the end of the dataset.
- Parameters:
- attIndex - the attribute's index
sort
public final void sort(Attribute att)
- Sorts the instances based on an attribute. For numeric attributes,
instances are sorted in ascending order. For nominal attributes,
instances are sorted based on the attribute label ordering
specified in the header. Instances with missing values for the
attribute are placed at the end of the dataset.
- Parameters:
- att - the attribute
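The ordering described for sort (ascending, with missing values last) can be reproduced on plain rows. The sketch assumes that missing values are encoded as NaN and that a nominal value is stored as the index of its label in the header, so sorting by the stored number also follows the header's label order. SortWithMissingSketch is a hypothetical name.

```java
import java.util.Comparator;
import java.util.List;

class SortWithMissingSketch {
    /** Sorts rows in place by the value at attIndex, ascending, with NaN rows at the end. */
    static void sortByAttribute(List<double[]> rows, int attIndex) {
        // Double.compare orders NaN after every other value, so "missing" rows end up last.
        rows.sort(Comparator.comparingDouble((double[] row) -> row[attIndex]));
    }
}
```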
stratify
public final void stratify(int numFolds) throws Exception
- Stratifies a set of instances according to its class values
if the class attribute is nominal (so that afterwards a
stratified cross-validation can be performed).
- Parameters:
- numFolds - the number of folds in the cross-validation
- Throws: Exception
- if the class is not set
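One way to reorder a dataset so that consecutive fold-sized blocks have similar class proportions is to group the instances by class and then pick every numFolds-th instance for successive starting offsets. The sketch below works on an array of class label indices and returns the new order of the instance indices. It is an illustration of the idea only; StratifySketch and stratifiedOrder are hypothetical names, and the class's actual procedure may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class StratifySketch {
    /** Returns an ordering of 0..n-1 in which consecutive blocks of about n / numFolds mix the classes. */
    static List<Integer> stratifiedOrder(int[] classLabels, int numFolds) {
        int n = classLabels.length;
        // Step 1: group indices by class label (the stable sort keeps the original order within a class).
        List<Integer> grouped = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            grouped.add(i);
        }
        grouped.sort(Comparator.comparingInt((Integer i) -> classLabels[i]));
        // Step 2: take every numFolds-th element, starting at offsets 0, 1, ..., numFolds - 1.
        List<Integer> order = new ArrayList<>(n);
        for (int start = 0; start < numFolds; start++) {
            for (int j = start; j < n; j += numFolds) {
                order.add(grouped.get(j));
            }
        }
        return order;
    }
}
```

With labels {0, 0, 0, 1, 1, 1} and three folds, the resulting order is 0, 3, 1, 4, 2, 5, so each block of two instances contains one instance of each class.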
sumOfWeights
public final double sumOfWeights()
- Computes the sum of all the instances' weights.
- Returns:
- the sum of all the instances' weights as a double
testCV
public Instances testCV(int numFolds,
int numFold) throws Exception
- Creates the test set for one fold of a cross-validation on
the dataset.
- Parameters:
- numFolds - the number of folds in the cross-validation. Must
be greater than 1.
- numFold - 0 for the first fold, 1 for the second, ...
- Returns:
- the test set as a set of weighted instances
- Throws: Exception
- if dataset can't be generated
successfully
toString
public final String toString()
- Returns the dataset as a string in ARFF format. Strings
are quoted if they contain whitespace characters, or if they
are a question mark.
- Returns:
- the dataset in ARFF format as a string
- Overrides:
- toString in class Object
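The quoting rule stated above (quote a string if it contains whitespace or is a question mark) can be sketched as follows. The use of single quotes is an assumption for the example, embedded quote characters are not escaped, and ArffQuoteSketch is a hypothetical name.

```java
class ArffQuoteSketch {
    /** Wraps value in single quotes if it contains whitespace or is exactly "?". */
    static String quote(String value) {
        boolean hasWhitespace = false;
        for (int i = 0; i < value.length(); i++) {
            if (Character.isWhitespace(value.charAt(i))) {
                hasWhitespace = true;
                break;
            }
        }
        if (hasWhitespace || value.equals("?")) {
            return "'" + value + "'";
        }
        return value;
    }
}
```

Quoting a literal question mark keeps it distinct from the unquoted ? that ARFF uses to mark a missing value.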
trainCV
public Instances trainCV(int numFolds,
int numFold) throws Exception
- Creates the training set for one fold of a cross-validation
on the dataset.
- Parameters:
- numFolds - the number of folds in the cross-validation. Must
be greater than 1.
- numFold - 0 for the first fold, 1 for the second, ...
- Returns:
- the training set as a set of weighted
instances
- Throws: Exception
- if dataset can't be generated
successfully
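The relationship between testCV and trainCV can be pictured as cutting the dataset into numFolds consecutive blocks: fold numFold's test set is one block, and its training set is everything else. The sketch below computes such a block so that block sizes differ by at most one. It is a plausible partitioning scheme written for illustration, not a statement of how this class computes its folds; CrossValidationFoldSketch and testRange are hypothetical names.

```java
class CrossValidationFoldSketch {
    /** Returns {firstIndex, count} of the test block for the 0-based fold numFold. */
    static int[] testRange(int numInstances, int numFolds, int numFold) {
        if (numFolds < 2) {
            throw new IllegalArgumentException("numFolds must be greater than 1");
        }
        if (numFold < 0 || numFold >= numFolds) {
            throw new IllegalArgumentException("numFold must be between 0 and numFolds - 1");
        }
        int base = numInstances / numFolds;        // minimum block size
        int remainder = numInstances % numFolds;   // the first 'remainder' blocks get one extra instance
        int count = base + (numFold < remainder ? 1 : 0);
        int first = numFold * base + Math.min(numFold, remainder);
        return new int[] {first, count};
    }
}
```

For 10 instances and 3 folds the test blocks are indices 0 to 3, 4 to 6 and 7 to 9; the training set for a fold consists of the instances before and after its test block.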
variance
public final double variance(int attIndex) throws Exception
- Computes the variance for a numeric attribute.
- Parameters:
- attIndex - the index of the numeric attribute
- Returns:
- the variance if the attribute is numeric
- Throws: Exception
- if the attribute is not numeric
variance
public final double variance(Attribute att) throws Exception
- Computes the variance for a numeric attribute.
- Parameters:
- att - the numeric attribute
- Returns:
- the variance if the attribute is numeric
- Throws: Exception
- if the attribute is not numeric
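For reference, a variance computation over a numeric column might look like the sketch below. The description above does not say whether the sample (n - 1) or population (n) form is used, nor how weights and missing values are treated; the sketch assumes the sample form, ignores weights, and skips NaN values as missing. VarianceSketch is a hypothetical name.

```java
class VarianceSketch {
    /** Sample variance (n - 1 denominator) of the non-NaN values; 0 if fewer than two remain. */
    static double sampleVariance(double[] values) {
        double sum = 0;
        int n = 0;
        for (double v : values) {
            if (!Double.isNaN(v)) {
                sum += v;
                n++;
            }
        }
        if (n < 2) {
            return 0;
        }
        double mean = sum / n;
        // A second pass over the deviations is numerically safer than sumOfSquares - n * mean^2.
        double squaredDeviations = 0;
        for (double v : values) {
            if (!Double.isNaN(v)) {
                squaredDeviations += (v - mean) * (v - mean);
            }
        }
        return squaredDeviations / (n - 1);
    }
}
```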
attributeStats
public Instances.AttributeStats attributeStats(int index)
- Calculates summary statistics on the values that appear in this
set of instances for a specified attribute.
- Parameters:
- index - the index of the attribute to summarize.
- Returns:
- an AttributeStats object with its fields calculated.
toSummaryString
public String toSummaryString()
- Generates a string summarizing the set of instances. Gives a breakdown
for each attribute indicating the number of missing/discrete/unique
values and other information.
- Returns:
- a string summarizing the dataset
mergeInstances
public static Instances mergeInstances(Instances first,
Instances second) throws Exception
- Merges two sets of Instances together. The resulting set will have
all the attributes of the first set plus all the attributes of the
second set. The number of instances in both sets must be the same.
- Parameters:
- first - the first set of Instances
- second - the second set of Instances
- Returns:
- the merged set of Instances
- Throws: Exception
- if an error occurs
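The merge described above is column-wise: row i of the result holds the values of row i of the first set followed by those of row i of the second. A minimal sketch on double[][] tables follows; MergeSketch and mergeColumns are hypothetical names, and the exception type is chosen for the example.

```java
class MergeSketch {
    /** Joins two tables side by side; both must have the same number of rows. */
    static double[][] mergeColumns(double[][] first, double[][] second) {
        if (first.length != second.length) {
            throw new IllegalArgumentException("both sets must have the same number of instances");
        }
        double[][] merged = new double[first.length][];
        for (int i = 0; i < first.length; i++) {
            merged[i] = new double[first[i].length + second[i].length];
            // Attributes of the first set come first, then those of the second set.
            System.arraycopy(first[i], 0, merged[i], 0, first[i].length);
            System.arraycopy(second[i], 0, merged[i], first[i].length, second[i].length);
        }
        return merged;
    }
}
```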
test
public static void test(String argv[])
- Method for testing this class.
- Parameters:
- argv - should contain one element: the name of an ARFF file
main
public static void main(String args[])
- Main method for this class -- just prints a summary of a set
of instances.
- Parameters:
- args - should contain one element: the name of an ARFF file