Class weka.filters.DiscretizeFilter
java.lang.Object
   |
   +----weka.filters.Filter
           |
           +----weka.filters.DiscretizeFilter
- public class DiscretizeFilter
- extends Filter
- implements OptionHandler, WeightedInstancesHandler
An instance filter that discretizes a range of numeric attributes in
the dataset into nominal attributes. Discretization can be either by
simple binning, or by Fayyad & Irani's MDL method (the default).
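For reference, here is the acceptance test of Fayyad & Irani's MDL method as it is usually stated. A candidate cut point T on attribute A splits the N instances in S into S_1 and S_2. In the formulas, p_c is the proportion of class c, and k, k_1, k_2 count the classes present in S, S_1, S_2. The -E and -K options alter this criterion, and they are not covered here.

```latex
\begin{aligned}
\mathrm{Ent}(S) &= -\sum_{c} p_c \log_2 p_c \\
\mathrm{Gain}(A,T;S) &= \mathrm{Ent}(S) - \frac{|S_1|}{N}\,\mathrm{Ent}(S_1) - \frac{|S_2|}{N}\,\mathrm{Ent}(S_2) \\
\Delta(A,T;S) &= \log_2\!\left(3^{k}-2\right) - \left[k\,\mathrm{Ent}(S) - k_1\,\mathrm{Ent}(S_1) - k_2\,\mathrm{Ent}(S_2)\right] \\
\text{accept } T &\iff \mathrm{Gain}(A,T;S) > \frac{\log_2(N-1)}{N} + \frac{\Delta(A,T;S)}{N}
\end{aligned}
```

For example, a cut that perfectly separates 20 instances (10 per class) gains 1 bit. The threshold is about 0.25 bits, so the cut is accepted.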
Valid filter-specific options are:
-B num
Specify the (maximum) number of equal-width bins to divide numeric attributes into.
(default: class-based discretization by the MDL method).
-O
Optimizes the number of bins using a leave-one-out estimate of the
entropy.
-R col1,col2-col4,...
Specify the list of columns to discretize. "first"
and "last" are valid indexes. (default: none)
-V
Invert matching sense (discretize the numeric attributes not in the -R list).
-D
Make binary nominal attributes.
-E
Use better encoding of split point for MDL.
-K
Use Kononenko's MDL criterion.
- Version:
- $Revision: 1.5 $
- Author:
- Len Trigg (trigg@cs.waikato.ac.nz), Eibe Frank (eibe@cs.waikato.ac.nz) (Fayyad and Irani's method)
-
DiscretizeFilter()
- Constructor - initialises the filter
-
batchFinished()
- Signifies that this batch of input to the filter is finished.
-
getAttributeIndices()
- Gets the current range selection
-
getBins()
- Gets the number of bins numeric attributes will be divided into
-
getCutPoints(int)
- Gets the cut points for an attribute
-
getFindNumBins()
- Get the value of FindNumBins.
-
getInvertSelection()
- Gets whether the attribute selection is inverted.
-
getMakeBinary()
- Gets whether binary attributes should be made for discretized ones.
-
getOptimzeBinning()
- Gets whether binning is to be optimized.
-
getOptions()
- Gets the current settings of the filter.
-
getUseBetterEncoding()
- Gets whether better encoding is to be used for MDL.
-
getUseKononenko()
- Gets whether Kononenko's MDL criterion is to be used.
-
getUseMDL()
- Gets whether MDL will be used as the discretisation method
-
input(Instance)
- Input an instance for filtering.
-
inputFormat(Instances)
- Sets the format of the input instances.
-
listOptions()
- Gets an enumeration describing the available options
-
main(String[])
- Main method for testing this class.
-
setAttributeIndices(String)
- Sets which attributes are to be discretized (only numeric
attributes among the selection will be discretized).
-
setAttributeIndicesArray(int[])
- Sets which attributes are to be discretized (only numeric
attributes among the selection will be discretized).
-
setBins(int)
- Sets the number of bins to divide each selected numeric attribute into
-
setFindNumBins(boolean)
- Set the value of FindNumBins.
-
setInvertSelection(boolean)
- Sets whether the attribute selection is inverted.
-
setMakeBinary(boolean)
- Sets whether binary attributes should be made for discretized ones.
-
setOptimizeBinning(boolean)
- Sets whether binning is to be optimized.
-
setOptions(String[])
- Parses the options for this object.
-
setUseBetterEncoding(boolean)
- Sets whether better encoding is to be used for MDL.
-
setUseKononenko(boolean)
- Sets whether Kononenko's MDL criterion is to be used.
-
setUseMDL(boolean)
- Sets whether MDL will be used as the discretisation method
DiscretizeFilter
public DiscretizeFilter()
- Constructor - initialises the filter
listOptions
public Enumeration listOptions()
- Gets an enumeration describing the available options
- Returns:
- an enumeration of all the available options
setOptions
public void setOptions(String options[]) throws Exception
- Parses the options for this object. Valid options are:
-B num
Specify the (maximum) number of equal-width bins to divide
numeric attributes into. (default: class-based discretization by the MDL method).
-O
Optimizes the number of bins using a leave-one-out estimate of the
entropy.
-R col1,col2-col4,...
Specify the list of columns to discretize. "first"
and "last" are valid indexes. (default: none)
-V
Invert matching sense (discretize the numeric attributes not in the -R list).
-D
Make binary nominal attributes.
-E
Use better encoding of split point for MDL.
-K
Use Kononenko's MDL criterion.
- Parameters:
- options - the list of options as an array of strings
- Throws: Exception
- if an option is not supported
getOptions
public String[] getOptions()
- Gets the current settings of the filter.
- Returns:
- an array of strings suitable for passing to setOptions
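Here is a minimal sketch of how the flags above might map to settings and back. It follows the contract that getOptions returns an array suitable for setOptions. The class DiscretizeOptionsSketch and its fields are hypothetical. Only the flag letters and their meanings come from this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for the documented flags; this is NOT WEKA's own option parser.
class DiscretizeOptionsSketch {
    int bins = -1;              // only meaningful when -B was given
    boolean useMDL = true;      // no -B means class-based (MDL) discretization
    String range = "";          // -R value; empty means no columns selected
    boolean invert, makeBinary, betterEncoding, kononenko, optimize;

    static DiscretizeOptionsSketch parse(String[] args) {
        DiscretizeOptionsSketch o = new DiscretizeOptionsSketch();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-B":
                    o.bins = Integer.parseInt(valueAfter(args, i));
                    o.useMDL = false;   // explicit bin count switches to equal-width binning
                    i++;                // skip the consumed value
                    break;
                case "-R":
                    o.range = valueAfter(args, i);
                    i++;
                    break;
                case "-V": o.invert = true; break;
                case "-D": o.makeBinary = true; break;
                case "-E": o.betterEncoding = true; break;
                case "-K": o.kononenko = true; break;
                case "-O": o.optimize = true; break;
                default: throw new IllegalArgumentException("Unsupported option: " + args[i]);
            }
        }
        return o;
    }

    private static String valueAfter(String[] args, int i) {
        if (i + 1 >= args.length) throw new IllegalArgumentException(args[i] + " needs a value");
        return args[i + 1];
    }

    // Emits flags in a fixed order so that parse(toOptions()) reproduces the settings.
    String[] toOptions() {
        List<String> out = new ArrayList<>();
        if (!useMDL) { out.add("-B"); out.add(Integer.toString(bins)); }
        if (!range.isEmpty()) { out.add("-R"); out.add(range); }
        if (invert) out.add("-V");
        if (makeBinary) out.add("-D");
        if (betterEncoding) out.add("-E");
        if (kononenko) out.add("-K");
        if (optimize) out.add("-O");
        return out.toArray(new String[0]);
    }
}
```

Emitting the flags in a fixed order makes the round trip deterministic. An unknown flag is rejected with an exception, matching the documented "if an option is not supported" behaviour.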
inputFormat
public boolean inputFormat(Instances instanceInfo) throws Exception
- Sets the format of the input instances.
- Parameters:
- instanceInfo - an Instances object containing the input instance
structure (any instances contained in the object are ignored - only the
structure is required).
- Returns:
- true if the outputFormat may be collected immediately
- Throws: Exception
- if the input format can't be set successfully
- Overrides:
- inputFormat in class Filter
input
public boolean input(Instance instance) throws Exception
- Input an instance for filtering. Ordinarily the instance is processed
and made available for output immediately. Some filters require all
instances to be read before producing output.
- Parameters:
- instance - the input instance
- Returns:
- true if the filtered instance may now be
collected with output().
- Throws: Exception
- if the input instance was not of the correct
format or if there was a problem with the filtering.
- Overrides:
- input in class Filter
batchFinished
public boolean batchFinished() throws Exception
- Signifies that this batch of input to the filter is finished. If the
filter requires all instances prior to filtering, output() may now
be called to retrieve the filtered instances.
- Returns:
- true if there are instances pending output
- Throws: Exception
- if no input structure has been defined
- Overrides:
- batchFinished in class Filter
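A toy, single-attribute filter can illustrate the input/batchFinished/output protocol. BufferingDiscretizerSketch is hypothetical and is not WEKA code. It assumes equal-width bins computed from the first batch. Because the cut points depend on every value in that batch, input() has to buffer and return false until batchFinished() is called.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical filter illustrating the input()/batchFinished()/output() contract.
class BufferingDiscretizerSketch {
    private final int numBins;
    private final List<Double> buffer = new ArrayList<>();
    private final Queue<Integer> outputQueue = new ArrayDeque<>();
    private double min, width;
    private boolean cutPointsReady = false;

    BufferingDiscretizerSketch(int numBins) { this.numBins = numBins; }

    // Returns true if the converted value can be collected with output() right away.
    boolean input(double value) {
        if (!cutPointsReady) { buffer.add(value); return false; } // must see the whole batch first
        outputQueue.add(binOf(value));
        return true;
    }

    // Computes cut points from the first buffered batch and releases the buffered values.
    boolean batchFinished() {
        if (!cutPointsReady && !buffer.isEmpty()) {
            double lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY;
            for (double v : buffer) { lo = Math.min(lo, v); hi = Math.max(hi, v); }
            min = lo;
            width = (hi - lo) / numBins;
            cutPointsReady = true;
            for (double v : buffer) outputQueue.add(binOf(v));
            buffer.clear();
        }
        return !outputQueue.isEmpty(); // true if there are instances pending output
    }

    Integer output() { return outputQueue.poll(); } // null when nothing is pending

    private int binOf(double v) {
        if (width <= 0) return 0;                       // constant attribute: one bin
        int bin = (int) Math.floor((v - min) / width);
        return Math.max(0, Math.min(numBins - 1, bin)); // clamp values outside the first batch's range
    }
}
```

After the first batch the cut points are fixed. Later calls to input() therefore return true, and the value can be collected at once.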
getFindNumBins
public boolean getFindNumBins()
- Get the value of FindNumBins.
- Returns:
- Value of FindNumBins.
setFindNumBins
public void setFindNumBins(boolean newFindNumBins)
- Set the value of FindNumBins.
- Parameters:
- newFindNumBins - Value to assign to FindNumBins.
getMakeBinary
public boolean getMakeBinary()
- Gets whether binary attributes should be made for discretized ones.
- Returns:
- true if attributes will be binarized
setMakeBinary
public void setMakeBinary(boolean makeBinary)
- Sets whether binary attributes should be made for discretized ones.
- Parameters:
- makeBinary - if binary attributes are to be made
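The following sketches one possible binary encoding for -D. It assumes that k intervals become k-1 indicator attributes, where indicator j is 1 when the value lies above cut point j. Two points are assumptions of this sketch, not statements from this page: the cumulative scheme itself, and the convention that a value equal to a cut point falls in the lower interval. BinaryEncodingSketch is a hypothetical helper.

```java
// Hypothetical helper: encode one value as k-1 binary indicators for k intervals,
// assuming a cumulative scheme (indicator j is 1 when the value is above cut point j).
class BinaryEncodingSketch {
    static int[] encode(double value, double[] cutPoints) {
        int[] bits = new int[cutPoints.length];
        for (int j = 0; j < cutPoints.length; j++) {
            bits[j] = value > cutPoints[j] ? 1 : 0; // a value equal to a cut point stays in the lower interval
        }
        return bits;
    }
}
```

The cumulative form keeps the interval order visible to a learner: "value above cut j" is a single test on one attribute.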
getUseMDL
public boolean getUseMDL()
- Gets whether MDL will be used as the discretisation method
- Returns:
- true if MDL will be used
setUseMDL
public void setUseMDL(boolean useMDL)
- Sets whether MDL will be used as the discretisation method
- Parameters:
- useMDL - true if MDL should be used
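When MDL is used, candidate splits are checked against Fayyad & Irani's stopping rule. The sketch below applies the textbook form of that test to a single binary split, given per-class counts on each side. In the published method, the test is applied recursively to the best (highest-gain) cut point of each subinterval. MdlSplitSketch is hypothetical and ignores the -E and -K variants.

```java
// Sketch of Fayyad & Irani's MDL acceptance test for one binary split (textbook form, not WEKA's code).
class MdlSplitSketch {
    // Class entropy in bits of a vector of per-class counts.
    static double entropy(int[] counts) {
        int n = 0;
        for (int c : counts) n += c;
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) { double p = (double) c / n; h -= p * Math.log(p) / Math.log(2); }
        }
        return h;
    }

    static int classesPresent(int[] counts) {
        int k = 0;
        for (int c : counts) if (c > 0) k++;
        return k;
    }

    // left/right hold per-class counts on each side of the candidate cut point (needs n >= 2).
    static boolean accept(int[] left, int[] right) {
        int numClasses = left.length;
        int[] all = new int[numClasses];
        int n1 = 0, n2 = 0;
        for (int i = 0; i < numClasses; i++) { all[i] = left[i] + right[i]; n1 += left[i]; n2 += right[i]; }
        int n = n1 + n2;
        double ent = entropy(all), ent1 = entropy(left), ent2 = entropy(right);
        double gain = ent - (n1 * ent1 + n2 * ent2) / n;
        int k = classesPresent(all), k1 = classesPresent(left), k2 = classesPresent(right);
        double delta = Math.log(Math.pow(3, k) - 2) / Math.log(2) - (k * ent - k1 * ent1 - k2 * ent2);
        return gain > (Math.log(n - 1) / Math.log(2) + delta) / n;
    }
}
```

Identical class mixtures on both sides give zero gain, so that split is rejected. A clean separation of two classes is accepted.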
getUseKononenko
public boolean getUseKononenko()
- Gets whether Kononenko's MDL criterion is to be used.
- Returns:
- true if Kononenko's criterion will be used.
setUseKononenko
public void setUseKononenko(boolean useKon)
- Sets whether Kononenko's MDL criterion is to be used.
- Parameters:
- useKon - true if Kononenko's criterion is to be used
getUseBetterEncoding
public boolean getUseBetterEncoding()
- Gets whether better encoding is to be used for MDL.
- Returns:
- true if the better MDL encoding will be used
setUseBetterEncoding
public void setUseBetterEncoding(boolean useBetterEncoding)
- Sets whether better encoding is to be used for MDL.
- Parameters:
- useBetterEncoding - true if better encoding to be used.
getBins
public int getBins()
- Gets the number of bins numeric attributes will be divided into
- Returns:
- the number of bins.
setBins
public void setBins(int numBins)
- Sets the number of bins to divide each selected numeric attribute into
- Parameters:
- numBins - the number of bins
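Equal-width binning (the -B option) divides the observed range of an attribute into numBins intervals of the same width. This sketch assumes the cut points are placed at min + i * (max - min) / numBins. The option is documented as a maximum, but this sketch does not model how the filter might end up with fewer bins. EqualWidthSketch is hypothetical.

```java
// Sketch: cut points for equal-width binning over the observed range [min, max].
class EqualWidthSketch {
    static double[] cutPoints(double[] values, int numBins) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double v : values) { min = Math.min(min, v); max = Math.max(max, v); }
        if (numBins < 2 || !(max > min)) return new double[0]; // one bin (or no range) needs no cut points
        double width = (max - min) / numBins;
        double[] cuts = new double[numBins - 1];
        for (int i = 1; i < numBins; i++) cuts[i - 1] = min + i * width;
        return cuts;
    }
}
```

Unlike the MDL method, this ignores the class attribute entirely. The cut points depend only on the minimum, the maximum and numBins.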
getInvertSelection
public boolean getInvertSelection()
- Gets whether the attribute selection is inverted.
- Returns:
- true if the numeric attributes outside the selection will be discretized instead of those inside it
setInvertSelection
public void setInvertSelection(boolean invert)
- Sets whether the attribute selection is inverted. If false, only the
numeric attributes in the selected range are discretized. If true, only
the numeric attributes outside the selected range are discretized.
- Parameters:
- invert - the new invert setting
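The following sketches how the -R selection, the -V flag and the rule that only numeric attributes are discretized could combine. It assumes that inverting the selection means the numeric attributes outside the range are discretized instead of those inside it. SelectionSketch is a hypothetical helper.

```java
// Hypothetical helper: which attributes get discretized, given the range selection,
// the invert flag, and which attributes are numeric (non-numeric ones are never changed).
class SelectionSketch {
    static boolean[] toDiscretize(boolean[] inRange, boolean[] isNumeric, boolean invert) {
        boolean[] result = new boolean[inRange.length];
        for (int i = 0; i < inRange.length; i++) {
            result[i] = isNumeric[i] && (inRange[i] != invert); // != acts as XOR, flipping the selection when invert is set
        }
        return result;
    }
}
```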
getAttributeIndices
public String getAttributeIndices()
- Gets the current range selection
- Returns:
- a string containing a comma separated list of ranges
setAttributeIndices
public void setAttributeIndices(String rangeList) throws Exception
- Sets which attributes are to be discretized (only numeric
attributes among the selection will be discretized).
- Parameters:
- rangeList - a string representing the list of attributes. Since
the string will typically come from a user, attributes are indexed from
1 (e.g. first-3,5,6-last).
- Throws: Exception
- if an invalid range list is supplied
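This sketch parses a 1-based range list such as first-3,5,6-last into a 0-based selection mask. RangeSketch is hypothetical and is not WEKA's own range handling. It treats "first" and "last" as the only named indexes and rejects anything out of bounds.

```java
// Hypothetical parser for 1-based range strings such as "first-3,5,6-last".
// Returns a 0-based selection mask; this is a sketch, not WEKA's implementation.
class RangeSketch {
    static boolean[] parse(String rangeList, int numAttributes) {
        boolean[] selected = new boolean[numAttributes];
        if (rangeList.trim().isEmpty()) return selected;
        for (String part : rangeList.split(",")) {
            String p = part.trim();
            int dash = p.indexOf('-');
            int from, to;
            if (dash >= 0) {
                from = index(p.substring(0, dash), numAttributes);
                to = index(p.substring(dash + 1), numAttributes);
            } else {
                from = to = index(p, numAttributes);
            }
            if (from > to) throw new IllegalArgumentException("Invalid range: " + p);
            for (int i = from; i <= to; i++) selected[i] = true;
        }
        return selected;
    }

    // Converts "first", "last", or a 1-based number to a 0-based index.
    private static int index(String token, int numAttributes) {
        String t = token.trim();
        int i;
        if (t.equalsIgnoreCase("first")) i = 0;
        else if (t.equalsIgnoreCase("last")) i = numAttributes - 1;
        else i = Integer.parseInt(t) - 1; // NumberFormatException is an IllegalArgumentException
        if (i < 0 || i >= numAttributes) throw new IllegalArgumentException("Index out of range: " + t);
        return i;
    }
}
```

With 8 attributes, first-3,5,6-last selects every attribute except the fourth (0-based index 3).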
setAttributeIndicesArray
public void setAttributeIndicesArray(int attributes[]) throws Exception
- Sets which attributes are to be discretized (only numeric
attributes among the selection will be discretized).
- Parameters:
- attributes - an array containing indexes of attributes to discretize.
Since the array will typically come from a program, attributes are indexed
from 0.
- Throws: Exception
- if an invalid set of ranges is supplied
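The two setters differ mainly in index base. IndexBaseSketch is a hypothetical helper that converts a program's 0-based indexes to the 1-based list form accepted by setAttributeIndices.

```java
// Hypothetical helper: 0-based attribute indexes -> 1-based comma-separated list.
class IndexBaseSketch {
    static String toRangeList(int[] zeroBased) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < zeroBased.length; i++) {
            if (zeroBased[i] < 0) throw new IllegalArgumentException("Negative index: " + zeroBased[i]);
            if (i > 0) sb.append(',');
            sb.append(zeroBased[i] + 1); // shift from 0-based to 1-based
        }
        return sb.toString();
    }
}
```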
getCutPoints
public double[] getCutPoints(int attributeIndex)
- Gets the cut points for an attribute
- Parameters:
- attributeIndex - the index (from 0) of the attribute to get the cut points of
- Returns:
- an array containing the cut points (or null if the
requested attribute isn't being discretized)
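Given the cut points returned for an attribute, a value's interval can be found with a binary search. This sketch assumes the cut points are sorted in ascending order and that a value equal to a cut point belongs to the lower interval. IntervalSketch is hypothetical. It rejects null because a null result means the attribute is not being discretized.

```java
// Sketch: map a value to its interval index (0 .. cutPoints.length) given sorted cut points.
class IntervalSketch {
    static int intervalOf(double value, double[] cutPoints) {
        if (cutPoints == null) throw new IllegalArgumentException("attribute is not being discretized");
        int lo = 0, hi = cutPoints.length; // find the first cut point >= value
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cutPoints[mid] < value) lo = mid + 1; else hi = mid;
        }
        return lo; // n cut points define n + 1 intervals
    }
}
```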
getOptimzeBinning
public boolean getOptimzeBinning()
- Gets whether binning is to be optimized.
- Returns:
- true if binning is to be optimized
setOptimizeBinning
public void setOptimizeBinning(boolean bool)
- Sets whether binning is to be optimized.
- Parameters:
- bool - true if binning is to be optimized
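The -O option chooses the number of bins using a leave-one-out estimate of the entropy. This sketches that idea under two assumptions: the bins are equal-width, and the estimate is H_k = -(1/N) * (sum over bins j of n_j * log((n_j - 1) / ((N - 1) * w_k))), where n_j is the count in bin j and w_k the bin width for k bins. WEKA's exact formula may differ, and LooBinsSketch is hypothetical.

```java
// Sketch: pick the number of equal-width bins (1..maxBins) minimizing a
// leave-one-out entropy estimate of the histogram density. Not WEKA's code.
class LooBinsSketch {
    static int bestNumBins(double[] values, int maxBins) {
        int n = values.length;
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double v : values) { min = Math.min(min, v); max = Math.max(max, v); }
        if (n < 2 || !(max > min)) return 1;
        int bestK = 1;
        double bestH = Double.POSITIVE_INFINITY;
        for (int k = 1; k <= maxBins; k++) {
            double width = (max - min) / k;
            int[] counts = new int[k];
            for (double v : values) counts[Math.min(k - 1, (int) ((v - min) / width))]++;
            // Leave-one-out density of a point in bin j is (n_j - 1) / ((n - 1) * width).
            // A bin holding a single point has density 0, so H_k becomes +infinity
            // and that k is never chosen.
            double h = 0.0;
            for (int c : counts) {
                if (c > 0) h -= c * Math.log((c - 1) / ((n - 1) * width));
            }
            h /= n;
            if (h < bestH) { bestH = h; bestK = k; }
        }
        return bestK;
    }
}
```

Evenly spread data such as 1, 2, 3, 4 keeps a single bin. Two bins give a higher estimate, and three or four bins leave a bin holding one value, whose leave-one-out density is zero.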
main
public static void main(String argv[])
- Main method for testing this class.
- Parameters:
- argv - should contain arguments to the filter: use -h for help