
prapi::FeatureSelector< T, I, C > Class Template Reference

#include <FeatureSelector.h>

Inheritance diagram for prapi::FeatureSelector< T, I, C >:

Base classes shown: EventSource< OptimizationEvent >, EventListener< EvolutionEvent >, FitnessCalculator< char >, Mutator< char >, Object.

Detailed Description

template<class T = double, class I = std::string, class C = int>
class prapi::FeatureSelector< T, I, C >

FeatureSelector is a class for finding an optimal subset of features.

Typically, optimization samples with a large number of features are created first. In a subsequent step, 'useless' features are removed. The remaining features can then be used in classification. FeatureSelector uses a genetic algorithm to find an optimal subset of features. It can also perform special operations needed for statistical features.

It is also possible to use the non-randomized version of a genetic algorithm for the optimization process. In the non-randomized case, beam search is utilized.

One can observe the optimization process by adding an event listener to a feature selector. After each search level (generation), all registered listeners are notified with an OptimizationEvent. At this point, the optimization can be stopped with the stop() method. It is thus possible to impose optimization constraints other than iteration count and result threshold.
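
The listener mechanism can be illustrated without the library itself. The following is a minimal, self-contained sketch: DemoOptimizer, DemoEvent, addListener and runUntilGainBelow are hypothetical stand-ins, not prapi classes, and the "optimization" is a toy sequence of improving results. It shows a custom stopping constraint (stop when the gain between two generations becomes too small) imposed from a listener.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for OptimizationEvent: the best result of a generation.
struct DemoEvent {
    int generation;
    double bestResult;
};

// Hypothetical stand-in for a selector that notifies listeners after each generation.
class DemoOptimizer {
public:
    using Listener = std::function<void(DemoOptimizer&, const DemoEvent&)>;

    void addListener(Listener listener) { listeners_.push_back(std::move(listener)); }
    void stop() { stopped_ = true; }

    // Runs at most maxGenerations rounds and returns the number of rounds completed.
    int run(int maxGenerations) {
        assert(maxGenerations >= 0);
        stopped_ = false;
        int done = 0;
        for (int g = 0; g < maxGenerations && !stopped_; ++g) {
            const double best = 1.0 - 1.0 / (g + 2);  // toy result: 0.5, 0.667, 0.75, 0.8, ...
            ++done;
            for (auto& listener : listeners_) listener(*this, DemoEvent{g, best});
        }
        return done;
    }

private:
    std::vector<Listener> listeners_;
    bool stopped_ = false;
};

// Custom constraint: stop once the gain over the previous generation drops below minGain.
inline int runUntilGainBelow(double minGain, int maxGenerations) {
    DemoOptimizer optimizer;
    double previous = -1.0;  // negative means "no previous generation yet"
    optimizer.addListener([&previous, minGain](DemoOptimizer& o, const DemoEvent& e) {
        if (previous >= 0.0 && e.bestResult - previous < minGain) o.stop();
        previous = e.bestResult;
    });
    return optimizer.run(maxGenerations);
}
```

As in the description above, the stop request takes effect after the generation in which it was issued has been completed.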

See also:
OptimizationEvent


Public Methods

 FeatureSelector (util::List< Sample< T, I, C > > &samples, int classCount, double resultThreshold, int generationCount=-1, int populationSize=10, bool accumulate=false)
 Create a new FeatureSelector that optimizes a feature set using the given samples.

virtual ~FeatureSelector ()
void eventOccured (util::EventSource< EvolutionEvent > *source, const EvolutionEvent &event)
 This method is called by a genetic algorithm each time a new generation is created.

virtual void calculateFitness (util::List< prapi::ga::Individual< char > > &population) throw (ClassificationException&,util::InvalidArgumentException&)
 Calculate the fitness of a population.

char mutate (char value)
 Mutate a single gene in an individual.

virtual double getClassificationResult (util::List< Sample< T, I, C > > &samples) throw (ClassificationException&,InvalidArgumentException&)
 Get the classification result for a set of samples.

util::List< util::List< int > > optimize () throw (ClassificationException&,util::InvalidArgumentException&)
 Optimize a feature set.

util::List< util::List< int > > optimize (util::List< prapi::ga::Individual< char > > &population) throw (ClassificationException&,util::InvalidArgumentException&)
 Optimize a feature set.

util::List< util::List< int > > beamOptimize () throw (ClassificationException&,util::InvalidArgumentException&)
 Optimize a feature set with beam search.

util::List< util::List< int > > beamOptimize (util::List< prapi::ga::Individual< char > > &population) throw (ClassificationException&,util::InvalidArgumentException&)
 Optimize a feature set with beam search.

util::List< Sample< T, I, C > > createSampleSet (util::List< Sample< T, I, C > > &samples, util::List< int > &selectedFeatures)
 Create a new sample set that contains the samples in samples with only the indicated features collected into the feature vectors.

void setResultThreshold (double threshold)
 Set the result threshold.

double getResultThreshold ()
 Get the result threshold.

void setGenerationCount (int cnt)
 Set the maximum number of generations to produce.

int getGenerationCount ()
 Get the generation count.

void setClassCount (int cnt)
 Set the number of classes in the data.

int getClassCount ()
 Get the number of classes in the data.

void setPopulationSize (int size)
 Set the number of individuals in each generation.

int getPopulationSize ()
 Get the size of the population, i.e. the beam width.

void setAccumulate (bool accumulate)
 Set the accumulation flag.

bool getAccumulate ()
 Get the accumulation flag.

void stop ()
 Stop the optimization.

void setLengthPenalty (double value)
 Set the length penalty.

double getLengthPenalty ()
 Get the length penalty.

double & lengthPenalty ()
 Set the length penalty.


Static Public Methods

bool isSaturated (const List< double > &values, double threshold)
 Check if the given sequence of numbers is 'saturated'.


Constructor & Destructor Documentation

template<class T = double, class I = std::string, class C = int>
prapi::FeatureSelector< T, I, C >::FeatureSelector (util::List< Sample< T, I, C > > & samples, int classCount, double resultThreshold, int generationCount = -1, int populationSize = 10, bool accumulate = false) [inline]
 

Create a new FeatureSelector that optimizes a feature set using the given samples.

In non-randomized search (beam search), generation count is read as 'search depth' and population size as 'beam width'.

Parameters:
samples  a list of optimization samples
classCount  the number of different classes in the sample set
resultThreshold  the value that the classification result must exceed before the evolution is stopped
generationCount  maximum number of generations to produce (-1 disables)
populationSize  the number of individuals in each generation
accumulate  if statistical features are used, it is sometimes desirable to collect the values of disabled features into an additional distribution bin. If this flag is true, all disabled features are summed up and used as an additional feature.


Member Function Documentation

template<class T = double, class I = std::string, class C = int>
util::List<util::List<int> > prapi::FeatureSelector< T, I, C >::beamOptimize (util::List< prapi::ga::Individual< char > > & population) throw (ClassificationException&,util::InvalidArgumentException&) [inline]
 

Optimize a feature set with beam search.

See also:
optimize()

template<class T = double, class I = std::string, class C = int>
util::List<util::List<int> > prapi::FeatureSelector< T, I, C >::beamOptimize () throw (ClassificationException&,util::InvalidArgumentException&) [inline]
 

Optimize a feature set with beam search.

See also:
optimize()

template<class T, class I, class C>
void prapi::FeatureSelector< T, I, C >::calculateFitness (util::List< prapi::ga::Individual< char > > & population) throw (ClassificationException&,util::InvalidArgumentException&) [virtual]
 

Calculate the fitness of a population.

This method goes through all individuals in a population, generates a sample set for each individual (an individual represents the enabled features), and classifies the formed sample set. The fitness of an individual is the rate of correct classifications.

template<class T, class I, class C>
util::List< Sample< T, I, C > > prapi::FeatureSelector< T, I, C >::createSampleSet (util::List< Sample< T, I, C > > & samples, util::List< int > & selectedFeatures)
 

Create a new sample set that contains the samples in samples with only the indicated features collected into the feature vectors.

If the accumulation flag is set, then an additional feature containing the sum of all disabled features is added to the feature vectors.

Parameters:
samples  the samples
selectedFeatures  the indices of enabled features
Returns:
a list of samples with the indicated features enabled
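
The effect of the accumulation flag on a single feature vector can be sketched as follows. This is a stand-alone illustration using std::vector instead of util::List and Sample; the function name selectFeatures is hypothetical, and the placement of the accumulated value as the last element is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Keeps only the features listed in selected, in the given order. If accumulate
// is true, appends one extra feature holding the sum of all features that were
// not selected (assumed here to be placed last).
inline std::vector<double> selectFeatures(const std::vector<double>& features,
                                          const std::vector<int>& selected,
                                          bool accumulate) {
    std::vector<bool> enabled(features.size(), false);
    std::vector<double> out;
    out.reserve(selected.size() + (accumulate ? 1 : 0));
    for (int index : selected) {
        assert(index >= 0 && static_cast<std::size_t>(index) < features.size());
        enabled[static_cast<std::size_t>(index)] = true;
        out.push_back(features[static_cast<std::size_t>(index)]);
    }
    if (accumulate) {
        double rest = 0.0;
        for (std::size_t i = 0; i < features.size(); ++i)
            if (!enabled[i]) rest += features[i];
        out.push_back(rest);
    }
    return out;
}
```

For a histogram-like feature vector, accumulation keeps the total mass of the distribution unchanged, which is presumably why it is offered for statistical features.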

template<class T, class I, class C>
void prapi::FeatureSelector< T, I, C >::eventOccured (util::EventSource< EvolutionEvent > * source, const EvolutionEvent & event) [virtual]
 

This method is called by a genetic algorithm each time a new generation is created.

FeatureSelector keeps track of the current state of the optimization process by checking the results after each generation.

Implements EventListener< EvolutionEvent >.

template<class T, class I, class C>
double prapi::FeatureSelector< T, I, C >::getClassificationResult (util::List< Sample< T, I, C > > & samples) throw (ClassificationException&,InvalidArgumentException&) [virtual]
 

Get the classification result for a set of samples.

The returned value is used as the fitness of an individual in a population. The default implementation uses a kNN classifier with k=3 and a Euclidean proximity measure. The classification result is obtained using a leave-one-out test. One may want to override this method to perform a different type of classification.

Parameters:
samples  the samples to be classified
Returns:
the fraction of correct classifications, [0,1]
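
A leave-one-out test with a kNN classifier can be sketched in plain C++ as follows. This is not the library's implementation: LabeledSample and leaveOneOutKnn are hypothetical names, and the tie-breaking rule (the smallest label wins) is an assumption made only to keep the example deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for Sample< T, I, C >.
struct LabeledSample {
    std::vector<double> features;
    int label;
};

// Squared Euclidean distance; the square root is not needed for ranking neighbours.
inline double squaredDistance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += (a[i] - b[i]) * (a[i] - b[i]);
    return sum;
}

// Leave-one-out accuracy: each sample is classified by a majority vote among its
// k nearest neighbours in the rest of the set. Returns a fraction in [0,1].
inline double leaveOneOutKnn(const std::vector<LabeledSample>& samples, std::size_t k = 3) {
    if (samples.size() < 2 || k == 0) return 0.0;
    std::size_t correct = 0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        std::vector<std::pair<double, int>> neighbours;  // (distance, label)
        for (std::size_t j = 0; j < samples.size(); ++j) {
            if (j == i) continue;  // leave the tested sample out
            neighbours.emplace_back(
                squaredDistance(samples[i].features, samples[j].features), samples[j].label);
        }
        const std::size_t kk = std::min(k, neighbours.size());
        std::partial_sort(neighbours.begin(),
                          neighbours.begin() + static_cast<std::ptrdiff_t>(kk),
                          neighbours.end());
        std::map<int, int> votes;
        for (std::size_t n = 0; n < kk; ++n) ++votes[neighbours[n].second];
        int bestLabel = votes.begin()->first;
        int bestCount = 0;
        for (const auto& vote : votes) {  // ties: the smallest label wins (assumption)
            if (vote.second > bestCount) {
                bestCount = vote.second;
                bestLabel = vote.first;
            }
        }
        if (bestLabel == samples[i].label) ++correct;
    }
    return static_cast<double>(correct) / static_cast<double>(samples.size());
}
```

Note that with k=3 a class needs at least three other members near a sample before it can win the vote reliably, so very small classes tend to lower the result.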

template<class T = double, class I = std::string, class C = int>
int prapi::FeatureSelector< T, I, C >::getPopulationSize () [inline]
 

Get the size of the population, i.e. the beam width.

template<class T, class I, class C>
bool prapi::FeatureSelector< T, I, C >::isSaturated (const List< double > & values, double threshold) [static]
 

Check if the given sequence of numbers is 'saturated'.

A sequence of numbers is saturated if it does not grow sufficiently. The purpose of this method is to provide an easy way to check when to stop an optimization process. If the obtained classification results do not get better fast enough, it may be wise to stop the process.

Parameters:
values  a sequence of obtained classification results. The values should be ordered so that values[0] refers to the oldest entry.
threshold  a threshold for deciding what counts as a sufficient increase in classification accuracy. If the accuracy does not, on average, increase by more than the indicated threshold on each round, the process is saturated.
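
One plausible reading of this criterion is sketched below. The exact formula used by the library is not stated here, so the computation (mean of the round-to-round increases, which equals the total increase divided by the number of rounds) and the handling of sequences shorter than two values are assumptions; isSaturatedSketch is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Returns true if the average increase per round does not exceed threshold.
// values[0] is the oldest result, values.back() the newest.
inline bool isSaturatedSketch(const std::vector<double>& values, double threshold) {
    if (values.size() < 2) return false;  // assumption: too little history to decide
    // The successive differences telescope, so their mean is (last - first) / rounds.
    const double rounds = static_cast<double>(values.size() - 1);
    const double meanIncrease = (values.back() - values.front()) / rounds;
    return meanIncrease <= threshold;
}
```

Such a check fits naturally into an event listener: collect the best result of each generation and call stop() once the recent history is saturated.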

template<class T = double, class I = std::string, class C = int>
double& prapi::FeatureSelector< T, I, C >::lengthPenalty () [inline]
 

Set the length penalty.

If the length penalty is set to a value other than zero, each feature in a feature vector is 'punished' with the given value.

template<class T = double, class I = std::string, class C = int>
util::List<util::List<int> > prapi::FeatureSelector< T, I, C >::optimize (util::List< prapi::ga::Individual< char > > & population) throw (ClassificationException&,util::InvalidArgumentException&) [inline]
 

Optimize a feature set.

If you want to continue an interrupted optimization, an initial population may be given. In the population, each gene in an individual is either 0 (off) or -1 (on). The number of genes equals the number of possible features. For example, an individual having the genome (-1 0 0 -1) represents a set of two enabled features out of a maximum of four. optimize() changes the contents of population during the optimization process.
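
The genome encoding can be made concrete with a small conversion routine. The sketch below uses std::vector< char > in place of prapi::ga::Individual< char >, and enabledFeatures is a hypothetical helper name. It treats every non-zero gene as 'on', which covers the documented value -1 regardless of whether char is signed on the platform.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Converts a genome into the indices of its enabled features.
// Assumption: any non-zero gene counts as "on" (the documented "on" value is -1).
inline std::vector<int> enabledFeatures(const std::vector<char>& genome) {
    std::vector<int> indices;
    for (std::size_t i = 0; i < genome.size(); ++i)
        if (genome[i] != 0) indices.push_back(static_cast<int>(i));
    return indices;
}
```

For the genome (-1 0 0 -1) this yields the indices 0 and 3, i.e. the first and the fourth feature.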

template<class T = double, class I = std::string, class C = int>
util::List<util::List<int> > prapi::FeatureSelector< T, I, C >::optimize () throw (ClassificationException&,util::InvalidArgumentException&) [inline]
 

Optimize a feature set.

A genetic algorithm is run either for a predefined number of iterations or until the threshold for the classification result is exceeded.

Returns:
the remaining feature sets in a list. The size of the list is equal to the size of the initial population. Each element in this list stores a variable number of indices to enabled features. Thus, result[0][5] references the sixth enabled feature in the first feature set. The returned list is sorted in descending order according to classification accuracy.
Exceptions:
ClassificationException  if classification fails.

template<class T = double, class I = std::string, class C = int>
void prapi::FeatureSelector< T, I, C >::setGenerationCount (int cnt) [inline]
 

Set the maximum number of generations to produce.

For beam search, this value determines the number of selected features in the result set. Set to -1 to disable the generation (search level) limit.
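
The roles of search depth and beam width can be illustrated with a generic forward beam search over feature subsets. This is a sketch of the general technique, not the library's algorithm: beamSearch, FeatureSet and Score are hypothetical names, the score function is supplied by the caller, and a fixed non-negative depth is assumed (the -1 "no limit" setting is not modelled). Generic lambdas require C++14.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <utility>
#include <vector>

using FeatureSet = std::vector<int>;                     // sorted feature indices
using Score = std::function<double(const FeatureSet&)>;  // higher is better

// Forward beam search: start from the empty set and, on each level, extend every
// set in the beam by one unused feature, keeping the beamWidth best candidates.
// After depth levels, every returned set contains depth features.
inline std::vector<FeatureSet> beamSearch(int featureCount, int beamWidth, int depth,
                                          const Score& score) {
    assert(featureCount >= 0 && beamWidth > 0 && depth >= 0);
    std::vector<FeatureSet> beam{FeatureSet{}};
    for (int level = 0; level < depth && level < featureCount; ++level) {
        std::set<FeatureSet> candidates;  // the set removes duplicate subsets
        for (const FeatureSet& base : beam) {
            for (int f = 0; f < featureCount; ++f) {
                if (std::binary_search(base.begin(), base.end(), f)) continue;
                FeatureSet next = base;
                next.insert(std::upper_bound(next.begin(), next.end(), f), f);
                candidates.insert(std::move(next));
            }
        }
        std::vector<std::pair<double, FeatureSet>> scored;
        for (const FeatureSet& c : candidates) scored.emplace_back(score(c), c);
        std::stable_sort(scored.begin(), scored.end(),
                         [](const auto& a, const auto& b) { return a.first > b.first; });
        if (scored.size() > static_cast<std::size_t>(beamWidth))
            scored.resize(static_cast<std::size_t>(beamWidth));
        beam.clear();
        for (auto& s : scored) beam.push_back(std::move(s.second));
    }
    return beam;  // best-scoring set first
}
```

In this sketch each search level adds exactly one feature, which matches the statement that the generation count determines the number of selected features.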

template<class T = double, class I = std::string, class C = int>
void prapi::FeatureSelector< T, I, C >::setLengthPenalty (double value) [inline]
 

Set the length penalty.

If the length penalty is set to a value other than zero, each feature in a feature vector is 'punished' with the given value.
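
How the penalty enters the fitness is not spelled out here. A minimal sketch under the assumption that the penalty is subtracted once per enabled feature (penalizedFitness is a hypothetical name):

```cpp
#include <cassert>
#include <cstddef>

// Assumption: fitness = accuracy - lengthPenalty * (number of enabled features).
inline double penalizedFitness(double accuracy, std::size_t enabledFeatureCount,
                               double lengthPenalty) {
    return accuracy - lengthPenalty * static_cast<double>(enabledFeatureCount);
}
```

Under this reading, with a penalty of 0.01 a set of 3 features at accuracy 0.88 outranks a set of 10 features at accuracy 0.90, so shorter feature vectors are favoured when accuracies are close.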

template<class T = double, class I = std::string, class C = int>
void prapi::FeatureSelector< T, I, C >::setPopulationSize (int size) [inline]
 

Set the number of individuals in each generation.

For beam search, this is the width of the beam.

template<class T = double, class I = std::string, class C = int>
void prapi::FeatureSelector< T, I, C >::stop () [inline]
 

Stop the optimization.

The optimization will finish as soon as possible, in most cases after the next generation (search level) has been generated.


The documentation for this class was generated from the following file:
Documentation generated on 20.12.2002 with Doxygen.
The documentation is copyrighted material.
Copyright © Topi Mäenpää 2002. All rights reserved.