API

This is the full API documentation of the MICO package.

Conic optimization approach for feature selection

class mico.MutualInformationConicOptimization(method='JMI', k=5, n_features='auto', categorical=True, n_jobs=0, verbose=0, scale_data=True, num_bins=0, random_state=0, max_roundings=0)[source]

Conic optimization approach for feature selection.

This class implements a conic optimization based feature selection method with a mutual information (MI) measure [Rc0ede6ed6b81-1]. The idea is to formulate the selection problem as a pure binary quadratic optimization problem, which can be solved heuristically by an efficient randomized algorithm via semidefinite programming [Rc0ede6ed6b81-2]. The optimization software Colin [Rc0ede6ed6b81-6] is used to solve the underlying conic optimization problems.

Parameters:
method : string, optional (default=’JMI’)

A string that specifies the mutual information measure. Possible options are:

  • ‘JMI’: Joint Mutual Information [Rc0ede6ed6b81-3]
  • ‘JMIM’: Joint Mutual Information Maximisation [Rc0ede6ed6b81-4]
  • ‘MRMR’: Max-Relevance Min-Redundancy [Rc0ede6ed6b81-5]

k : int, optional (default=5)

An integer that specifies the number of samples used for kernel density estimation with k-nearest neighbors (KNN). A small integer between 3 and 10 is recommended. Note that this parameter is applicable only if num_bins is set to 0.

n_features : int or string, optional (default=’auto’)

A positive integer that specifies the number of features to be selected. If ‘auto’ is used, this number is determined automatically.

categorical : boolean, optional (default=True)

If categorical is True, then the dependent variable is assumed to be a categorical class label. Otherwise, the dependent variable is assumed to be a continuous variable.

n_jobs : int, optional (default=0)

The number of threads allowed for the computations. If n_jobs \(\le 0\), the number of threads is determined internally.

verbose : int, optional (default=0)

An integer that specifies the verbosity level. Available options are:

  • 0: Disable output.
  • 1: Display summary results.
  • 2: Display all results.
scale_data : boolean, optional (default=True)

A boolean flag that specifies whether the input data shall be scaled first.

num_bins : int, optional (default=0)

An integer that specifies the number of bins allowed for data binning (bucketing). If \(0\), data binning is disabled and KNN is used for kernel density estimation.

random_state : int, optional (default=0)

An integer that specifies the random seed.

max_roundings : int, optional (default=0)

A positive integer that specifies the number of iterations allowed for the rounding process in [Rc0ede6ed6b81-2]. If max_roundings is 0, this number is determined internally.

References

[Rc0ede6ed6b81-1] T Naghibi, S Hoffmann, and B Pfister, “A semidefinite programming based search strategy for feature selection with mutual information measure”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(8), pp. 1529–1541, 2015. [Pre-print]
[Rc0ede6ed6b81-2] M Goemans and D Williamson, “Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming”, J. ACM, 42(6), pp. 1115–1145, 1995. [Pre-print]
[Rc0ede6ed6b81-3] H Yang and J Moody, “Data Visualization and Feature Selection: New Algorithms for Nongaussian Data”, NIPS 1999. [Pre-print]
[Rc0ede6ed6b81-4] M Bennasar, Y Hicks, and R Setchi, “Feature selection using Joint Mutual Information Maximisation”, Expert Systems with Applications, 42(22), pp. 8520–8532, 2015. [Pre-print]
[Rc0ede6ed6b81-5] H Peng, F Long, and C Ding, “Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), pp. 1226–1238, 2005. [Pre-print]
[Rc0ede6ed6b81-6] Colin: Conic-form Linear Optimizer (www.colinopt.org).

Examples

This implementation follows the scikit-learn API convention (see the usage sketch after this list):

  • fit(X, y)
  • transform(X)
  • fit_transform(X, y)
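
The following is a minimal usage sketch. The import path and the scikit-learn breast-cancer loader are illustrative assumptions; any numeric feature matrix with a class-label target works the same way:

    from sklearn.datasets import load_breast_cancer
    from mico import MutualInformationConicOptimization  # import path assumed

    # Illustrative data: 569 samples, 30 numeric features, binary class label.
    X, y = load_breast_cancer(return_X_y=True)

    # Request 5 features explicitly; n_features='auto' (the default) would let
    # the selector determine the number internally.
    selector = MutualInformationConicOptimization(method='JMI', n_features=5)
    selector.fit(X, y)

    X_reduced = selector.transform(X)     # shape: (n_samples, 5)
    print(selector.n_features_)           # number of selected features
    print(selector.feature_importances_)  # importance scores of the selected features
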
Attributes:
n_features_ : int

The number of selected features.

feature_importances_ : array of shape n_features

The feature importance scores of the selected features.

fit(self, X, y)[source]

Fits the feature selector using the conic optimization approach.

Parameters:
X : array-like, shape = [n_samples, n_features]

The training input samples.

y : array-like, shape = [n_samples]

The target values.

fit_transform(self, X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns:
X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

get_support(self, indices=False)

Get a mask, or integer index, of the features selected.

Parameters:
indices : boolean (default False)

If True, the return value will be an array of integers, rather than a boolean mask.

Returns:
support : array

An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
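
As an illustrative sketch (assuming selector is an already-fitted instance trained on data with four input features, of which the first and third were retained), the two return modes relate as follows:

    mask = selector.get_support()             # array([ True, False,  True, False])
    idx = selector.get_support(indices=True)  # array([0, 2])
    X_reduced = X[:, mask]                    # column-equivalent to selector.transform(X)
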

inverse_transform(self, X)

Reverse the transformation operation.

Parameters:
X : array of shape [n_samples, n_selected_features]

The input samples.

Returns:
X_r : array of shape [n_samples, n_original_features]

X with columns of zeros inserted where features would have been removed by transform.
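
A brief sketch of the round trip, again assuming selector is already fitted:

    X_sel = selector.transform(X)               # (n_samples, n_selected_features)
    X_back = selector.inverse_transform(X_sel)  # (n_samples, n_original_features)
    # X_back matches X on the selected columns and is zero everywhere else.
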

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
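
For example, the nested form can be used to tune the selector inside a scikit-learn Pipeline; the step name 'select' below is an arbitrary illustration:

    from sklearn.pipeline import Pipeline
    from mico import MutualInformationConicOptimization  # import path assumed

    pipe = Pipeline([('select', MutualInformationConicOptimization())])
    # <component>__<parameter>: route n_features to the 'select' step.
    pipe.set_params(select__n_features=10)
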
transform(self, X)

Reduce X to the selected features.

Parameters:
X : array of shape [n_samples, n_features]

The input samples.

Returns:
X_r : array of shape [n_samples, n_selected_features]

The input samples with only the selected features.

Backward elimination approach for feature selection

class mico.MutualInformationBackwardElimination(method='JMI', k=5, n_features='auto', categorical=True, n_jobs=0, verbose=0, scale_data=True, num_bins=0)[source]

Backward elimination approach for feature selection.

This class implements a backward elimination approach for feature selection with a mutual information (MI) measure.

Parameters:
method : string, optional (default=’JMI’)

A string that specifies the mutual information measure. Possible options are:

  • ‘JMI’: Joint Mutual Information [Rd197fa8a83af-1]
  • ‘JMIM’: Joint Mutual Information Maximisation [Rd197fa8a83af-2]
  • ‘MRMR’: Max-Relevance Min-Redundancy [Rd197fa8a83af-3]

k : int, optional (default=5)

An integer that specifies the number of samples used for kernel density estimation with k-nearest neighbors (KNN). A small integer between 3 and 10 is recommended. Note that this parameter is applicable only if num_bins is set to 0.

n_features : int or string, optional (default=’auto’)

A positive integer that specifies the number of features to be selected. If ‘auto’ is used, this number is determined automatically.

categorical : boolean, optional (default=True)

If categorical is True, then the dependent variable is assumed to be a categorical class label. Otherwise, the dependent variable is assumed to be a continuous variable.

n_jobs : int, optional (default=0)

The number of threads allowed for the computations. If n_jobs \(\le 0\), the number of threads is determined internally.

verbose : int, optional (default=0)

An integer that specifies the verbosity level. Available options are:

  • 0: Disable output.
  • 1: Display summary results.
  • 2: Display all results.
scale_data : boolean, optional (default=True)

A boolean flag that specifies whether the input data shall be scaled first.

num_bins : int, optional (default=0)

An integer that specifies the number of bins allowed for data binning (bucketing). If \(0\), data binning is disabled and KNN is used for kernel density estimation.

References

[Rd197fa8a83af-1] H Yang and J Moody, “Data Visualization and Feature Selection: New Algorithms for Nongaussian Data”, NIPS 1999. [Pre-print]
[Rd197fa8a83af-2] M Bennasar, Y Hicks, and R Setchi, “Feature selection using Joint Mutual Information Maximisation”, Expert Systems with Applications, 42(22), pp. 8520–8532, 2015. [Pre-print]
[Rd197fa8a83af-3] H Peng, F Long, and C Ding, “Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), pp. 1226–1238, 2005. [Pre-print]

Examples

This implementation follows the scikit-learn API convention (see the usage sketch after this list):

  • fit(X, y)
  • transform(X)
  • fit_transform(X, y)
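
The following is a minimal usage sketch; the import path and dataset loader are illustrative assumptions:

    from sklearn.datasets import load_breast_cancer
    from mico import MutualInformationBackwardElimination  # import path assumed

    X, y = load_breast_cancer(return_X_y=True)

    # Start from the full feature set and eliminate features down to 5.
    selector = MutualInformationBackwardElimination(method='JMI', n_features=5)
    selector.fit(X, y)
    X_reduced = selector.transform(X)  # shape: (n_samples, 5)
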
Attributes:
n_features_ : int

The number of selected features.

feature_importances_ : array of shape n_features

The feature importance scores of the selected features.

ranking_ : array of shape n_features

The feature ranking of the selected features: the first entry is the feature selected first (the one with the largest marginal MI with y), followed by the remaining features in order of decreasing MI.

mi_ : array of shape n_features

The MI measure of the selected features. Usually this is a monotonically decreasing array of numbers converging to 0. One can use it to estimate the number of features to select; in fact, this is what n_features=’auto’ tries to do heuristically, as sketched below.
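
For example, one could pick the cutoff from the decay of mi_; the tolerance below is purely illustrative, not a library default:

    import numpy as np

    # Keep features whose MI measure is still well above zero.
    n_keep = int(np.sum(np.asarray(selector.mi_) > 1e-3))
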

fit(self, X, y)[source]

Fits the feature selector using the backward elimination approach.

Parameters:
X : array-like, shape = [n_samples, n_features]

The training input samples.

y : array-like, shape = [n_samples]

The target values.

fit_transform(self, X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns:
X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

get_support(self, indices=False)

Get a mask, or integer index, of the features selected.

Parameters:
indices : boolean (default False)

If True, the return value will be an array of integers, rather than a boolean mask.

Returns:
support : array

An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.

inverse_transform(self, X)

Reverse the transformation operation.

Parameters:
X : array of shape [n_samples, n_selected_features]

The input samples.

Returns:
X_r : array of shape [n_samples, n_original_features]

X with columns of zeros inserted where features would have been removed by transform.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
transform(self, X)

Reduce X to the selected features.

Parameters:
X : array of shape [n_samples, n_features]

The input samples.

Returns:
X_r : array of shape [n_samples, n_selected_features]

The input samples with only the selected features.

Forward selection approach for feature selection

class mico.MutualInformationForwardSelection(method='JMI', k=5, n_features='auto', categorical=True, n_jobs=0, verbose=0, scale_data=True, num_bins=0, early_stop_steps=10)[source]

Forward selection approach for feature selection.

This class implements a forward selection approach for feature selection with a mutual information (MI) measure.

Parameters:
method : string, optional (default=’JMI’)

A string that specifies the mutual information measure. Possible options are:

  • ‘JMI’: Joint Mutual Information [Rd68def899164-1]
  • ‘JMIM’: Joint Mutual Information Maximisation [Rd68def899164-2]
  • ‘MRMR’: Max-Relevance Min-Redundancy [Rd68def899164-3]

k : int, optional (default=5)

An integer that specifies the number of samples used for kernel density estimation with k-nearest neighbors (KNN). A small integer between 3 and 10 is recommended. Note that this parameter is applicable only if num_bins is set to 0.

n_features : int or string, optional (default=’auto’)

A positive integer that specifies the number of features to be selected. If ‘auto’ is used, this number is determined automatically.

categorical : boolean, optional (default=True)

If categorical is True, then the dependent variable is assumed to be a categorical class label. Otherwise, the dependent variable is assumed to be a continuous variable.

n_jobs : int, optional (default=0)

The number of threads allowed for the computations. If n_jobs \(\le 0\), the number of threads is determined internally.

verbose : int, optional (default=0)

An integer that specifies the verbosity level. Available options are:

  • 0: Disable output.
  • 1: Display summary results.
  • 2: Display all results.
scale_data : boolean, optional (default=True)

A boolean flag that specifies whether the input data shall be scaled first.

num_bins : int, optional (default=0)

An integer that specifies the number of bins allowed for data binning (bucketing). If \(0\), data binning is disabled and KNN is used for kernel density estimation.

early_stop_steps : int, optional (default=10)

A positive integer that specifies the number of iterations allowed without improvement in the MI measure before the selection stops early.

References

[Rd68def899164-1] H Yang and J Moody, “Data Visualization and Feature Selection: New Algorithms for Nongaussian Data”, NIPS 1999. [Pre-print]
[Rd68def899164-2] M Bennasar, Y Hicks, and R Setchi, “Feature selection using Joint Mutual Information Maximisation”, Expert Systems with Applications, 42(22), pp. 8520–8532, 2015. [Pre-print]
[Rd68def899164-3] H Peng, F Long, and C Ding, “Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), pp. 1226–1238, 2005. [Pre-print]

Examples

This implementation follows the scikit-learn API convention (see the usage sketch after this list):

  • fit(X, y)
  • transform(X)
  • fit_transform(X, y)
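
The following is a minimal usage sketch; the import path and dataset loader are illustrative assumptions:

    from sklearn.datasets import load_breast_cancer
    from mico import MutualInformationForwardSelection  # import path assumed

    X, y = load_breast_cancer(return_X_y=True)

    # Add features greedily; stop early after 10 iterations without improvement.
    selector = MutualInformationForwardSelection(method='JMI', early_stop_steps=10)
    selector.fit(X, y)
    X_reduced = selector.transform(X)
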
Attributes:
n_features_ : int

The number of selected features.

feature_importances_ : array of shape n_features

The feature importance scores of the selected features.

ranking_ : array of shape n_features

The feature ranking of the selected features: the first entry is the feature selected first (the one with the largest marginal MI with y), followed by the remaining features in order of decreasing MI.

mi_ : array of shape n_features

The MI measure of the selected features. Usually this is a monotonically decreasing array of numbers converging to 0. One can use it to estimate the number of features to select; in fact, this is what n_features=’auto’ tries to do heuristically.

fit(self, X, y)[source]

Fits the feature selector using the forward selection approach.

Parameters:
X : array-like, shape = [n_samples, n_features]

The training input samples.

y : array-like, shape = [n_samples]

The target values.

fit_transform(self, X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
X : numpy array of shape [n_samples, n_features]

Training set.

y : numpy array of shape [n_samples]

Target values.

Returns:
X_new : numpy array of shape [n_samples, n_features_new]

Transformed array.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

get_support(self, indices=False)

Get a mask, or integer index, of the features selected.

Parameters:
indices : boolean (default False)

If True, the return value will be an array of integers, rather than a boolean mask.

Returns:
support : array

An index that selects the retained features from a feature vector. If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention. If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.

inverse_transform(self, X)

Reverse the transformation operation.

Parameters:
X : array of shape [n_samples, n_selected_features]

The input samples.

Returns:
X_r : array of shape [n_samples, n_original_features]

X with columns of zeros inserted where features would have been removed by transform.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:
self
transform(self, X)

Reduce X to the selected features.

Parameters:
X : array of shape [n_samples, n_features]

The input samples.

Returns:
X_r : array of shape [n_samples, n_selected_features]

The input samples with only the selected features.