Feature Selection Methods and Advantages

Feature selection is a crucial step in building machine learning models. By selecting the most relevant features, we can improve model performance, increase interpretability, and reduce computational cost. As a dimensionality reduction technique, feature selection aims to choose a small subset of relevant features from the original feature set by removing irrelevant, redundant, or noisy ones, keeping the model efficient and effective. Done well, it usually leads to better learning performance: higher accuracy, lower computational cost, and more interpretable models. Researchers in fields such as computer vision and text mining have proposed a wide variety of feature selection algorithms and demonstrated their effectiveness both theoretically and experimentally.

Feature Selection Methods

Feature Selection Methods: Advantages and Disadvantages

Filter Methods
  • Advantages: computational efficiency (faster and less resource-intensive), model agnosticism (independent of specific machine learning algorithms), and suitability for high-dimensional data.
  • Disadvantages: potential for suboptimality (may not identify the best feature subset) and limited consideration of interactions (overlooks complex relationships between features).

Wrapper Methods
  • Advantages: higher predictive accuracy (optimizes for the chosen model) and interaction awareness (considers feature interactions to maximize performance).
  • Disadvantages: computational complexity (expensive and slow for large datasets) and risk of overfitting, especially with small datasets.

Embedded Methods
  • Advantages: a balance of efficiency and accuracy (selection is incorporated into model training) and reduced overfitting (regularization constrains the selection process).
  • Disadvantages: model specificity (often designed for specific algorithms) and potential complexity (can still be computationally expensive for complex models).

Feature Selection Techniques: Mathematical Background

1. Filter Methods

Filter methods evaluate each feature individually using statistical criteria such as correlation, mutual information, and the chi-square test; a short scoring example in code follows the formulas below.

  • Pearson’s Correlation Coefficient: Measures the linear relationship between two continuous variables.

Formula:

$$ r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}} $$
  • Mutual Information: Measures the shared information between two variables.

Formula:

$$ I(X; Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)} $$
  • Chi-Square Test: Evaluates the association between categorical variables.

Formula:

$$ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} $$
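
To make these criteria concrete, here is a minimal sketch that scores the features of a synthetic classification dataset with two of the criteria above, using scipy.stats.pearsonr and scikit-learn's mutual_info_classif; the dataset and its dimensions are purely illustrative assumptions.

# Minimal sketch: scoring features with two filter criteria on a toy dataset.
import numpy as np
from scipy.stats import pearsonr
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Toy data: 200 samples, 6 features, only a few of them informative (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=6, n_informative=3,
                           n_redundant=1, random_state=0)

# Pearson's correlation of each feature with the target (treats the 0/1 labels as numeric).
pearson_scores = [abs(pearsonr(X[:, j], y)[0]) for j in range(X.shape[1])]

# Estimated mutual information between each feature and the target.
mi_scores = mutual_info_classif(X, y, random_state=0)

for j, (r, mi) in enumerate(zip(pearson_scores, mi_scores)):
    print(f"Feature {j}: |Pearson r| = {r:.3f}, mutual information = {mi:.3f}")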

2. Wrapper Methods

Wrapper methods involve training models with subsets of features and evaluating their performance to identify the best combination.

  • Recursive Feature Elimination (RFE): RFE removes the least important features iteratively based on the model’s coefficients.

Linear Model Example:

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n $$
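
As an illustration, here is a minimal RFE sketch using scikit-learn's RFE with a logistic regression base estimator; the synthetic dataset and the choice of keeping four features are assumptions made for the example.

# Minimal sketch: Recursive Feature Elimination with a linear model.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=0)

# Keep the 4 features ranked highest by the model's coefficients, removing one feature per iteration.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=4, step=1)
rfe.fit(X, y)

print("Selected feature mask:", rfe.support_)
print("Feature ranking (1 = selected):", rfe.ranking_)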

3. Embedded Methods

Embedded methods perform feature selection during model training.

  • Lasso Regression: Adds an \( L_1 \) penalty to the model to encourage sparsity, shrinking some coefficients to zero.

Cost Function:

$$ \min_{\beta} \left( \frac{1}{2n} \sum_{i=1}^{n} (y_i - X_i \beta)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right) $$
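
A minimal sketch of this cost function in practice, assuming a synthetic regression dataset: scikit-learn's Lasso minimizes the same objective, with the alpha parameter playing the role of lambda, and the features with non-zero coefficients are the ones that survive selection.

# Minimal sketch: L1-regularized regression shrinking some coefficients to exactly zero.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Toy regression data in which only a few features carry signal (an assumption for illustration).
X, y = make_regression(n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0)

# alpha corresponds to lambda in the cost function above.
lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

selected = np.flatnonzero(lasso.coef_)
print("Coefficients:", np.round(lasso.coef_, 2))
print("Features kept (non-zero coefficients):", selected)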

4. Dimensionality Reduction Techniques

These techniques transform the dataset into a lower-dimensional space while retaining the most information.

  • Principal Component Analysis (PCA): Finds the directions that capture the most variance in the data.

Formula:

$$ Z = X W $$

Where \( W \) contains the eigenvectors of the covariance matrix corresponding to the largest eigenvalues.
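
The sketch below illustrates the projection Z = XW with scikit-learn's PCA on the Iris data (an arbitrary example dataset), checking that fit_transform matches multiplying the centered data by the matrix of leading eigenvectors.

# Minimal sketch: the projection Z = X W with scikit-learn's PCA.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Project onto the top 2 principal directions.
pca = PCA(n_components=2)
Z = pca.fit_transform(X)

# The same projection written explicitly as Z = (X - mean) W,
# where the columns of W are the leading eigenvectors of the covariance matrix.
W = pca.components_.T
Z_manual = (X - X.mean(axis=0)) @ W

print(np.allclose(Z, Z_manual))                  # True
print("Explained variance ratio:", pca.explained_variance_ratio_)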


5. Heuristic and Evolutionary Methods

Heuristic methods, such as genetic algorithms, search for optimal feature subsets by mimicking evolutionary processes.

  • Genetic Algorithm: Uses operations like selection, crossover, and mutation to find the best feature combination.

Fitness Function:

$$ f(S) = \text{Model Accuracy with feature subset } S $$
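
As a rough illustration, here is a compact sketch of a genetic algorithm for feature-subset search; the fitness function is the cross-validated accuracy of a logistic regression on the selected subset, and the population size, number of generations, and mutation rate are illustrative assumptions rather than tuned values.

# Compact sketch of a genetic algorithm searching over feature subsets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=12, n_informative=4, random_state=0)

def fitness(mask):
    """f(S): cross-validated accuracy using only the features in the subset S."""
    if not mask.any():
        return 0.0
    return cross_val_score(LogisticRegression(max_iter=1000), X[:, mask], y, cv=3).mean()

# Illustrative hyperparameters (assumptions, not tuned values).
pop_size, n_generations, mutation_rate = 20, 15, 0.1
population = rng.random((pop_size, X.shape[1])) < 0.5   # random boolean masks

for _ in range(n_generations):
    scores = np.array([fitness(ind) for ind in population])
    parents = population[np.argsort(scores)[-pop_size // 2:]]   # selection: keep the better half
    children = []
    while len(children) < pop_size - len(parents):
        p1, p2 = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, X.shape[1])                       # one-point crossover
        child = np.concatenate([p1[:cut], p2[cut:]])
        flip = rng.random(X.shape[1]) < mutation_rate           # mutation
        children.append(np.where(flip, ~child, child))
    population = np.vstack([parents, children])

best = population[np.argmax([fitness(ind) for ind in population])]
print("Best subset:", np.flatnonzero(best), "CV accuracy:", round(fitness(best), 3))
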
Common Statistical Criteria for Feature Selection

1. Information Gain (IG)

Information Gain measures the reduction in entropy when splitting a dataset based on a feature. It helps to select features by evaluating how much information a feature provides about the target variable.

The formula for Information Gain is:

$$ IG(Y, X) = H(Y) - H(Y | X) $$

Where \( H(Y) \) is the entropy of the target variable:

$$ H(Y) = -\sum_{i=1}^{n} p(y_i) \log_2 p(y_i) $$

\( H(Y | X) \) is the conditional entropy of the target variable given the feature.
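
A minimal sketch of this computation on a small hand-made dataset (the 'Outlook' and 'Play' columns are purely illustrative), where H(Y | X) is computed as a weighted average of the entropy within each group of X:

# Minimal sketch: information gain of a single categorical feature.
import numpy as np
import pandas as pd

def entropy(series):
    """H(Y) = -sum p(y) log2 p(y) over the observed classes."""
    p = series.value_counts(normalize=True)
    return -(p * np.log2(p)).sum()

def information_gain(df, feature, target):
    """IG(Y, X) = H(Y) - H(Y | X), with H(Y | X) a weighted average over the values of X."""
    h_y_given_x = sum(
        (len(group) / len(df)) * entropy(group[target])
        for _, group in df.groupby(feature)
    )
    return entropy(df[target]) - h_y_given_x

toy = pd.DataFrame({
    "Outlook": ["sunny", "sunny", "overcast", "rain", "rain", "overcast", "sunny", "rain"],
    "Play":    ["no",    "no",    "yes",      "yes",  "no",   "yes",      "yes",   "yes"],
})
print("IG(Play, Outlook) =", round(information_gain(toy, "Outlook", "Play"), 3))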


2. Chi-Square Test (\(\chi^2\))

The Chi-Square Test determines the relationship between categorical features and the target. It compares observed and expected frequencies to determine if the feature significantly impacts the target variable.

The formula for the Chi-square statistic is:

$$ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} $$

Where \( O_i \) is the observed frequency and \( E_i \) is the expected frequency.
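
Here is a minimal sketch of the test on a small illustrative dataset (the 'Color' and 'Target' columns are made up for the example), using scipy.stats.chi2_contingency on the contingency table of observed frequencies:

# Minimal sketch: chi-square test between one categorical feature and the target.
import pandas as pd
from scipy.stats import chi2_contingency

toy = pd.DataFrame({
    "Color":  ["red", "red", "blue", "blue", "green", "green", "red", "blue"],
    "Target": [1,     1,     0,      0,      1,       0,       1,     0],
})

# Observed frequencies O_i as a contingency table; the expected frequencies E_i are derived from it.
observed = pd.crosstab(toy["Color"], toy["Target"])
chi2, p_value, dof, expected = chi2_contingency(observed)

print("chi-square =", round(chi2, 3), "p-value =", round(p_value, 3))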


3. Fisher’s Score

Fisher’s Score ranks features based on their ability to discriminate between different classes. Features with higher Fisher’s Scores are more useful for classification tasks.

The formula for Fisher’s Score is:

$$ F_j = \frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2} $$

Where \( \mu_1 \) and \( \mu_2 \) are the means of the feature values for the two classes, and \( \sigma_1^2 \) and \( \sigma_2^2 \) are the variances for the two classes.
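
A minimal sketch of this score on a synthetic two-class dataset, computing F_j for every feature directly from the class-wise means and variances (the dataset parameters are assumptions for illustration):

# Minimal sketch: Fisher's score for each feature in a two-class problem.
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=5, n_informative=2,
                           n_redundant=0, random_state=0)

X0, X1 = X[y == 0], X[y == 1]   # split the samples by class

# F_j = (mu_1 - mu_2)^2 / (sigma_1^2 + sigma_2^2), computed per feature.
fisher_scores = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2 / (X0.var(axis=0) + X1.var(axis=0))

for j, score in enumerate(fisher_scores):
    print(f"Feature {j}: Fisher score = {score:.3f}")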


4. Variance Threshold

The Variance Threshold method removes features with variance below a specified threshold. Features with low variance do not contribute much to distinguishing between samples.

The formula for the variance of a feature is:

$$ \text{Var}(X) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 $$

Where \( x_i \) is the feature value for the \( i^{th} \) sample, and \( \bar{x} \) is the mean of the feature values.
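
A minimal sketch of this criterion with scikit-learn's VarianceThreshold on a small hand-made matrix; the threshold of 0.05 and the example values are illustrative assumptions.

# Minimal sketch: dropping low-variance features with a threshold.
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Illustrative matrix: the middle column is nearly constant, so it carries little information.
X = np.array([[1.0, 0.0, 3.2],
              [2.0, 0.0, 1.1],
              [3.0, 0.1, 4.8],
              [4.0, 0.0, 2.9]])

selector = VarianceThreshold(threshold=0.05)
X_reduced = selector.fit_transform(X)

print("Per-feature variance:", np.round(selector.variances_, 3))
print("Kept columns:", selector.get_support(indices=True))
print("Reduced shape:", X_reduced.shape)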

Feature Selection Plots in Machine Learning

1. Correlation Heatmap

The heatmap visualizes the pairwise correlations between all features and the target. Highly correlated features can be dropped to avoid redundancy. Several plots can help identify correlations between features and guide feature selection decisions.

Correlation Heatmap

Here is the code for the correlation heatmap:

import seaborn as sns
import matplotlib.pyplot as plt

# 'data' is assumed to be a pandas DataFrame of numeric features plus the target column.
# Generate the correlation heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(data.corr(), annot=True, cmap='coolwarm', vmin=-1, vmax=1, fmt=".2f")
plt.title("Correlation Heatmap")
plt.show()

2. Scatter Plot Matrix (Pair Plot)

A scatter plot matrix visualizes relationships between pairs of features. This plot helps identify correlated features and outliers.

Scatter Plot Matrix
Here is the code for the scatter plot matrix:

# Pairwise scatter plots of all features, colored by the 'Target' column.
sns.pairplot(data, hue='Target')
plt.show()

3. Box Plot

A box plot compares the distribution of feature values across different target classes. Features with clear separations between classes are often good predictors.

Box Plot

Here is the code for the box plot:

# 'Feature1' is a placeholder column name; substitute a feature from your dataset.
sns.boxplot(x='Target', y='Feature1', data=data)
plt.title("Box Plot of Feature1 vs Target")
plt.show()

4. Feature Importance Plot

This plot shows how important each feature is in a Random Forest model. Features with low importance can be dropped.

Feature Importance Plot

Here is the code for the feature importance plot:

from sklearn.ensemble import RandomForestClassifier

# Split the DataFrame into features and target, then fit a random forest.
model = RandomForestClassifier()
X = data.drop('Target', axis=1)
y = data['Target']
model.fit(X, y)

# Plot the impurity-based importance of each feature.
importances = model.feature_importances_
plt.barh(X.columns, importances)
plt.title("Feature Importance Plot")
plt.show()

5. Variance Threshold Plot

This plot visualizes the variance of each feature. Features with very low variance do not contribute much to the model.

Variance Threshold Plot

Here is the code for the variance threshold plot:

from sklearn.feature_selection import VarianceThreshold

# Fit the selector to compute per-feature variances; features below the threshold would be dropped.
selector = VarianceThreshold(threshold=0.1)
selector.fit(X)

variances = selector.variances_
plt.barh(X.columns, variances)
plt.title("Variance of Features")
plt.show()

6. PCA (Explained Variance Plot)

This plot shows how much variance each principal component explains. It helps decide how many components to keep in PCA.

PCA Explained Variance Plot

Here is the code for the PCA explained variance plot:

from sklearn.decomposition import PCA

pca = PCA().fit(X)
explained_variance = pca.explained_variance_ratio_

plt.plot(range(1, len(explained_variance) + 1), explained_variance, marker='o')
plt.title("Explained Variance by Principal Components")
plt.xlabel("Number of Components")
plt.ylabel("Explained Variance")
plt.show()

The next article will delve deeper into feature selection.
