Process flow - specificFeatureSelection
Specific feature selection (specificFeatureSelection) comprises the most fine-tuned and advanced methods for selecting covariates. This also means that they require more computing power. It is thus advisable to reduce the number of covariates entering this step in the process flow.
There are three (3) specific feature selection methods available, but only one can be applied in each model formulation:
- Univariate selection (univariateSelection),
- Permutation selector (permutationSelector), and
- Recursive Feature Elimination (RFE).
If a specific feature selection method is applied, it is the last step before the regression modeling.
|____SpectralData
| |____filter
| | |____singlefilter
| | |____multiFilter
| |____dataSetSplit
| | |____spectralInfoEnhancement
| | | |____scatterCorrection
| | | |____standardisation
| | | |____derivatives
| | | |____decompose
| | |____generalFeatureSelection
| | | |____varianceThreshold
| | |____targetFeatureExtract
| | | |____removeOutliers
| | | |____regressorExtract
| | | | |____specificFeatureAgglomeration
| | | | | |____wardClustering
| | | | |____specificFeatureSelection
| | | | | |____univariateSelection
| | | | | |____permutationSelector
| | | | | |____RFE
Univariate Feature Selection
Univariate feature selection uses an F-test as default for calculating p-values (univariate scores) for each covariate. Model fitting for calculating the scores is done against the target feature. You have to define the number of covariate features to retain a priori (parameter n_features). To apply univariate feature selection as part of the process flow, edit the command file thus:
"specificFeatureSelection": {
"apply": true,
"univariateSelection": {
"apply": false,
"SelectKBest": {
"apply": true,
"n_features": 4
}
}
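The setting above maps onto scikit-learn's SelectKBest with an F-test score function. The following is a minimal sketch of that selection step in isolation; the synthetic data and variable names are illustrative assumptions, not part of the framework.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Illustrative data: 50 samples with 10 spectral covariates; the target
# feature depends mainly on covariates 2 and 7
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = 2.0 * X[:, 2] + X[:, 7] + rng.normal(scale=0.1, size=50)

# F-test scores each covariate against the target; keep the 4 best,
# mirroring "n_features": 4 in the command file
selector = SelectKBest(score_func=f_regression, k=4)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)                    # (50, 4)
print(selector.get_support(indices=True))  # indices of retained covariates
```

Note that the F-test scores each covariate independently, so strongly correlated covariates may all be retained even when they carry redundant information.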
Figure 1 shows the outcomes of selecting 4 covariates from mean-centered spectra (left), derivatives (middle) and PCA decomposed bands (right). In all cases total nitrogen [N] was set as the target feature. The top row shows the selection of covariates for the Ordinary Least Squares (OLS) regressor; the bottom row for the Random Forest regressor.
Permutation Selection
Permutation feature importance measures the strength of the contribution of each covariate to a fitted model's statistical performance. Each covariate is shuffled randomly in turn, and the resulting change in model performance defines its strength. This generic method of evaluating covariates can be applied multiple times to any regressor and can thus be used for covariate selection for any combination of target feature and regressor.
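The shuffling procedure described above is available in scikit-learn as permutation_importance. The sketch below illustrates the idea with a Random Forest regressor; the data, the regressor settings, and the choice to keep the 4 strongest covariates are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Illustrative data: the target feature depends mainly on covariates 1 and 4
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 10))
y = 3.0 * X[:, 1] + X[:, 4] + rng.normal(scale=0.1, size=80)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Shuffle each covariate n_repeats times and record the drop in model score;
# a large drop means the covariate contributes strongly to the fitted model
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# Rank covariates by mean importance and keep the 4 strongest
top4 = np.argsort(result.importances_mean)[::-1][:4]
print(sorted(top4))
```

Because the importance is measured on a fitted model, the selection is specific to the chosen regressor: an OLS model and a Random Forest can rank the same covariates differently, which is why Figures 1 and 2 show both.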
Figure 2 shows the outcomes of selecting 4 covariates from mean-centered spectra (left), derivatives (middle) and PCA decomposed bands (right). In all cases total nitrogen [N] was set as the target feature. The top row shows the selection of covariates for the Ordinary Least Squares (OLS) regressor; the bottom row for the Random Forest regressor.
RFE
Feature ranking with recursive feature elimination.
Given an external estimator that assigns weights to features (e.g., the coefficients of a linear model), the goal of recursive feature elimination (RFE) is to select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features and the importance of each feature is obtained, either through a specific attribute or a callable. Then, the least important features are pruned from the current set of features. That procedure is recursively repeated on the pruned set until the desired number of features to select is eventually reached.
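The recursive procedure above can be sketched with scikit-learn's RFE class and a linear estimator; the data and parameter choices are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Illustrative data: the target feature depends mainly on covariates 0 and 5
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 5] + rng.normal(scale=0.1, size=60)

# Refit the estimator repeatedly, dropping the least important covariate
# (step=1) each round, until 4 covariates remain
rfe = RFE(estimator=LinearRegression(), n_features_to_select=4, step=1)
rfe.fit(X, y)

print(rfe.support_)   # boolean mask of retained covariates
print(rfe.ranking_)   # rank 1 marks the selected covariates
```

Because the estimator is refit after each elimination round, RFE accounts for interactions among the remaining covariates, at the cost of many model fits; this is one reason the specific feature selection step benefits from a reduced covariate set.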