Introduction
Tests with applying different regressors for predicting soil properties from spectra with the OSSL data indicate that the Cubist rule based regressor performs the best. Cubist is, at time of writing this in September 2023, not implemented in the package scikit-learn. There is, however a parallel implementation of Cubist that piggy-backs on scikit-learn. In this post you are going to install and fix Cubist by:
- adding Cubist to the virtual environment
- importing the Cubist package to OSSL_mlmodel.py,
- add Cubist to the model json command file, and
- fixing a bug in the Cubist package code.
Adding cubist to the virtual environment
Open a terminal Terminal window. By default the active virtual Anaconda Python environment is “base”, indicated by the prompt as:
(base)
Do not install Cubist with “base”, instead activate the virtual Python environment you created for the package ossl-xspectre (e.g. “ossl_py38”):
% conda activate ossl_py38
The prompt should change and instead of “(base)” should now read “(ossl_py38)”
(ossl_py38)
Now run the pip command:
pip install cubist
You have added the Cubist package to the virtual environment.
Importing the Cubist package to OSSL_mlmodel.py
With the Cubist package added to your virtual environment, you can import Cubist to the OSSL_mlmodel.py module.
Open the OSSL_mlmodel.py file in a text editor (or in Eclipse). At row 113 you should see the following line (commented with #):
#from cubist import Cubist
Remove the comment (“#”).
Further down, at row 1131, find the following text:
'''
if hasattr(self.regressionModels, 'Cubist') and self.regressionModels.Cubist.apply:
self.regressorModels.append(('Cubist', Cubist( **self.jsonparamsD['regressionModels']['Cubist']['hyperParams'])))
self.modelSelectD['Cubist'] = []
'''
This is the section that defines the Cubist regressor as part of the suite of regressors. This section is also commented but using the triple single quote signs (```). Remove the triple single quote signs that define the comment.
If you do not find the rows importing and defining the Cubist model, just search for “cubist” in the module file.
Add Cubist to the model json command file
To invoke Cubist in a project run you also have to add the Cubist model, and its marker layout, to your json model command file.
"MLP": {
"apply": false,
"hyperParams": {
"hidden_layer_sizes": [
100,
100
],
"max_iter": 200,
"tol": 0.001,
"epsilon": 1e-8
}
},
"Cubist": {
"apply": true,
"hyperParams": {
}
}
},
"regressionModelSymbols": {
"OLS": {
"marker": ".",
"size": 100
},
...
...
"MLP": {
"marker": "D",
"size": 50
},
"Cubist": {
"marker": "D",
"size": 50
}
},
"modelTests": {
Fixing a bug in the Cubist package code
There is “bug” in the Cubist - it uses an outdates command and will get stranded if you include it in the process-flow. Try running the OSSL_mlmodel.py. It probably reports an error message. To get the line where the error occurred, click on the reported error site in the text output from running the script. The error probably happened in the file called _quinlan_attributes.py, find that file in the output text, click it, and you should come to the exact line with the bug; row 68 in the class=’package’>Cubist</span> package support file _quinlan_attributes.py:
return {col_name: _get_data_format(col_data) for col_name, col_data in df.iteritems()}
The bug is that the command df.iteritems() is outdated and must be replaced with df.items():
return {col_name: _get_data_format(col_data) for col_name, col_data in df.items()}
Fix it, save the updated support module and try to run OSSL_mlmodel.py again.