Backward elimination approach for feature selection

An introductory example that demonstrates how to perform feature selection using mico.MutualInformationBackwardElimination.
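Backward elimination starts from the full feature set and repeatedly drops the least informative feature until the requested number remain. The sketch below illustrates the idea on the same breast-cancer data used in this example, using scikit-learn's `mutual_info_classif` as the scoring function; it is only an approximation for intuition, since mico uses its own mutual-information estimator and selection criterion.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

# Same dataset as the classification example below.
data = load_breast_cancer()
X, y = data.data, data.target

# Greedy backward elimination: at each step, re-estimate mutual
# information for the remaining features and drop the weakest one.
# (Illustrative only; mico's estimator and criterion differ.)
n_features = 7
remaining = list(range(X.shape[1]))
while len(remaining) > n_features:
    mi = mutual_info_classif(X[:, remaining], y, random_state=0)
    del remaining[int(np.argmin(mi))]  # remove least informative feature

# Boolean support mask over the original 30 features.
support = np.zeros(X.shape[1], dtype=bool)
support[remaining] = True
print(support.sum())  # 7
```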
Out:
================================================================================
Start classification example.
================================================================================
--------------------------------------------------------------------------------
Populate results.
- Selected features:
[False False True True False False False False False False False False
False False False False False False False False True False True True
True False False True False False]
- Feature importance scores:
[0. 0. 0.13860476 0.14050078 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.
0. 0. 0.14277782 0. 0.1468793 0.14371891
0.1401912 0. 0. 0.14732722 0. 0. ]
- X_transformed:
[[1.228e+02 1.001e+03 2.538e+01 ... 2.019e+03 1.622e-01 2.654e-01]
[1.329e+02 1.326e+03 2.499e+01 ... 1.956e+03 1.238e-01 1.860e-01]
[1.300e+02 1.203e+03 2.357e+01 ... 1.709e+03 1.444e-01 2.430e-01]
...
[1.083e+02 8.581e+02 1.898e+01 ... 1.124e+03 1.139e-01 1.418e-01]
[1.401e+02 1.265e+03 2.574e+01 ... 1.821e+03 1.650e-01 2.650e-01]
[4.792e+01 1.810e+02 9.456e+00 ... 2.686e+02 8.996e-02 0.000e+00]]
================================================================================
Start regression example.
================================================================================
age sex bmi bp s1 s2 s3 s4 s5 s6
0 0.038076 0.050680 0.061696 0.021872 -0.044223 -0.034821 -0.043401 -0.002592 0.019908 -0.017646
1 -0.001882 -0.044642 -0.051474 -0.026328 -0.008449 -0.019163 0.074412 -0.039493 -0.068330 -0.092204
2 0.085299 0.050680 0.044451 -0.005671 -0.045599 -0.034194 -0.032356 -0.002592 0.002864 -0.025930
3 -0.089063 -0.044642 -0.011595 -0.036656 0.012191 0.024991 -0.036038 0.034309 0.022692 -0.009362
4 0.005383 -0.044642 -0.036385 0.021872 0.003935 0.015596 0.008142 -0.002592 -0.031991 -0.046641
.. ... ... ... ... ... ... ... ... ... ...
437 0.041708 0.050680 0.019662 0.059744 -0.005697 -0.002566 -0.028674 -0.002592 0.031193 0.007207
438 -0.005515 0.050680 -0.015906 -0.067642 0.049341 0.079165 -0.028674 0.034309 -0.018118 0.044485
439 0.041708 0.050680 -0.015906 0.017282 -0.037344 -0.013840 -0.024993 -0.011080 -0.046879 0.015491
440 -0.045472 -0.044642 0.039062 0.001215 0.016318 0.015283 -0.028674 0.026560 0.044528 -0.025930
441 -0.045472 -0.044642 -0.073030 -0.081414 0.083740 0.027809 0.173816 -0.039493 -0.004220 0.003064
[442 rows x 10 columns]
[151. 75. 141. 206. 135. 97. 138. 63. 110. 310. 101. 69. 179. 185.
118. 171. 166. 144. 97. 168. 68. 49. 68. 245. 184. 202. 137. 85.
131. 283. 129. 59. 341. 87. 65. 102. 265. 276. 252. 90. 100. 55.
61. 92. 259. 53. 190. 142. 75. 142. 155. 225. 59. 104. 182. 128.
52. 37. 170. 170. 61. 144. 52. 128. 71. 163. 150. 97. 160. 178.
48. 270. 202. 111. 85. 42. 170. 200. 252. 113. 143. 51. 52. 210.
65. 141. 55. 134. 42. 111. 98. 164. 48. 96. 90. 162. 150. 279.
92. 83. 128. 102. 302. 198. 95. 53. 134. 144. 232. 81. 104. 59.
246. 297. 258. 229. 275. 281. 179. 200. 200. 173. 180. 84. 121. 161.
99. 109. 115. 268. 274. 158. 107. 83. 103. 272. 85. 280. 336. 281.
118. 317. 235. 60. 174. 259. 178. 128. 96. 126. 288. 88. 292. 71.
197. 186. 25. 84. 96. 195. 53. 217. 172. 131. 214. 59. 70. 220.
268. 152. 47. 74. 295. 101. 151. 127. 237. 225. 81. 151. 107. 64.
138. 185. 265. 101. 137. 143. 141. 79. 292. 178. 91. 116. 86. 122.
72. 129. 142. 90. 158. 39. 196. 222. 277. 99. 196. 202. 155. 77.
191. 70. 73. 49. 65. 263. 248. 296. 214. 185. 78. 93. 252. 150.
77. 208. 77. 108. 160. 53. 220. 154. 259. 90. 246. 124. 67. 72.
257. 262. 275. 177. 71. 47. 187. 125. 78. 51. 258. 215. 303. 243.
91. 150. 310. 153. 346. 63. 89. 50. 39. 103. 308. 116. 145. 74.
45. 115. 264. 87. 202. 127. 182. 241. 66. 94. 283. 64. 102. 200.
265. 94. 230. 181. 156. 233. 60. 219. 80. 68. 332. 248. 84. 200.
55. 85. 89. 31. 129. 83. 275. 65. 198. 236. 253. 124. 44. 172.
114. 142. 109. 180. 144. 163. 147. 97. 220. 190. 109. 191. 122. 230.
242. 248. 249. 192. 131. 237. 78. 135. 244. 199. 270. 164. 72. 96.
306. 91. 214. 95. 216. 263. 178. 113. 200. 139. 139. 88. 148. 88.
243. 71. 77. 109. 272. 60. 54. 221. 90. 311. 281. 182. 321. 58.
262. 206. 233. 242. 123. 167. 63. 197. 71. 168. 140. 217. 121. 235.
245. 40. 52. 104. 132. 88. 69. 219. 72. 201. 110. 51. 277. 63.
118. 69. 273. 258. 43. 198. 242. 232. 175. 93. 168. 275. 293. 281.
72. 140. 189. 181. 209. 136. 261. 113. 131. 174. 257. 55. 84. 42.
146. 212. 233. 91. 111. 152. 120. 67. 310. 94. 183. 66. 173. 72.
49. 64. 48. 178. 104. 132. 220. 57.]
--------------------------------------------------------------------------------
Populate results.
- Selected features:
[False False False False False True True True True True]
- Feature importance scores:
[0. 0. 0. 0. 0. 0.2 0.2 0.2 0.2 0.2]
- X_transformed:
[[-0.03482076 -0.04340085 -0.00259226 0.01990842 -0.01764613]
[-0.01916334 0.07441156 -0.03949338 -0.06832974 -0.09220405]
[-0.03419447 -0.03235593 -0.00259226 0.00286377 -0.02593034]
...
[-0.01383982 -0.02499266 -0.01107952 -0.04687948 0.01549073]
[ 0.01528299 -0.02867429 0.02655962 0.04452837 -0.02593034]
[ 0.02780893 0.17381578 -0.03949338 -0.00421986 0.00306441]]
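The boolean mask from `get_support()` can be mapped back to column names to see which diabetes features survived elimination. A small sketch, reusing the mask printed above rather than re-running the selector:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes

data = load_diabetes()
X = pd.DataFrame(data.data, columns=data.feature_names)

# The support mask printed by the regression example above.
support = np.array([False, False, False, False, False,
                    True, True, True, True, True])

# Index the column labels with the boolean mask.
selected = X.columns[support].tolist()
print(selected)  # ['s2', 's3', 's4', 's5', 's6']
```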
from mico import MutualInformationBackwardElimination
import pandas as pd
from sklearn.datasets import load_breast_cancer, load_diabetes


def test_mibe_classification():
print("=" * 80)
print("Start classification example.")
print("=" * 80)
# Prepare data.
data = load_breast_cancer()
y = data.target
X = pd.DataFrame(data.data, columns=data.feature_names)
# Perform feature selection.
mibe = MutualInformationBackwardElimination(verbose=2, categorical=True, n_features=7)
mibe.fit(X, y)
print("-" * 80)
print("Populate results.")
# Populate selected features.
print(" - Selected features: \n{}".format(mibe.get_support()))
# Populate feature importance scores.
print(" - Feature importance scores: \n{}".format(mibe.feature_importances_))
# Call transform() on X.
X_transformed = mibe.transform(X)
print(" - X_transformed: \n{}".format(X_transformed))


def test_mibe_regression():
print("=" * 80)
print("Start regression example.")
print("=" * 80)
# Prepare data.
data = load_diabetes()
y = data.target
X = pd.DataFrame(data.data, columns=data.feature_names)
print(X)
print(y)
# Perform feature selection.
mibe = MutualInformationBackwardElimination(verbose=2, num_bins=10, categorical=False, n_features=5)
mibe.fit(X, y)
print("-" * 80)
print("Populate results.")
# Populate selected features.
print(" - Selected features: \n{}".format(mibe.get_support()))
# Populate feature importance scores.
print(" - Feature importance scores: \n{}".format(mibe.feature_importances_))
# Call transform() on X.
X_transformed = mibe.transform(X)
print(" - X_transformed: \n{}".format(X_transformed))


if __name__ == '__main__':
test_mibe_classification()
test_mibe_regression()
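The regression example passes `num_bins=10`: estimating mutual information against a continuous target generally requires discretizing it first. The sketch below shows one plausible scheme, equal-width binning of the diabetes target into ten bins. This is an assumption for illustration; mico's actual handling of `num_bins` may differ.

```python
import numpy as np
from sklearn.datasets import load_diabetes

y = load_diabetes().target  # continuous disease-progression scores

# Equal-width binning into 10 bins (assumed scheme for illustration;
# mico's num_bins handling may differ).
num_bins = 10
edges = np.linspace(y.min(), y.max(), num_bins + 1)
y_binned = np.digitize(y, edges[1:-1])  # bin index 0..9 per sample

print(y_binned.min(), y_binned.max())  # 0 9
```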
Total running time of the script: ( 0 minutes 13.132 seconds)