Forward selection approach for feature selection

An introductory example that demonstrates how to perform feature selection using mico.MutualInformationForwardSelection.
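As background, the general idea of forward selection by mutual information can be sketched in plain NumPy. The sketch below is only an illustration, not mico's implementation: it uses a histogram MI estimator and an mRMR-style score (relevance minus average redundancy with already-selected features) as a simplified stand-in for the JMI criterion reported in the log, and the function names are invented for this example.

```python
# Illustrative sketch only -- NOT mico's implementation. Greedy forward
# selection: at each step, add the candidate feature with the best
# relevance-minus-redundancy score (an mRMR-style stand-in for JMI).
import numpy as np


def mutual_information(x, y, bins=10):
    """Histogram estimate of MI (in nats) between two 1-D arrays."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())


def forward_select(X, y, n_features):
    """Greedily pick n_features column indices of X."""
    remaining = list(range(X.shape[1]))
    selected = []
    while len(selected) < n_features and remaining:
        def score(j):
            relevance = mutual_information(X[:, j], y)
            if not selected:
                return relevance
            redundancy = np.mean(
                [mutual_information(X[:, j], X[:, s]) for s in selected])
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

A sketch like this picks a strongly informative column first; mico additionally parallelizes the search (the "Num. threads" line in the log) and exposes the result through get_support(), feature_importances_, and transform().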

Out:

================================================================================
Start classification example.
================================================================================
Started scaling data.
Started MIFS.
 - Method        : JMI
 - Num. threads  : 8
 - Num. features : 7
 Iter  Sel    Current MI
    1   22 +2.200000E+01
    2   24 +6.072368E-01
    3    3 +1.142508E+00
    4   23 +1.685222E+00
    5   27 +2.270703E+00
    6   20 +2.787265E+00
    7    2 +3.237995E+00
Started calculating final MI matrix.
Done MIFS.
 - Total feat.   : 30
 - Target feat.  : 7
 - Actual feat.  : 7
--------------------------------------------------------------------------------
Populate results.
 - Selected features:
[False False  True  True False False False False False False False False
 False False False False False False False False  True False  True  True
  True False False  True False False]
 - Feature importance scores:
[0.         0.         0.13860476 0.14050078 0.         0.
 0.         0.         0.         0.         0.         0.
 0.         0.         0.         0.         0.         0.
 0.         0.         0.14277782 0.         0.1468793  0.14371891
 0.1401912  0.         0.         0.14732722 0.         0.        ]
 - X_transformed:
[[1.228e+02 1.001e+03 2.538e+01 ... 2.019e+03 1.622e-01 2.654e-01]
 [1.329e+02 1.326e+03 2.499e+01 ... 1.956e+03 1.238e-01 1.860e-01]
 [1.300e+02 1.203e+03 2.357e+01 ... 1.709e+03 1.444e-01 2.430e-01]
 ...
 [1.083e+02 8.581e+02 1.898e+01 ... 1.124e+03 1.139e-01 1.418e-01]
 [1.401e+02 1.265e+03 2.574e+01 ... 1.821e+03 1.650e-01 2.650e-01]
 [4.792e+01 1.810e+02 9.456e+00 ... 2.686e+02 8.996e-02 0.000e+00]]
================================================================================
Start regression example.
================================================================================
          age       sex       bmi        bp        s1        s2        s3        s4        s5        s6
0    0.038076  0.050680  0.061696  0.021872 -0.044223 -0.034821 -0.043401 -0.002592  0.019908 -0.017646
1   -0.001882 -0.044642 -0.051474 -0.026328 -0.008449 -0.019163  0.074412 -0.039493 -0.068330 -0.092204
2    0.085299  0.050680  0.044451 -0.005671 -0.045599 -0.034194 -0.032356 -0.002592  0.002864 -0.025930
3   -0.089063 -0.044642 -0.011595 -0.036656  0.012191  0.024991 -0.036038  0.034309  0.022692 -0.009362
4    0.005383 -0.044642 -0.036385  0.021872  0.003935  0.015596  0.008142 -0.002592 -0.031991 -0.046641
..        ...       ...       ...       ...       ...       ...       ...       ...       ...       ...
437  0.041708  0.050680  0.019662  0.059744 -0.005697 -0.002566 -0.028674 -0.002592  0.031193  0.007207
438 -0.005515  0.050680 -0.015906 -0.067642  0.049341  0.079165 -0.028674  0.034309 -0.018118  0.044485
439  0.041708  0.050680 -0.015906  0.017282 -0.037344 -0.013840 -0.024993 -0.011080 -0.046879  0.015491
440 -0.045472 -0.044642  0.039062  0.001215  0.016318  0.015283 -0.028674  0.026560  0.044528 -0.025930
441 -0.045472 -0.044642 -0.073030 -0.081414  0.083740  0.027809  0.173816 -0.039493 -0.004220  0.003064

[442 rows x 10 columns]
[151.  75. 141. 206. 135.  97. 138.  63. 110. 310. 101.  69. 179. 185.
 118. 171. 166. 144.  97. 168.  68.  49.  68. 245. 184. 202. 137.  85.
 131. 283. 129.  59. 341.  87.  65. 102. 265. 276. 252.  90. 100.  55.
  61.  92. 259.  53. 190. 142.  75. 142. 155. 225.  59. 104. 182. 128.
  52.  37. 170. 170.  61. 144.  52. 128.  71. 163. 150.  97. 160. 178.
  48. 270. 202. 111.  85.  42. 170. 200. 252. 113. 143.  51.  52. 210.
  65. 141.  55. 134.  42. 111.  98. 164.  48.  96.  90. 162. 150. 279.
  92.  83. 128. 102. 302. 198.  95.  53. 134. 144. 232.  81. 104.  59.
 246. 297. 258. 229. 275. 281. 179. 200. 200. 173. 180.  84. 121. 161.
  99. 109. 115. 268. 274. 158. 107.  83. 103. 272.  85. 280. 336. 281.
 118. 317. 235.  60. 174. 259. 178. 128.  96. 126. 288.  88. 292.  71.
 197. 186.  25.  84.  96. 195.  53. 217. 172. 131. 214.  59.  70. 220.
 268. 152.  47.  74. 295. 101. 151. 127. 237. 225.  81. 151. 107.  64.
 138. 185. 265. 101. 137. 143. 141.  79. 292. 178.  91. 116.  86. 122.
  72. 129. 142.  90. 158.  39. 196. 222. 277.  99. 196. 202. 155.  77.
 191.  70.  73.  49.  65. 263. 248. 296. 214. 185.  78.  93. 252. 150.
  77. 208.  77. 108. 160.  53. 220. 154. 259.  90. 246. 124.  67.  72.
 257. 262. 275. 177.  71.  47. 187. 125.  78.  51. 258. 215. 303. 243.
  91. 150. 310. 153. 346.  63.  89.  50.  39. 103. 308. 116. 145.  74.
  45. 115. 264.  87. 202. 127. 182. 241.  66.  94. 283.  64. 102. 200.
 265.  94. 230. 181. 156. 233.  60. 219.  80.  68. 332. 248.  84. 200.
  55.  85.  89.  31. 129.  83. 275.  65. 198. 236. 253. 124.  44. 172.
 114. 142. 109. 180. 144. 163. 147.  97. 220. 190. 109. 191. 122. 230.
 242. 248. 249. 192. 131. 237.  78. 135. 244. 199. 270. 164.  72.  96.
 306.  91. 214.  95. 216. 263. 178. 113. 200. 139. 139.  88. 148.  88.
 243.  71.  77. 109. 272.  60.  54. 221.  90. 311. 281. 182. 321.  58.
 262. 206. 233. 242. 123. 167.  63. 197.  71. 168. 140. 217. 121. 235.
 245.  40.  52. 104. 132.  88.  69. 219.  72. 201. 110.  51. 277.  63.
 118.  69. 273. 258.  43. 198. 242. 232. 175.  93. 168. 275. 293. 281.
  72. 140. 189. 181. 209. 136. 261. 113. 131. 174. 257.  55.  84.  42.
 146. 212. 233.  91. 111. 152. 120.  67. 310.  94. 183.  66. 173.  72.
  49.  64.  48. 178. 104. 132. 220.  57.]
Started binning data.
Started scaling data.
Started MIFS.
 - Method        : JMI
 - Num. threads  : 8
 - Num. features : 5
 Iter  Sel    Current MI
    1    2 +2.000000E+00
    2    0 +9.999990E+05
    3    3 +1.999998E+06
    4    4 +2.999997E+06
    5    5 +3.999996E+06
Started calculating final MI matrix.
Done MIFS.
 - Total feat.   : 10
 - Target feat.  : 5
 - Actual feat.  : 5
--------------------------------------------------------------------------------
Populate results.
 - Selected features:
[ True False  True  True  True  True False False False False]
 - Feature importance scores:
[0.2 0.  0.2 0.2 0.2 0.2 0.  0.  0.  0. ]
 - X_transformed:
[[ 0.03807591  0.06169621  0.02187235 -0.0442235  -0.03482076]
 [-0.00188202 -0.05147406 -0.02632783 -0.00844872 -0.01916334]
 [ 0.08529891  0.04445121 -0.00567061 -0.04559945 -0.03419447]
 ...
 [ 0.04170844 -0.01590626  0.01728186 -0.03734373 -0.01383982]
 [-0.04547248  0.03906215  0.00121513  0.01631843  0.01528299]
 [-0.04547248 -0.0730303  -0.08141377  0.08374012  0.02780893]]

from mico import MutualInformationForwardSelection
import pandas as pd
from sklearn.datasets import load_breast_cancer, load_diabetes


def test_mifs_classification():

    print("=" * 80)
    print("Start classification example.")
    print("=" * 80)

    # Prepare data.
    data = load_breast_cancer()
    y = data.target
    X = pd.DataFrame(data.data, columns=data.feature_names)

    # Perform feature selection.
    mifs = MutualInformationForwardSelection(verbose=2, categorical=True, n_features=7)
    mifs.fit(X, y)

    print("-" * 80)
    print("Populate results.")
    # Print the selected-feature mask.
    print(" - Selected features: \n{}".format(mifs.get_support()))
    # Print feature importance scores.
    print(" - Feature importance scores: \n{}".format(mifs.feature_importances_))
    # Call transform() on X.
    X_transformed = mifs.transform(X)
    print(" - X_transformed: \n{}".format(X_transformed))


def test_mifs_regression():

    print("=" * 80)
    print("Start regression example.")
    print("=" * 80)

    # Prepare data.
    data = load_diabetes()
    y = data.target
    X = pd.DataFrame(data.data, columns=data.feature_names)
    print(X)
    print(y)

    # Perform feature selection.
    mifs = MutualInformationForwardSelection(verbose=2, num_bins=10, categorical=False, n_features=5)
    mifs.fit(X, y)

    print("-" * 80)
    print("Populate results.")
    # Print the selected-feature mask.
    print(" - Selected features: \n{}".format(mifs.get_support()))
    # Print feature importance scores.
    print(" - Feature importance scores: \n{}".format(mifs.feature_importances_))
    # Call transform() on X.
    X_transformed = mifs.transform(X)
    print(" - X_transformed: \n{}".format(X_transformed))


if __name__ == '__main__':
    test_mifs_classification()
    test_mifs_regression()
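In the regression example the target is continuous, so the log shows a "Started binning data." step and the constructor receives num_bins=10. Equal-width binning, one common way to discretize a continuous variable before histogram-based MI estimation, can be sketched as follows; this is an illustration with an invented function name, and mico's exact binning strategy is not shown here.

```python
# Illustrative equal-width binning sketch (not mico's internal code).
import numpy as np


def equal_width_bins(values, num_bins=10):
    """Map a continuous 1-D array to integer bin codes 0..num_bins-1."""
    edges = np.linspace(values.min(), values.max(), num_bins + 1)
    # Digitize against the interior edges so the maximum value lands
    # in the last bin rather than an overflow bin.
    return np.digitize(values, edges[1:-1])
```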

Total running time of the script: ( 0 minutes 9.852 seconds)

Gallery generated by Sphinx-Gallery