Machine Learning with Python - AdaBoost

AdaBoost is one of the most successful boosting ensemble algorithms. The key idea of the algorithm is the way it assigns weights to the instances in the dataset: the weights of misclassified instances are increased, so that subsequent models pay more attention to the examples that earlier models got wrong.
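The weight-update step can be sketched in a few lines of NumPy. This is a minimal illustration of one boosting round (the toy labels and predictions below are made up for demonstration, not part of sklearn):

```python
import numpy as np

# One AdaBoost boosting round on toy data: weights of misclassified
# instances grow, weights of correctly classified instances shrink.
y_true = np.array([1, 1, -1, -1, 1])       # true labels
y_pred = np.array([1, -1, -1, -1, -1])     # a weak learner's predictions
w = np.full(len(y_true), 1 / len(y_true))  # start with uniform weights

err = np.sum(w * (y_pred != y_true))       # weighted error rate
alpha = 0.5 * np.log((1 - err) / err)      # the learner's vote weight
w = w * np.exp(-alpha * y_true * y_pred)   # up-weight the mistakes
w /= w.sum()                               # renormalize to sum to 1

print(np.round(w, 3))                      # → [0.167 0.25  0.167 0.167 0.25 ]
```

Instances 2 and 5 were misclassified, so their weights rise from 0.2 to 0.25 while the others fall, which is exactly the "pay more attention" behaviour described above.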

In the following Python recipe, we are going to build an AdaBoost ensemble model for classification using the AdaBoostClassifier class of sklearn on the Pima Indians Diabetes dataset.

First, import the required packages as follows −

from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import AdaBoostClassifier

Now, we need to load the Pima diabetes dataset as we did in the previous examples −

path = r"C:\pima-indians-diabetes.csv"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(path, names = headernames)
array = data.values
X = array[:,0:8]
Y = array[:,8]

Next, give the input for 10-fold cross-validation as follows −

seed = 5
kfold = KFold(n_splits = 10, shuffle = True, random_state = seed)

We need to provide the number of trees we are going to build. Here we are building 50 trees (by default, each one a single-split decision stump) −

num_trees = 50

Next, build the model with the help of following script −

model = AdaBoostClassifier(n_estimators = num_trees, random_state = seed)

Calculate and print the result as follows −

results = cross_val_score(model, X, Y, cv = kfold)
print(results.mean())

Output

0.7539473684210527

The output above shows that our AdaBoost ensemble model achieved an accuracy of about 75%.
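The accuracy can often be improved by tuning AdaBoost's learning_rate alongside n_estimators. A minimal sketch, using a synthetic dataset from make_classification in place of the Pima CSV so it runs anywhere (the grid of rates below is illustrative, not a tuned recommendation):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for the Pima data (8 features, binary target).
X, Y = make_classification(n_samples = 500, n_features = 8, random_state = 5)
kfold = KFold(n_splits = 10, shuffle = True, random_state = 5)

# learning_rate shrinks each tree's contribution; smaller values
# usually need more trees to reach the same accuracy.
for lr in [0.1, 0.5, 1.0]:
   model = AdaBoostClassifier(n_estimators = 50, learning_rate = lr,
                              random_state = 5)
   score = cross_val_score(model, X, Y, cv = kfold).mean()
   print(lr, score)
```

Comparing the printed scores shows how sensitive the ensemble is to the learning rate on a given dataset; the same loop can be run on the Pima data loaded earlier.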