Ensemble Learning in Practice: Bagging/Boosting/Stacking
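1. Ensemble learning principles

```
Ensemble Learning
├── Core idea: combine multiple weak learners into a strong learner
├── Three main methods
│   ├── Bagging: parallel training, voting/averaging (e.g. random forest)
│   ├── Boosting: sequential training, step-by-step error correction (e.g. XGBoost)
│   └── Stacking: multi-layer models combined by a meta-learner
└── Benefits: lower variance, lower bias, better generalization
```

2. Bagging

```python
from sklearn.ensemble import BaggingClassifier, BaggingRegressor
from sklearn.tree import DecisionTreeClassifier

# Bagging for classification
# Note: the `estimator` keyword requires scikit-learn >= 1.2
# (older releases call it `base_estimator`).
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    max_samples=0.8,    # each tree sees 80% of the samples
    max_features=0.8,   # each tree sees 80% of the features
    bootstrap=True,     # sample with replacement
    random_state=42,
    n_jobs=-1,
)
bagging.fit(X_train, y_train)
```

3. Boosting

```python
# AdaBoost
from sklearn.ensemble import AdaBoostClassifier

ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
ada.fit(X_train, y_train)

# Gradient Boosting
from sklearn.ensemble import GradientBoostingClassifier

gb = GradientBoostingClassifier(
    n_estimators=100, max_depth=3, learning_rate=0.1, subsample=0.8, random_state=42
)
gb.fit(X_train, y_train)

# XGBoost
import xgboost as xgb

xgb_clf = xgb.XGBClassifier(
    n_estimators=100, max_depth=6, learning_rate=0.1, random_state=42
)
xgb_clf.fit(X_train, y_train)

# LightGBM
import lightgbm as lgb

lgb_clf = lgb.LGBMClassifier(
    n_estimators=100, max_depth=6, learning_rate=0.1, random_state=42
)
lgb_clf.fit(X_train, y_train)
```

4. Stacking

```python
import xgboost as xgb
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Define the base learners
estimators = [
    ("rf", RandomForestClassifier(n_estimators=100)),
    ("svm", SVC(probability=True)),
    ("xgb", xgb.XGBClassifier(n_estimators=100)),
]

# Stacking: the meta-learner is trained on out-of-fold predictions of the base learners
stacking = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(),
    cv=5,
    n_jobs=-1,
)
stacking.fit(X_train, y_train)
```

5. Voting ensembles

```python
from sklearn.ensemble import VotingClassifier

# Reuses the `estimators` list defined in the Stacking section.

# Hard voting: majority vote over predicted class labels
voting_hard = VotingClassifier(estimators=estimators, voting="hard")

# Soft voting: average the predicted class probabilities
voting_soft = VotingClassifier(estimators=estimators, voting="soft")
```

Summary

| Method | Representative algorithm | Main benefit | Typical use case |
|---|---|---|---|
| Bagging | Random forest | Reduces variance | High-variance models |
| Boosting | XGBoost | Reduces bias | High-bias models |
| Stacking | Multi-model combination | Combines the strengths of several models | Competitions / complex scenarios |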
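The snippets in this article assume that `X_train` and `y_train` already exist. As a rough way to try the methods side by side, here is a minimal sketch that builds a synthetic dataset and compares test accuracy. The dataset, the split, the reduced `n_estimators`, and the names `base_estimators`, `models`, and `scores` are all illustrative assumptions, not part of the original article. To keep the sketch dependent on scikit-learn only, `GradientBoostingClassifier` stands in for XGBoost as the boosting base learner.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption: the article does not name a dataset).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Base learners shared by the stacking and voting ensembles.
# GradientBoostingClassifier replaces XGBoost here to avoid an extra dependency.
base_estimators = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
    ("svm", SVC(probability=True, random_state=42)),
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=42)),
]

models = {
    # The base tree is passed positionally so the call works whether the
    # keyword is `estimator` (newer scikit-learn) or `base_estimator` (older).
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(),
        n_estimators=50,
        max_samples=0.8,
        max_features=0.8,
        random_state=42,
    ),
    "boosting": GradientBoostingClassifier(
        n_estimators=50, max_depth=3, learning_rate=0.1, random_state=42
    ),
    "stacking": StackingClassifier(
        estimators=base_estimators,
        final_estimator=LogisticRegression(),
        cv=5,
    ),
    "voting_hard": VotingClassifier(estimators=base_estimators, voting="hard"),
    "voting_soft": VotingClassifier(estimators=base_estimators, voting="soft"),
}

# Fit each ensemble and record its accuracy on the held-out test split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} accuracy = {acc:.3f}")
```

The ranking on a single synthetic split says little about real data. For a fairer comparison, cross-validated scores (for example with `cross_val_score`) on the actual problem would be a better guide.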