Scikit-learn special case: leave-one-out
Today I learned the hard way that sklearn.model_selection.cross_val_score() returns NaNs when you use a probability-based score (like AUC or log-loss) with leave-one-out cross-validation (LOO-CV). Intuitively, it makes sense why LOO-CV would be special: each round's test fold holds a single sample, so the scorer gets one value instead of an array of values. To overcome this issue, I built a wrapper around the combination of scikit-learn methods you need to make LOO-CV scoring behave like the scoring for other cross-validation methods. You'll find it below.
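The failure is easy to see in isolation: ROC AUC needs both classes present in the evaluated fold, and a leave-one-out fold contains exactly one sample. A minimal illustration (my own sketch, not part of the wrapper below):

```python
from sklearn.metrics import roc_auc_score

# A one-sample "fold", as LOO-CV would produce: y_true contains a single
# class, so the ROC curve (and hence AUC) is undefined for that fold.
try:
    roc_auc_score([1], [0.7])
except ValueError as err:
    print(err)
```

This is why scoring has to happen once, over the pooled held-out predictions, rather than per fold.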
I tried to adhere to the scikit-learn form as much as possible, but anyone is free to remix this work to make it better.
```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict


def score_loo(X, y, estimator, score_func=roc_auc_score, *, needs_proba=True,
              n_jobs=-1, **kwargs):
    # Additional keyword arguments are passed to score_func.
    if needs_proba:
        # Collect the held-out class-1 probability for every sample.
        y_hat = cross_val_predict(estimator, X, y=y, cv=LeaveOneOut(),
                                  n_jobs=n_jobs, method='predict_proba')[:, 1]
    else:
        y_hat = cross_val_predict(estimator, X, y=y, cv=LeaveOneOut(),
                                  n_jobs=n_jobs, method='predict')
    # Score once over the pooled held-out predictions instead of per fold.
    return score_func(np.ravel(y), y_hat, **kwargs)


# Example usage (X and y are your feature matrix and labels)
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(random_state=1, n_jobs=1, n_estimators=2)
score_loo(estimator=rf, X=X, y=y, n_jobs=16, score_func=roc_auc_score,
          needs_proba=True)
```