
0. Sklearn Metric
.1. Classification metrics
See the Classification metrics section of the user guide for further details.

| Function | Description |
| --- | --- |
| `metrics.accuracy_score(y_true, y_pred, *[, …])` | Accuracy classification score. |
| `metrics.auc(x, y)` | Compute Area Under the Curve (AUC) using the trapezoidal rule. |
| `metrics.average_precision_score(y_true, …)` | Compute average precision (AP) from prediction scores. |
| `metrics.balanced_accuracy_score(y_true, …)` | Compute the balanced accuracy. |
| `metrics.brier_score_loss(y_true, y_prob, *)` | Compute the Brier score loss. |
| `metrics.classification_report(y_true, y_pred, *)` | Build a text report showing the main classification metrics. |
| `metrics.cohen_kappa_score(y1, y2, *[, …])` | Cohen’s kappa: a statistic that measures inter-annotator agreement. |
| `metrics.confusion_matrix(y_true, y_pred, *)` | Compute confusion matrix to evaluate the accuracy of a classification. |
| `metrics.dcg_score(y_true, y_score, *[, k, …])` | Compute Discounted Cumulative Gain. |
| `metrics.det_curve(y_true, y_score[, …])` | Compute error rates for different probability thresholds. |
| `metrics.f1_score(y_true, y_pred, *[, …])` | Compute the F1 score, also known as balanced F-score or F-measure. |
| `metrics.fbeta_score(y_true, y_pred, *, beta)` | Compute the F-beta score. |
| `metrics.hamming_loss(y_true, y_pred, *[, …])` | Compute the average Hamming loss. |
| `metrics.hinge_loss(y_true, pred_decision, *)` | Average hinge loss (non-regularized). |
| `metrics.jaccard_score(y_true, y_pred, *[, …])` | Jaccard similarity coefficient score. |
| `metrics.log_loss(y_true, y_pred, *[, eps, …])` | Log loss, aka logistic loss or cross-entropy loss. |
| `metrics.matthews_corrcoef(y_true, y_pred, *)` | Compute the Matthews correlation coefficient (MCC). |
| `metrics.multilabel_confusion_matrix(y_true, …)` | Compute a confusion matrix for each class or sample. |
| `metrics.ndcg_score(y_true, y_score, *[, k, …])` | Compute Normalized Discounted Cumulative Gain. |
| `metrics.precision_recall_curve(y_true, …)` | Compute precision-recall pairs for different probability thresholds. |
| `metrics.precision_recall_fscore_support(…)` | Compute precision, recall, F-measure and support for each class. |
| `metrics.precision_score(y_true, y_pred, *[, …])` | Compute the precision. |
| `metrics.recall_score(y_true, y_pred, *[, …])` | Compute the recall. |
| `metrics.roc_auc_score(y_true, y_score, *[, …])` | Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores. |
| `metrics.roc_curve(y_true, y_score, *[, …])` | Compute Receiver operating characteristic (ROC). |
| `metrics.top_k_accuracy_score(y_true, y_score, *)` | Top-k Accuracy classification score. |
| `metrics.zero_one_loss(y_true, y_pred, *[, …])` | Zero-one classification loss. |

.2. Regression metrics
See the Regression metrics section of the user guide for further details.

| Function | Description |
| --- | --- |
| `metrics.explained_variance_score(y_true, …)` | Explained variance regression score function. |
| `metrics.max_error(y_true, y_pred)` | The max_error metric calculates the maximum residual error. |
| `metrics.mean_absolute_error(y_true, y_pred, *)` | Mean absolute error regression loss. |
| `metrics.mean_squared_error(y_true, y_pred, *)` | Mean squared error regression loss. |
| `metrics.mean_squared_log_error(y_true, y_pred, *)` | Mean squared logarithmic error regression loss. |
| `metrics.median_absolute_error(y_true, y_pred, *)` | Median absolute error regression loss. |
| `metrics.mean_absolute_percentage_error(…)` | Mean absolute percentage error regression loss. |
| `metrics.r2_score(y_true, y_pred, *[, …])` | R² (coefficient of determination) regression score function. |
| `metrics.mean_poisson_deviance(y_true, y_pred, *)` | Mean Poisson deviance regression loss. |
| `metrics.mean_gamma_deviance(y_true, y_pred, *)` | Mean Gamma deviance regression loss. |
| `metrics.mean_tweedie_deviance(y_true, y_pred, *)` | Mean Tweedie deviance regression loss. |

.3. Multilabel ranking metrics
See the Multilabel ranking metrics section of the user guide for further details.

| Function | Description |
| --- | --- |
| `metrics.coverage_error(y_true, y_score, *[, …])` | Coverage error measure. |
| `metrics.label_ranking_average_precision_score(…)` | Compute ranking-based average precision. |
| `metrics.label_ranking_loss(y_true, y_score, *)` | Compute Ranking loss measure. |

.4. Clustering metrics
The sklearn.metrics.cluster submodule contains evaluation metrics for cluster analysis results. There are two forms of evaluation:
- supervised, which uses ground truth class values for each sample.
- unsupervised, which does not and measures the ‘quality’ of the model itself.

| Function | Description |
| --- | --- |
| `metrics.adjusted_mutual_info_score(…[, …])` | Adjusted Mutual Information between two clusterings. |
| `metrics.adjusted_rand_score(labels_true, …)` | Rand index adjusted for chance. |
| `metrics.calinski_harabasz_score(X, labels)` | Compute the Calinski and Harabasz score. |
| `metrics.davies_bouldin_score(X, labels)` | Computes the Davies-Bouldin score. |
| `metrics.completeness_score(labels_true, …)` | Completeness metric of a cluster labeling given a ground truth. |
| `metrics.cluster.contingency_matrix(…[, …])` | Build a contingency matrix describing the relationship between labels. |
| `metrics.cluster.pair_confusion_matrix(…)` | Pair confusion matrix arising from two clusterings. |
| `metrics.fowlkes_mallows_score(labels_true, …)` | Measure the similarity of two clusterings of a set of points. |
| `metrics.homogeneity_completeness_v_measure(…)` | Compute the homogeneity and completeness and V-Measure scores at once. |
| `metrics.homogeneity_score(labels_true, …)` | Homogeneity metric of a cluster labeling given a ground truth. |
| `metrics.mutual_info_score(labels_true, …)` | Mutual Information between two clusterings. |
| `metrics.normalized_mutual_info_score(…[, …])` | Normalized Mutual Information between two clusterings. |
| `metrics.rand_score(labels_true, labels_pred)` | Rand index. |
| `metrics.silhouette_score(X, labels, *[, …])` | Compute the mean Silhouette Coefficient of all samples. |
| `metrics.silhouette_samples(X, labels, *[, …])` | Compute the Silhouette Coefficient for each sample. |
| `metrics.v_measure_score(labels_true, …[, beta])` | V-measure cluster labeling given a ground truth. |

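
To make the supervised/unsupervised distinction concrete, here is a minimal sketch that scores the same clustering both ways. The toy points and labels are made up for illustration; only `adjusted_rand_score` and `silhouette_score` come from the listing above.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Two well-separated groups of 2-D points (toy data, made up for illustration).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [1, 1, 1, 0, 0, 0]  # same partition, cluster ids swapped

# Supervised: compares the predicted partition with the ground truth.
# The score ignores how the clusters are numbered.
ari = adjusted_rand_score(labels_true, labels_pred)

# Unsupervised: needs only the data and the predicted labels.
sil = silhouette_score(X, labels_pred)

print(f"adjusted Rand index: {ari:.3f}")
print(f"mean silhouette:     {sil:.3f}")
```

Because the predicted partition matches the ground truth up to a renaming of the clusters, the adjusted Rand index is 1 here, while the silhouette score is computed without looking at `labels_true` at all.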
.5. Pairwise metrics

| Function | Description |
| --- | --- |
| `metrics.pairwise.additive_chi2_kernel(X[, Y])` | Computes the additive chi-squared kernel between observations in X and Y. |
| `metrics.pairwise.chi2_kernel(X[, Y, gamma])` | Computes the exponential chi-squared kernel between X and Y. |
| `metrics.pairwise.cosine_similarity(X[, Y, …])` | Compute cosine similarity between samples in X and Y. |
| `metrics.pairwise.cosine_distances(X[, Y])` | Compute cosine distance between samples in X and Y. |
| `metrics.pairwise.distance_metrics()` | Valid metrics for pairwise_distances. |
| `metrics.pairwise.euclidean_distances(X[, Y, …])` | Considering the rows of X (and Y=X) as vectors, compute the distance matrix between each pair of vectors. |
| `metrics.pairwise.haversine_distances(X[, Y])` | Compute the Haversine distance between samples in X and Y. |
| `metrics.pairwise.kernel_metrics()` | Valid metrics for pairwise_kernels. |
| `metrics.pairwise.laplacian_kernel(X[, Y, gamma])` | Compute the laplacian kernel between X and Y. |
| `metrics.pairwise.linear_kernel(X[, Y, …])` | Compute the linear kernel between X and Y. |
| `metrics.pairwise.manhattan_distances(X[, Y, …])` | Compute the L1 distances between the vectors in X and Y. |
| `metrics.pairwise.nan_euclidean_distances(X)` | Calculate the euclidean distances in the presence of missing values. |
| `metrics.pairwise.pairwise_kernels(X[, Y, …])` | Compute the kernel between arrays X and optional array Y. |
| `metrics.pairwise.polynomial_kernel(X[, Y, …])` | Compute the polynomial kernel between X and Y. |
| `metrics.pairwise.rbf_kernel(X[, Y, gamma])` | Compute the rbf (gaussian) kernel between X and Y. |
| `metrics.pairwise.sigmoid_kernel(X[, Y, …])` | Compute the sigmoid kernel between X and Y. |
| `metrics.pairwise.paired_euclidean_distances(X, Y)` | Computes the paired euclidean distances between X and Y. |
| `metrics.pairwise.paired_manhattan_distances(X, Y)` | Compute the L1 distances between the vectors in X and Y. |
| `metrics.pairwise.paired_cosine_distances(X, Y)` | Computes the paired cosine distances between X and Y. |
| `metrics.pairwise.paired_distances(X, Y, *[, …])` | Computes the paired distances between X and Y. |
| `metrics.pairwise_distances(X[, Y, metric, …])` | Compute the distance matrix from a vector array X and optional Y. |
| `metrics.pairwise_distances_argmin(X, Y, *[, …])` | Compute minimum distances between one point and a set of points. |
| `metrics.pairwise_distances_argmin_min(X, Y, *)` | Compute minimum distances between one point and a set of points. |
| `metrics.pairwise_distances_chunked(X[, Y, …])` | Generate a distance matrix chunk by chunk with optional reduction. |

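
A short sketch of how a few of the pairwise functions above behave on small arrays. The arrays are made-up toy data; each result has one row per row of `X` and one column per row of `Y`.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

# Toy inputs (made up for illustration): 3 vectors in X, 2 vectors in Y.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y = np.array([[1.0, 0.0], [2.0, 0.0]])

D = euclidean_distances(X, Y)                     # shape (3, 2)
S = cosine_similarity(X, Y)                       # shape (3, 2)
M = pairwise_distances(X, Y, metric="manhattan")  # shape (3, 2), L1 distance

print(D)
print(S)
print(M)
```

`pairwise_distances` is the generic entry point: the `metric` argument selects the distance, so the same call shape covers Euclidean, Manhattan, cosine and others.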
.6. Plotting

| Function or class | Description |
| --- | --- |
| `metrics.plot_confusion_matrix(estimator, X, …)` | Plot Confusion Matrix. |
| `metrics.plot_det_curve(estimator, X, y, *[, …])` | Plot detection error tradeoff (DET) curve. |
| `metrics.plot_precision_recall_curve(…[, …])` | Plot Precision Recall Curve for binary classifiers. |
| `metrics.plot_roc_curve(estimator, X, y, *[, …])` | Plot Receiver operating characteristic (ROC) curve. |
| `metrics.ConfusionMatrixDisplay(…[, …])` | Confusion Matrix visualization. |
| `metrics.DetCurveDisplay(*, fpr, fnr[, …])` | DET curve visualization. |
| `metrics.PrecisionRecallDisplay(precision, …)` | Precision Recall visualization. |
| `metrics.RocCurveDisplay(*, fpr, tpr[, …])` | ROC Curve visualization. |

1. IOU
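The notes give no definition here, so the following is only a minimal sketch under the usual object-detection reading: IoU (intersection over union) is the overlap area of two regions divided by the area of their union. The function name `box_iou` and the `(x_min, y_min, x_max, y_max)` box layout are illustrative assumptions, not taken from these notes.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max).

    Hypothetical helper for illustration; the box layout is an assumption.
    """
    # Corners of the intersection rectangle.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

For label sets rather than boxes, the same ratio is the Jaccard index, which is what `metrics.jaccard_score` in the listing above computes.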
2. Precision
3. Accuracy
4. Recall
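5. PR curve (PRC) illustration
The F1-score is a single metric that takes both precision and recall into account, and it is more commonly used than the BEP (break-even point).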
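As a small worked example of how precision, recall, accuracy and F1 relate, the sketch below computes them with scikit-learn on made-up labels and checks F1 against its definition as the harmonic mean of precision and recall. The label vectors are illustrative only.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy binary labels (made up): TP = 3, FN = 1, FP = 2, TN = 4.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 5
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4
accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / N  = 7 / 10
f1 = f1_score(y_true, y_pred)

# F1 is the harmonic mean of precision and recall.
f1_manual = 2 * precision * recall / (precision + recall)

print(precision, recall, accuracy, f1, f1_manual)
```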
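6. ROC & AUC & K-S curves
ROC stands for "Receiver Operating Characteristic" curve. The ROC curve plots the true positive rate (TPR) on the Y axis against the false positive rate (FPR) on the X axis. The diagonal corresponds to a "random guess" model, and the point (0, 1) corresponds to the "ideal model". The ROC curve has the form shown in the figure below. The description here is for binary classification.
AUC (Area Under Curve) is the area under the ROC curve. If the classifier performs extremely well, AUC is 1. AUC generally lies between 0.5 and 1; the higher the AUC, the better the model discriminates between the classes.
- 0.85 – 0.95: very good
- 0.95 – 1: excellent, but generally unlikely in practice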
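A minimal sketch of computing the ROC curve and AUC with scikit-learn. The four labels and scores are toy values chosen for illustration; `roc_curve`, `auc` and `roc_auc_score` are the functions from the listing at the top of these notes.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Toy binary labels and predicted scores (made up for illustration).
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# One (FPR, TPR) point per threshold; thresholds are in decreasing order.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Area under the curve, computed two ways.
auc_from_curve = auc(fpr, tpr)                # trapezoidal rule on the points
auc_direct = roc_auc_score(y_true, y_score)   # directly from the scores

print(fpr, tpr)
print(auc_from_curve, auc_direct)
```

AUC also equals the fraction of (positive, negative) pairs in which the positive sample gets the higher score. Here 3 of the 4 pairs are ordered correctly, so both values are 0.75.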
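- The larger the KS value, the better the model separates the two classes.
The K-S curve is built from the cumulative rates of positive and negative instances (that is, TPR and FPR, the same quantities the ROC curve uses; only the horizontal axis differs). Varying the threshold from small to large yields a series of points. Subtracting the cumulative curve of the negative instances from that of the positive instances gives the KS curve. The highest point of the KS curve is the KS value, and the threshold at that point is where the model separates the two classes best.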
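The KS value described above can be sketched as the maximum of TPR minus FPR over all thresholds. This sketch reuses `roc_curve` to obtain TPR and FPR per threshold, which is an implementation choice and not something these notes prescribe; the labels and scores are made up.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy labels and scores (made up for illustration).
y_true = np.array([0, 0, 0, 1, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.05, 0.1, 0.2, 0.3, 0.35, 0.6, 0.65, 0.7, 0.8, 0.9])

# drop_intermediate=False keeps one point for every distinct threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score, drop_intermediate=False)

# KS curve: gap between the cumulative positive and negative rates.
ks_curve = tpr - fpr
best = int(np.argmax(ks_curve))

ks_value = float(ks_curve[best])        # height of the KS curve's peak
ks_threshold = float(thresholds[best])  # score cut-off at that peak

print(ks_value, ks_threshold)
```

With these values, predicting positive for scores of at least 0.65 captures 4 of the 5 positives and none of the negatives, so the peak gap is 0.8.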
7. Confusion Matrix
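A minimal sketch of building a binary confusion matrix with `metrics.confusion_matrix` from the listing above. The labels are toy values; in scikit-learn's layout, rows are true classes and columns are predicted classes.

```python
from sklearn.metrics import confusion_matrix

# Toy binary labels (made up for illustration).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# labels=[0, 1] fixes the row/column order: row i is true class i,
# column j is predicted class j.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

# For the binary case, ravel() yields the four counts in this order.
tn, fp, fn, tp = cm.ravel()

print(cm)
print("TN, FP, FN, TP =", tn, fp, fn, tp)
```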
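8. Generalization ability
- Loss function: a function that measures how wrong a prediction is
- Training error: the average loss on the training set; informative, but not what ultimately matters
- Test error: the average loss on the test set; it reflects the model's ability to predict unseen data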
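9. Overfitting & underfitting
- The model has memorized noise in the data. The model is disturbed by the noise, so the shape of the fitted function ends up far from the true distribution of the population. The noise can be mislabeled samples or a small number of samples that clearly deviate from the overall distribution (outliers). Cleaning the samples or handling outliers helps mitigate this problem.
- Too little training data. The training set cannot represent the overall data at all, so nothing done afterwards will help. Find ways to add data, including synthesizing artificial samples.
- Model complexity is too high. The model over-learns the training data (overfitting).
- The model is too simple. Its form is too simple to capture the characteristics of the data, so it cannot fit the data well (underfitting).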
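The effect of model complexity listed above can be sketched by fitting polynomials of increasing degree to a small noisy sample and comparing training and test error. The target function, noise level, sample sizes and degrees are all arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n):
    """Noisy samples of sin(3x) on [-1, 1] (arbitrary illustrative target)."""
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, n)
    return x, y


def mse(coeffs, x, y):
    """Mean squared error of the polynomial given by coeffs on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


x_train, y_train = make_data(15)   # deliberately small training set
x_test, y_test = make_data(200)    # larger sample standing in for unseen data

results = {}
for degree in (1, 3, 12):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares fit
    results[degree] = (mse(coeffs, x_train, y_train),
                       mse(coeffs, x_test, y_test))

for degree, (train_err, test_err) in results.items():
    print(f"degree {degree:2d}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```

Training error can only go down as the degree grows, because each higher-degree model contains the lower-degree ones. The test error is the quantity to watch: a too-simple model typically leaves both errors high, while a too-complex one typically drives the training error toward zero with a much larger test error.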
10. Bias and variance
**Bias:** the difference between your model’s expected predictions and the true values. It characterizes the fitting ability of the model itself.
**Variance:** refers to your algorithm’s sensitivity to specific sets of training data. High variance algorithms will produce drastically different models depending on the training set.
11. Regression metrics
.1. Mean absolute error (MAE)
.2. Root mean squared error (RMSE)
.3. Root mean squared logarithmic error (RMSLE)
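When the true values span a wide range (for example, annual income can run from 0 to a very large number), MAE or RMSE is dominated by the samples with large true values. RMSLE takes logarithms before comparing, which reduces that effect.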
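A minimal sketch comparing the three regression metrics on made-up values that span two orders of magnitude. It assumes the scikit-learn definition of the logarithmic error, which compares `log(1 + y)` values; the manual line checks that assumption.

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_squared_log_error)

# Toy targets spanning a wide range (made up for illustration).
y_true = np.array([10.0, 100.0, 1000.0])
y_pred = np.array([12.0, 110.0, 900.0])

mae = mean_absolute_error(y_true, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
rmsle = float(np.sqrt(mean_squared_log_error(y_true, y_pred)))

# The same RMSLE written out by hand: RMSE on log(1 + y).
rmsle_manual = float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))

print(f"MAE   = {mae:.3f}")
print(f"RMSE  = {rmse:.3f}")
print(f"RMSLE = {rmsle:.4f} (manual: {rmsle_manual:.4f})")
```

In this example the absolute error of 100 on the largest target dominates MAE and RMSE, whereas RMSLE weighs the three predictions by their relative error.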
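12. PSI (model stability)
The population stability index (PSI) measures the difference between the score distribution of the test sample and that of the model development sample, and it is the most common indicator for evaluating model stability. In effect, PSI compares, after binning by score, different samples or samples from different time periods.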
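The notes do not spell out the formula, so the sketch below assumes the commonly used definition: bin the scores, then sum `(actual% - expected%) * ln(actual% / expected%)` over the bins. The function name `psi`, the quantile binning on the development sample, and the `eps` floor for empty bins are illustrative assumptions.

```python
import numpy as np


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population stability index between two score samples.

    Hypothetical helper: bins are quantiles of `expected` (the development
    sample); `eps` keeps empty bins from producing log(0).
    """
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Interior bin edges from the development sample's quantiles.
    inner_edges = np.quantile(expected, np.linspace(0.0, 1.0, n_bins + 1))[1:-1]

    # Bin index in 0 .. n_bins - 1 for every score.
    e_idx = np.searchsorted(inner_edges, expected, side="right")
    a_idx = np.searchsorted(inner_edges, actual, side="right")

    # Share of each sample that falls into each bin.
    e_pct = np.bincount(e_idx, minlength=n_bins) / expected.size
    a_pct = np.bincount(a_idx, minlength=n_bins) / actual.size
    e_pct = np.clip(e_pct, eps, None)
    a_pct = np.clip(a_pct, eps, None)

    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))


rng = np.random.default_rng(0)
dev_scores = rng.normal(0.0, 1.0, 5000)      # model development sample
recent_same = rng.normal(0.0, 1.0, 5000)     # same distribution
recent_shifted = rng.normal(1.0, 1.0, 5000)  # distribution has drifted

psi_same = psi(dev_scores, recent_same)
psi_shifted = psi(dev_scores, recent_shifted)
print(psi_same, psi_shifted)
```

Every term of the sum is non-negative, so PSI is zero only when the binned distributions match and grows as the two samples drift apart.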
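13. Validation and test sets
If no validation set is used, split the data 70/30: 70% of the data for the training set and 30% for the test set.
If a validation set is used, split the data as follows: 60% for the training set, 20% for the validation set, and 20% for the test set.
Whichever split is used, the sets must all represent the data distribution of your real application scenario well.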
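The two splits described above can be sketched with `train_test_split` from scikit-learn. The data is a made-up placeholder of 100 samples; the 60/20/20 split is done in two steps because `train_test_split` produces only two parts per call.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data (made up): 100 samples, 2 features.
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 2

# Without a validation set: 70% train, 30% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# With a validation set: first hold out 20% as the test set ...
X_rest, X_test2, y_rest, y_test2 = train_test_split(
    X, y, test_size=0.2, random_state=0)
# ... then take 25% of the remaining 80% (20% of the total) for validation.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_test))          # 70 / 30 split
print(len(X_tr), len(X_val), len(X_test2))  # 60 / 20 / 20 split
```

`random_state` is fixed only to make the sketch reproducible. For imbalanced labels, the `stratify` argument of `train_test_split` keeps the class proportions similar across the parts.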
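- First reason: the model has only seen the training data and has never seen the validation data. This is a data mismatch problem.
- Second reason: the model inherently has high variance.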