Rand Index function (clustering performance evaluation)(随机指标函数(聚类性能评估))
本文介绍了随机指标函数(聚类性能评估)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
据我所知,Python中没有针对Rand Index的软件包,而对于调整后的Rand Index,您可以选择使用sklearn.metrics.adjusted_rand_score(labels_true, labels_pred)
。
我为Rand Score编写了代码,我将把它作为帖子的答案与其他人分享。
推荐答案
from scipy.misc import comb
from itertools import combinations
import numpy as np
def check_clusterings(labels_true, labels_pred):
"""Check that the two clusterings matching 1D integer arrays."""
labels_true = np.asarray(labels_true)
labels_pred = np.asarray(labels_pred)
# input checks
if labels_true.ndim != 1:
raise ValueError(
"labels_true must be 1D: shape is %r" % (labels_true.shape,))
if labels_pred.ndim != 1:
raise ValueError(
"labels_pred must be 1D: shape is %r" % (labels_pred.shape,))
if labels_true.shape != labels_pred.shape:
raise ValueError(
"labels_true and labels_pred must have same size, got %d and %d"
% (labels_true.shape[0], labels_pred.shape[0]))
return labels_true, labels_pred
def rand_score (labels_true, labels_pred):
"""given the true and predicted labels, it will return the Rand Index."""
check_clusterings(labels_true, labels_pred)
my_pair = list(combinations(range(len(labels_true)), 2)) #create list of all combinations with the length of labels.
def is_equal(x):
return (x[0]==x[1])
my_a = 0
my_b = 0
for i in range(len(my_pair)):
if(is_equal((labels_true[my_pair[i][0]],labels_true[my_pair[i][1]])) == is_equal((labels_pred[my_pair[i][0]],labels_pred[my_pair[i][1]]))
and is_equal((labels_pred[my_pair[i][0]],labels_pred[my_pair[i][1]])) == True):
my_a += 1
if(is_equal((labels_true[my_pair[i][0]],labels_true[my_pair[i][1]])) == is_equal((labels_pred[my_pair[i][0]],labels_pred[my_pair[i][1]]))
and is_equal((labels_pred[my_pair[i][0]],labels_pred[my_pair[i][1]])) == False):
my_b += 1
my_denom = comb(len(labels_true),2)
ri = (my_a + my_b) / my_denom
return ri
举个简单的例子:
labels_true = [1, 1, 0, 0, 0, 0]
labels_pred = [0, 0, 0, 1, 0, 1]
rand_score (labels_true, labels_pred)
#0.46666666666666667
可能有一些方法可以改进它,使它更像蟒蛇。如果你有什么建议,你可以改进一下。
我找到了似乎更快的this implementation。
import numpy as np
from scipy.misc import comb
def rand_index_score(clusters, classes):
tp_plus_fp = comb(np.bincount(clusters), 2).sum()
tp_plus_fn = comb(np.bincount(classes), 2).sum()
A = np.c_[(clusters, classes)]
tp = sum(comb(np.bincount(A[A[:, 0] == i, 1]), 2).sum()
for i in set(clusters))
fp = tp_plus_fp - tp
fn = tp_plus_fn - tp
tn = comb(len(A), 2) - tp - fp - fn
return (tp + tn) / (tp + fp + fn + tn)
举个简单的例子:
labels_true = [1, 1, 0, 0, 0, 0]
labels_pred = [0, 0, 0, 1, 0, 1]
rand_index_score (labels_true, labels_pred)
#0.46666666666666667
这篇关于随机指标函数(聚类性能评估)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
沃梦达教程
本文标题为:随机指标函数(聚类性能评估)


基础教程推荐
猜你喜欢
- 在Python中从Azure BLOB存储中读取文件 2022-01-01
- 在同一图形上绘制Bokeh的烛台和音量条 2022-01-01
- PermissionError: pip 从 8.1.1 升级到 8.1.2 2022-01-01
- 求两个直方图的卷积 2022-01-01
- 包装空间模型 2022-01-01
- 使用大型矩阵时禁止 Pycharm 输出中的自动换行符 2022-01-01
- PANDA VALUE_COUNTS包含GROUP BY之前的所有值 2022-01-01
- 无法导入 Pytorch [WinError 126] 找不到指定的模块 2022-01-01
- 修改列表中的数据帧不起作用 2022-01-01
- Plotly:如何设置绘图图形的样式,使其不显示缺失日期的间隙? 2022-01-01