Evaluating a trained model requires evaluation metrics, such as accuracy, precision, recall, and F1 score. Different task types call for different metrics, and HuggingFace provides a unified metrics tool.
1. Listing the available metrics
The list_metrics() function lists the available metrics:
def list_metric_test():
    # Chapter 4 / listing the available metrics
    from datasets import list_metrics
    metrics_list = list_metrics()
    print(len(metrics_list), metrics_list[:5])
The output is as follows:
157 ['accuracy', 'bertscore', 'bleu', 'bleurt', 'brier_score']
As shown, there are currently 157 metrics, and the first 5 of them are printed.
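Since list_metrics() returns a plain list of strings, standard list operations can be used to narrow it down. A minimal sketch, using a hard-coded sample of the names printed above in place of a live call:

```python
# A hard-coded sample of the metric names printed above, standing in
# for a live list_metrics() call.
metrics_list = ['accuracy', 'bertscore', 'bleu', 'bleurt', 'brier_score']

# An ordinary list comprehension keeps only names starting with "bleu".
bleu_like = [m for m in metrics_list if m.startswith('bleu')]
print(bleu_like)  # ['bleu', 'bleurt']
```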
2. Loading a metric
A metric is loaded with load_metric(). Note that some metrics are designed to be used with a specific dataset; here the mrpc subset of the glue dataset serves as the example:
def load_metric_test():
    # Chapter 4 / loading a metric
    from datasets import load_metric
    metric = load_metric(path="accuracy")  # load the accuracy metric
    print(metric)

    # Chapter 4 / loading a dataset-specific metric
    metric = load_metric(path='glue', config_name='mrpc')  # load the metric for the mrpc subset of glue
    print(metric)
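To make concrete what the accuracy metric computes, here is a plain-Python equivalent of its result, written for illustration only and not the library's actual implementation:

```python
def accuracy(predictions, references):
    # Fraction of positions where the prediction equals the reference label.
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

# 3 of 4 predictions match the references.
print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```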
3. Getting a metric's usage description
A metric's inputs_description attribute documents how the metric is used, as shown below:
def load_metric_description_test():
    # Chapter 4 / getting a metric's usage description
    from datasets import load_metric
    glue_metric = load_metric('glue', 'mrpc')  # load the metric for the mrpc subset of glue
    print(glue_metric.inputs_description)
    references = [0, 1]
    predictions = [0, 1]
    results = glue_metric.compute(predictions=predictions, references=references)
    print(results)  # {'accuracy': 1.0, 'f1': 1.0}
The output is as follows:
Compute GLUE evaluation metric associated to each GLUE dataset.
Args:
    predictions: list of predictions to score.
        Each translation should be tokenized into a list of tokens.
    references: list of lists of references for each translation.
        Each reference should be tokenized into a list of tokens.
Returns: depending on the GLUE subset, one or several of:
    "accuracy": Accuracy
    "f1": F1 score
    "pearson": Pearson Correlation
    "spearmanr": Spearman Correlation
    "matthews_correlation": Matthew Correlation
Examples:
    >>> glue_metric = datasets.load_metric('glue', 'sst2')  # 'sst2' or any of ["mnli", "mnli_mismatched", "mnli_matched", "qnli", "rte", "wnli", "hans"]
    >>> references = [0, 1]
    >>> predictions = [0, 1]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print(results)
    {'accuracy': 1.0}

    >>> glue_metric = datasets.load_metric('glue', 'mrpc')  # 'mrpc' or 'qqp'
    >>> references = [0, 1]
    >>> predictions = [0, 1]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print(results)
    {'accuracy': 1.0, 'f1': 1.0}

    >>> glue_metric = datasets.load_metric('glue', 'stsb')
    >>> references = [0., 1., 2., 3., 4., 5.]
    >>> predictions = [0., 1., 2., 3., 4., 5.]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print({"pearson": round(results["pearson"], 2), "spearmanr": round(results["spearmanr"], 2)})
    {'pearson': 1.0, 'spearmanr': 1.0}

    >>> glue_metric = datasets.load_metric('glue', 'cola')
    >>> references = [0, 1]
    >>> predictions = [0, 1]
    >>> results = glue_metric.compute(predictions=predictions, references=references)
    >>> print(results)
    {'matthews_correlation': 1.0}

{'accuracy': 1.0, 'f1': 1.0}
The usage description of the metric is printed first, and then the accuracy and f1 metrics are computed.
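For a binary task like mrpc, the f1 value returned alongside accuracy can also be reproduced by hand. Below is a sketch of the standard binary F1 computation, assuming label 1 is the positive class; it is written for illustration and is not the library's implementation:

```python
def binary_f1(predictions, references):
    # Count true positives, false positives, and false negatives,
    # treating label 1 as the positive class.
    tp = sum(1 for p, r in zip(predictions, references) if p == 1 and r == 1)
    fp = sum(1 for p, r in zip(predictions, references) if p == 1 and r == 0)
    fn = sum(1 for p, r in zip(predictions, references) if p == 0 and r == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# Matches the f1 value in the {'accuracy': 1.0, 'f1': 1.0} result above.
print(binary_f1([0, 1], [0, 1]))  # 1.0
```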