Metrics

This module contains functions for evaluating the performance of a RAG assistant or a retriever.

class flexrag.metrics.MetricsBase
abstract compute(questions=None, responses=None, golden_responses=None, retrieved_contexts=None, golden_contexts=None)

Compute the metric value.

Parameters:
  • questions (list[str], optional) -- A list of questions. Defaults to None.

  • responses (list[str], optional) -- A list of responses. Defaults to None.

  • golden_responses (list[list[str]], optional) -- A list of golden responses. Defaults to None.

  • retrieved_contexts (list[list[str | RetrievedContext]], optional) -- A list of retrieved contexts. Defaults to None.

  • golden_contexts (list[list[str]], optional) -- A list of golden contexts. Defaults to None.

Returns:

The metric scores and the metadata of the metric.

Return type:

tuple[dict[str, float], dict]
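To illustrate the contract described above, the sketch below implements a toy metric against a stand-in abstract base class that mirrors the documented `compute` signature. `MetricsBaseSketch` and `AverageLength` are hypothetical names used only for illustration; real code would presumably subclass `flexrag.metrics.MetricsBase` instead, and any registration steps the library may require are not shown.

```python
from abc import ABC, abstractmethod


class MetricsBaseSketch(ABC):
    """Stand-in for flexrag.metrics.MetricsBase (assumption: same signature)."""

    @abstractmethod
    def compute(
        self,
        questions=None,
        responses=None,
        golden_responses=None,
        retrieved_contexts=None,
        golden_contexts=None,
    ) -> tuple[dict[str, float], dict]:
        """Return (scores, metadata)."""


class AverageLength(MetricsBaseSketch):
    """Hypothetical metric: mean whitespace-token count of the responses."""

    def compute(
        self,
        questions=None,
        responses=None,
        golden_responses=None,
        retrieved_contexts=None,
        golden_contexts=None,
    ):
        if not responses:
            raise ValueError("responses must be a non-empty list")
        lengths = [len(r.split()) for r in responses]
        scores = {"average_length": sum(lengths) / len(lengths)}
        metadata = {"lengths": lengths}  # per-sample details
        return scores, metadata


scores, details = AverageLength().compute(responses=["Paris", "It is Paris"])
print(scores)  # {'average_length': 2.0}
```

The first element of the returned tuple holds the aggregate scores, and the second holds whatever per-sample details the metric chooses to expose.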

Helper Class

The Evaluator class takes a list of metrics and evaluates the performance of a RAG assistant or a retriever.

class flexrag.metrics.EvaluatorConfig(metrics_type=<factory>, generation_bleu_config=<factory>, generation_chrf_config=<factory>, generation_em_config=<factory>, generation_accuracy_config=<factory>, generation_f1_config=<factory>, generation_recall_config=<factory>, generation_precision_config=<factory>, retrieval_success_rate_config=<factory>, retrieval_recall_config=<factory>, retrieval_precision_config=<factory>, retrieval_map_config=<factory>, retrieval_ndcg_config=<factory>, round=2)
dump(path)

Dump the dataclass to a YAML file.

dumps()

Dump the dataclass to a YAML string.

classmethod load(path)

Load the dataclass from a YAML file.

classmethod loads(s)

Load the dataclass from a YAML string.

class flexrag.metrics.Evaluator(cfg)

Bases: object

evaluate(*, questions=None, responses=None, golden_responses=None, retrieved_contexts=None, golden_contexts=None, log=True)

Evaluate the generated responses against the ground truth responses.

Parameters:
  • questions (list[str], optional) -- A list of questions. Defaults to None.

  • responses (list[str], optional) -- A list of responses. Defaults to None.

  • golden_responses (list[list[str]], optional) -- A list of golden responses. Defaults to None.

  • retrieved_contexts (list[list[str | RetrievedContext]], optional) -- A list of retrieved contexts. Defaults to None.

  • golden_contexts (list[list[str]], optional) -- A list of golden contexts. Defaults to None.

  • log (bool, optional) -- Whether to log the evaluation results. Defaults to True.

Returns:

The evaluation results and the evaluation details.

Return type:

tuple[dict[str, float], dict[str, Any]]
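The sketch below shows one way an evaluator of this shape could combine several metrics: it passes the same keyword arguments to every metric, merges the score dictionaries, and rounds them. It is an illustration of the input and output shapes only, not FlexRAG's implementation; `ExactMatchSketch`, `evaluate_sketch`, and `round_digits` are hypothetical names.

```python
class ExactMatchSketch:
    """Hypothetical metric following the documented compute() contract."""

    def compute(self, responses=None, golden_responses=None, **kwargs):
        # One hit per sample when the response equals any golden response.
        hits = [float(r in g) for r, g in zip(responses, golden_responses)]
        return {"em": sum(hits) / len(hits)}, {"hits": hits}


def evaluate_sketch(metrics, *, round_digits=2, **inputs):
    """Run every metric on the same inputs and merge their outputs."""
    results, details = {}, {}
    for name, metric in metrics.items():
        scores, meta = metric.compute(**inputs)
        results.update({k: round(v, round_digits) for k, v in scores.items()})
        details[name] = meta
    return results, details


results, details = evaluate_sketch(
    {"em": ExactMatchSketch()},
    responses=["Paris", "Berlin", "Rome"],
    golden_responses=[["Paris"], ["Bonn"], ["Rome", "Roma"]],
)
print(results)  # {'em': 0.67}
```

Note that `golden_responses` holds one list of acceptable answers per question, matching the documented `list[list[str]]` type.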

RAG Generation Metrics

class flexrag.metrics.BLEUConfig(tokenizer='13a')

Configuration for the BLEU metric. The computation of the BLEU score is based on sacrebleu.

Parameters:

tokenizer (str) -- The tokenizer to use. Defaults to sacrebleu.BLEU.TOKENIZER_DEFAULT ('13a'). For the available choices, refer to sacrebleu.BLEU.TOKENIZERS.

dump(path)

Dump the dataclass to a YAML file.

dumps()

Dump the dataclass to a YAML string.

classmethod load(path)

Load the dataclass from a YAML file.

classmethod loads(s)

Load the dataclass from a YAML string.

class flexrag.metrics.BLEU(cfg)

Bases: MetricsBase

The BLEU metric.

class flexrag.metrics.Rouge

Bases: MetricsBase

The Rouge metric. The computation of the Rouge score is based on rouge. This metric returns the average of the Rouge-1, Rouge-2, and Rouge-L F1 scores.

compute(**kwargs)

Compute the metric value.

Parameters:
  • questions (list[str], optional) -- A list of questions. Defaults to None.

  • responses (list[str], optional) -- A list of responses. Defaults to None.

  • golden_responses (list[list[str]], optional) -- A list of golden responses. Defaults to None.

  • retrieved_contexts (list[list[str | RetrievedContext]], optional) -- A list of retrieved contexts. Defaults to None.

  • golden_contexts (list[list[str]], optional) -- A list of golden contexts. Defaults to None.

Returns:

The metric scores and the metadata of the metric.

Return type:

tuple[dict[str, float], dict]

class flexrag.metrics.chrFConfig(chrf_beta=1.0, chrf_char_order=6, chrf_word_order=0)

Configuration for the chrF metric. The computation of the chrF score is based on sacrebleu.

Parameters:
  • chrf_beta (float) -- The beta value for the F-score. Defaults to 1.0.

  • chrf_char_order (int) -- The order of characters. Defaults to sacrebleu.CHRF.CHAR_ORDER.

  • chrf_word_order (int) -- The order of words. Defaults to sacrebleu.CHRF.WORD_ORDER.
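For context on `chrf_beta`: an F-score with parameter beta is a weighted harmonic mean of precision and recall, where beta greater than 1 favors recall and beta equal to 1 weights both equally. The helper below is a generic sketch of that formula, not sacrebleu's code; in chrF the precision and recall would be n-gram statistics, whereas here they are plain numbers and `f_beta` is a hypothetical name.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall."""
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0.0:
        return 0.0  # nothing matched, so the score is defined as zero here
    return (1 + b2) * precision * recall / denominator


balanced = f_beta(0.5, 1.0, beta=1.0)       # precision and recall weighted equally
recall_heavy = f_beta(0.5, 1.0, beta=2.0)   # recall counts more than precision
print(balanced < recall_heavy)  # True
```

With precision 0.5 and recall 1.0, raising beta from 1 to 2 moves the score from about 0.67 toward the recall value, to about 0.83.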

dump(path)

Dump the dataclass to a YAML file.

dumps()

Dump the dataclass to a YAML string.

classmethod load(path)

Load the dataclass from a YAML file.

classmethod loads(s)

Load the dataclass from a YAML string.

class flexrag.metrics.chrF(cfg)

Bases: MetricsBase

The chrF metric.

class flexrag.metrics.F1(cfg)

Bases: MatchingMetrics

The F1 metric computes the F1 score of the predicted response against the golden responses.
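A common way to score a response against several golden answers is token-level F1, taking the best score over the golden responses. The sketch below follows that convention; it is an assumption about the general technique, not a description of FlexRAG's exact tokenization or normalization, and `token_f1` and `best_f1` are hypothetical names.

```python
from collections import Counter


def token_f1(prediction: str, golden: str) -> float:
    """Token-overlap F1 between one prediction and one golden answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = golden.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction: str, golden_responses: list[str]) -> float:
    """Score against every golden answer and keep the best match."""
    return max(token_f1(prediction, g) for g in golden_responses)


score = best_f1("the capital is Paris", ["Paris", "Paris France"])
print(round(score, 2))  # 0.4
```

Here the golden answer "Paris" gives precision 1/4 and recall 1, so its F1 of 0.4 beats the score for "Paris France".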

class flexrag.metrics.Accuracy(cfg)

Bases: MatchingMetrics

The Accuracy metric checks whether any of the golden responses appears in the predicted response.

class flexrag.metrics.ExactMatch(cfg)

Bases: MatchingMetrics

The ExactMatch metric checks whether any of the golden responses is exactly the same as the predicted response.
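The difference between Accuracy (containment) and ExactMatch (equality) can be sketched as follows. The `normalize` step, lowercasing and collapsing whitespace, is an assumption made for this example; the normalization FlexRAG applies may differ, and all function names here are hypothetical.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (assumed normalization)."""
    return " ".join(text.lower().split())


def accuracy_hit(prediction: str, golden_responses: list[str]) -> bool:
    """True if any golden response occurs inside the prediction."""
    pred = normalize(prediction)
    return any(normalize(g) in pred for g in golden_responses)


def exact_match_hit(prediction: str, golden_responses: list[str]) -> bool:
    """True if any golden response equals the whole prediction."""
    pred = normalize(prediction)
    return any(normalize(g) == pred for g in golden_responses)


answer = "The capital is Paris."
print(accuracy_hit(answer, ["Paris"]))     # True
print(exact_match_hit(answer, ["Paris"]))  # False
```

A verbose but correct answer passes the containment check and fails the equality check, which is why the two metrics are usually reported together.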

class flexrag.metrics.Precision(cfg)

Bases: MatchingMetrics

The Precision metric computes the precision of the predicted response against the golden responses.

class flexrag.metrics.Recall(cfg)

Bases: MatchingMetrics

The Recall metric computes the recall of the predicted response against the golden responses.
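Precision and recall for generated text are often defined over shared tokens: precision is the fraction of predicted tokens found in the golden answer, and recall is the fraction of golden tokens found in the prediction. The sketch below assumes that token-level definition with whitespace tokenization, which may not match FlexRAG's exact behavior; `token_precision_recall` is a hypothetical name.

```python
from collections import Counter


def token_precision_recall(prediction: str, golden: str) -> tuple[float, float]:
    """Token-level precision and recall of a prediction against one answer."""
    pred = prediction.lower().split()
    gold = golden.lower().split()
    overlap = sum((Counter(pred) & Counter(gold)).values())
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    return precision, recall


p, r = token_precision_recall("Paris is the capital of France", "Paris France")
print(round(p, 2), r)  # 0.33 1.0
```

A long answer that covers every golden token reaches full recall while its precision drops, so the two scores reward different behaviors.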

Information Retrieval Metrics

class flexrag.metrics.SuccessRateConfig(eval_field=None, simplify=True)

Configuration for the SuccessRate metric. This metric computes whether the retrieved contexts contain any of the golden responses.

Parameters:
  • eval_field (Optional[str]) -- The field to evaluate. Defaults to None. If None, only strings are supported as the retrieved_contexts.

  • simplify (bool) -- Whether to simplify the retrieved contexts. Defaults to True.

dump(path)

Dump the dataclass to a YAML file.

dumps()

Dump the dataclass to a YAML string.

classmethod load(path)

Load the dataclass from a YAML file.

classmethod loads(s)

Load the dataclass from a YAML string.

class flexrag.metrics.SuccessRate(cfg)

Bases: MetricsBase

The SuccessRate metric computes whether the retrieved contexts contain any of the golden responses.
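As a rough illustration of this check, the sketch below counts a query as a success when any golden response appears as a substring of any retrieved context, then averages over queries. Lowercasing stands in for whatever the `simplify` option does, which is an assumption, and the contexts are plain strings, as in the case where `eval_field` is None. The function names are hypothetical.

```python
def is_success(retrieved_contexts: list[str], golden_responses: list[str]) -> bool:
    """True if any golden response occurs in any retrieved context."""
    return any(
        golden.lower() in context.lower()
        for context in retrieved_contexts
        for golden in golden_responses
    )


def success_rate(all_contexts: list[list[str]], all_goldens: list[list[str]]) -> float:
    """Fraction of queries whose retrieved contexts contain an answer."""
    hits = [is_success(c, g) for c, g in zip(all_contexts, all_goldens)]
    return sum(hits) / len(hits)


rate = success_rate(
    [["Paris is the capital of France."], ["Berlin is in Germany."]],
    [["Paris"], ["Madrid"]],
)
print(rate)  # 0.5
```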

class flexrag.metrics.RetrievalRecallConfig(k_values=<factory>)

Configuration for the RetrievalRecall metric. This metric computes the recall of the retrieved contexts. The computation is based on pytrec_eval.

Parameters:

k_values (list[int]) -- The k values for evaluation. Defaults to [1, 5, 10].

dump(path)

Dump the dataclass to a YAML file.

dumps()

Dump the dataclass to a YAML string.

classmethod load(path)

Load the dataclass from a YAML file.

classmethod loads(s)

Load the dataclass from a YAML string.

class flexrag.metrics.RetrievalRecall(cfg)

Bases: MetricsBase

The RetrievalRecall metric computes the recall of the retrieved contexts.
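For intuition about what each entry in `k_values` measures, recall at cutoff k is the share of all relevant contexts that appear among the top k retrieved ones. The sketch below is a plain-Python rendering of that textbook definition rather than pytrec_eval's code; `recall_at_k` and the document identifiers are made up for the example.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Share of relevant items found in the top-k retrieved items."""
    if not relevant:
        return 0.0
    found = set(retrieved[:k]) & relevant
    return len(found) / len(relevant)


retrieved = ["d3", "d1", "d7", "d2"]  # ranked best-first
relevant = {"d1", "d2"}
for k in (1, 2, 4):
    print(k, recall_at_k(retrieved, relevant, k))
# 1 0.0
# 2 0.5
# 4 1.0
```

Recall can only stay flat or grow as k increases, because a larger cutoff never removes a relevant item from the top-k set.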

class flexrag.metrics.RetrievalPrecisionConfig(k_values=<factory>)

Configuration for the RetrievalPrecision metric. This metric computes the precision of the retrieved contexts. The computation is based on pytrec_eval.

Parameters:

k_values (list[int]) -- The k values for evaluation. Defaults to [1, 5, 10].

dump(path)

Dump the dataclass to a YAML file.

dumps()

Dump the dataclass to a YAML string.

classmethod load(path)

Load the dataclass from a YAML file.

classmethod loads(s)

Load the dataclass from a YAML string.

class flexrag.metrics.RetrievalPrecision(cfg)

Bases: MetricsBase

The RetrievalPrecision metric computes the precision of the retrieved contexts.
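Precision at cutoff k is the share of the top k retrieved contexts that are relevant. The sketch below uses the textbook definition, dividing by k even when fewer than k items were retrieved; it is not pytrec_eval's code, and `precision_at_k` and the document identifiers are hypothetical.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Share of the top-k retrieved items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k


retrieved = ["d3", "d1", "d7", "d2"]  # ranked best-first
relevant = {"d1", "d2"}
for k in (1, 2, 4):
    print(k, precision_at_k(retrieved, relevant, k))
# 1 0.0
# 2 0.5
# 4 0.5
```

Unlike recall, precision can fall as k grows, since every additional non-relevant item dilutes the top-k set.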

class flexrag.metrics.RetrievalMAPConfig(k_values=<factory>)

Configuration for the RetrievalMAP metric. This metric computes the MAP of the retrieved contexts. The computation is based on pytrec_eval.

Parameters:

k_values (list[int]) -- The k values for evaluation. Defaults to [1, 5, 10].

dump(path)

Dump the dataclass to a YAML file.

dumps()

Dump the dataclass to a YAML string.

classmethod load(path)

Load the dataclass from a YAML file.

classmethod loads(s)

Load the dataclass from a YAML string.

class flexrag.metrics.RetrievalMAP(cfg)

Bases: MetricsBase

The RetrievalMAP metric computes the Mean Average Precision (MAP) of the retrieved contexts.
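Average precision rewards rankings that place relevant contexts early: it averages the precision measured at each rank where a relevant item appears, divided by the total number of relevant items, and MAP is the mean of that value over queries. The sketch below follows this textbook definition with an optional cutoff k and assumes each ranking contains no duplicate items. It is not pytrec_eval's code, and the function names are hypothetical.

```python
def average_precision(retrieved, relevant, k=None):
    """Average of precision values at the ranks of relevant items."""
    if not relevant:
        return 0.0
    ranked = retrieved if k is None else retrieved[:k]
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


def mean_average_precision(runs, qrels, k=None):
    """Mean of average precision over all queries."""
    values = [average_precision(r, q, k) for r, q in zip(runs, qrels)]
    return sum(values) / len(values)


runs = [["d3", "d1", "d7", "d2"], ["a", "b"]]
qrels = [{"d1", "d2"}, {"a"}]
print(mean_average_precision(runs, qrels))  # 0.75
```

For the first query the relevant items sit at ranks 2 and 4, giving (1/2 + 2/4) / 2 = 0.5; the second query is ranked perfectly and scores 1.0.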

class flexrag.metrics.RetrievalNDCGConfig(k_values=<factory>)

Configuration for the RetrievalNDCG metric. This metric computes the nDCG of the retrieved contexts. The computation is based on pytrec_eval.

Parameters:

k_values (list[int]) -- The k values for evaluation. Defaults to [1, 5, 10].

dump(path)

Dump the dataclass to a YAML file.

dumps()

Dump the dataclass to a YAML string.

classmethod load(path)

Load the dataclass from a YAML file.

classmethod loads(s)

Load the dataclass from a YAML string.

class flexrag.metrics.RetrievalNDCG(cfg)

Bases: MetricsBase

The RetrievalNDCG metric computes the Normalized Discounted Cumulative Gain (nDCG) of the retrieved contexts.
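nDCG discounts each item's relevance by the logarithm of its rank and normalizes by the score of an ideal ordering, so a perfect ranking scores 1.0. The sketch below uses the linear-gain form, gain divided by log2(rank + 1); it is a textbook rendering rather than pytrec_eval's code, and `dcg`, `ndcg_at_k`, and the relevance judgments are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(retrieved, relevance, k):
    """DCG of the top-k ranking divided by the DCG of an ideal ranking."""
    gains = [relevance.get(doc, 0) for doc in retrieved[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


relevance = {"d1": 1, "d2": 1}  # graded judgments; unlisted docs count as 0
print(round(ndcg_at_k(["d3", "d1", "d7", "d2"], relevance, 4), 3))  # 0.651
print(ndcg_at_k(["d1", "d2", "d3", "d7"], relevance, 4))            # 1.0
```

The first ranking places its relevant items at ranks 2 and 4, so its discounted gain falls short of the ideal ordering that puts them at ranks 1 and 2.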