Context Refiner

The context refiner is responsible for refining the contexts retrieved by the retriever. It can be used to rearrange the contexts, summarize them, or extract the most relevant information from them.

The Context Refiner Interface

The RefinerBase is the base class for all refiners. It provides the basic interface for refining the contexts retrieved by the retriever.

class flexrag.context_refine.RefinerBase

The base class for context refiners. The subclasses should implement the refine method.

abstract refine(contexts)

Refine the contexts.

Parameters:

contexts (list[RetrievedContext]) -- The retrieved contexts to refine.

Returns:

The refined contexts.

Return type:

list[RetrievedContext]
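
The interface above can be illustrated with a minimal, self-contained sketch. It does not import FlexRAG: the RetrievedContext and RefinerBase classes below are simplified stand-ins written only for this example, and TopKRefiner is a hypothetical subclass, not a refiner shipped with the library.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class RetrievedContext:
    """Simplified stand-in for FlexRAG's RetrievedContext (assumption)."""
    data: dict = field(default_factory=dict)
    score: float = 0.0


class RefinerBase(ABC):
    """Stand-in mirroring the documented interface: subclasses implement refine."""

    @abstractmethod
    def refine(self, contexts: list[RetrievedContext]) -> list[RetrievedContext]:
        ...


class TopKRefiner(RefinerBase):
    """Hypothetical refiner that keeps the k highest-scoring contexts."""

    def __init__(self, k: int) -> None:
        self.k = k

    def refine(self, contexts: list[RetrievedContext]) -> list[RetrievedContext]:
        # Sort from highest to lowest score, then truncate to k items.
        ranked = sorted(contexts, key=lambda c: c.score, reverse=True)
        return ranked[: self.k]


contexts = [
    RetrievedContext({"text": "a"}, score=0.2),
    RetrievedContext({"text": "b"}, score=0.9),
    RetrievedContext({"text": "c"}, score=0.5),
]
refined = TopKRefiner(k=2).refine(contexts)
```

With the real library, the subclass would inherit from flexrag.context_refine.RefinerBase instead of the stand-in.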

Refiners

FlexRAG provides several refiners that can be used to refine the contexts retrieved by the retriever.

class flexrag.context_refine.ContextArrangerConfig(order='ascending')

The configuration for the ContextArranger.

Parameters:

order (str) -- The order in which to arrange the contexts. Defaults to "ascending". Available choices: "ascending", "descending", "side", "random".

dump(path)

Dump the dataclass to a YAML file.

dumps()

Dump the dataclass to a YAML string.

classmethod load(path)

Load the dataclass from a YAML file.

classmethod loads(s)

Load the dataclass from a YAML string.

class flexrag.context_refine.ContextArranger(config)

Bases: RefinerBase

The ContextArranger arranges the contexts based on the given order.

Because LLMs suffer from the lost-in-the-middle problem, the order of the contexts may affect generation performance. This refiner arranges the contexts in a specific order.

refine(**kwargs)

Refine the contexts.

Parameters:

contexts (list[RetrievedContext]) -- The retrieved contexts to refine.

Returns:

The refined contexts.

Return type:

list[RetrievedContext]
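
The four order choices can be sketched in plain Python. The semantics below are assumptions made for illustration, since this page does not define them: "ascending" and "descending" sort by relevance score, "side" places the most relevant items at both ends and the least relevant in the middle, and "random" shuffles. The arrange function and the (name, score) tuples are hypothetical and are not part of FlexRAG.

```python
import random
from typing import Optional


def arrange(
    scored: list[tuple[str, float]],
    order: str = "ascending",
    seed: Optional[int] = None,
) -> list[tuple[str, float]]:
    """Arrange (name, score) pairs; the semantics are assumed, not FlexRAG's."""
    by_score = sorted(scored, key=lambda pair: pair[1])  # lowest score first
    if order == "ascending":
        return by_score
    if order == "descending":
        return by_score[::-1]
    if order == "side":
        best_first = by_score[::-1]
        # 1st, 3rd, 5th best go to the front; 2nd, 4th best fill the back
        # in reverse, so the weakest items end up in the middle.
        return best_first[0::2] + best_first[1::2][::-1]
    if order == "random":
        shuffled = list(scored)
        random.Random(seed).shuffle(shuffled)
        return shuffled
    raise ValueError(f"unknown order: {order!r}")


docs = [("d1", 0.9), ("d2", 0.8), ("d3", 0.7), ("d4", 0.6), ("d5", 0.5)]
side = arrange(docs, "side")
# side is [d1, d3, d5, d4, d2]: the two best documents sit at the edges.
```

Under this reading, the "side" order targets the lost-in-the-middle effect directly by keeping the strongest evidence at the start and end of the prompt.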

class flexrag.context_refine.AbstractiveSummarizerConfig(generator_type=None, anthropic_config=<factory>, hf_config=<factory>, hf_vlm_config=<factory>, ollama_config=<factory>, openai_config=<factory>, vllm_config=<factory>, template=None, chat_prompt=None, substitute=True, concatenate_contexts=False, refined_field=None)

The configuration for the AbstractiveSummarizer.

Parameters:
  • template (Optional[str]) -- The template used to form the input text for the generator. Defaults to None. The template should be a string in Python string.Template syntax. The supported keys for the template are: [content, query].

  • chat_prompt (Optional[ChatPrompt]) -- The chat prompt for the generator. Defaults to None. Only used when the generator is a chat-based generator.

  • substitute (bool) -- Whether to substitute the original text with the summary. Defaults to True. If False, the summary is stored in a new field named refined_field + "_summary".

  • concatenate_contexts (bool) -- Whether to concatenate the contexts into one text. Defaults to False.

  • refined_field (str) -- The field to refine. Required.
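
As a small illustration of the template parameter, the sketch below fills the two supported keys with Python's standard string.Template. The query and content values are made up for the example, and how FlexRAG applies the template internally is not shown here.

```python
from string import Template

# A RECOMP-like template that uses both supported keys.
template = Template("Question: ${query}\nDocument: ${content}\nSummary: ")
prompt = template.substitute(
    query="Who wrote Hamlet?",
    content="Hamlet is a tragedy by William Shakespeare.",
)

# A T5-like template that only uses ${content}; substitute() ignores
# keyword arguments that the template does not reference.
t5_prompt = Template("summarize: ${content}").substitute(
    query="Who wrote Hamlet?",
    content="Hamlet is a tragedy by William Shakespeare.",
)
```

A placeholder missing from the keyword arguments makes substitute() raise KeyError, so a template should reference only content and query.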

The AbstractiveSummarizer supports multiple styles of summarizers, including T5, RECOMP, and LLM. For example, to summarize the contexts using a T5 style summarizer, you can run the following code:

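# The flexrag.models import path is assumed; adjust it to your FlexRAG version.
from flexrag.context_refine import AbstractiveSummarizer, AbstractiveSummarizerConfig
from flexrag.models import HFGeneratorConfig

cfg = AbstractiveSummarizerConfig(
    template="summarize: ${content}",
    generator_type="hf",
    refined_field="text",
    hf_config=HFGeneratorConfig(
        model_path="google-t5/t5-small",
        model_type="seq2seq",
    ),
)
summarizer = AbstractiveSummarizer(cfg)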

To summarize the contexts using a RECOMP style summarizer, you can run the following code:

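# The flexrag.models import path is assumed; adjust it to your FlexRAG version.
from flexrag.context_refine import AbstractiveSummarizer, AbstractiveSummarizerConfig
from flexrag.models import HFGeneratorConfig

cfg = AbstractiveSummarizerConfig(
    template="Question: ${query}\n Document: ${content}\n Summary: ",
    generator_type="hf",
    refined_field="text",
    hf_config=HFGeneratorConfig(
        model_path="fangyuan/hotpotqa_abstractive_compressor",
        model_type="seq2seq",
    ),
)
summarizer = AbstractiveSummarizer(cfg)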

To summarize the contexts using an LLM-style summarizer, you can run the following code:

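# The flexrag.models and flexrag.prompt import paths are assumed;
# adjust them to your FlexRAG version.
import os

from flexrag.context_refine import AbstractiveSummarizer, AbstractiveSummarizerConfig
from flexrag.models import OpenAIGeneratorConfig
from flexrag.prompt import ChatPrompt

api_key = os.environ["OPENAI_API_KEY"]  # assumes the key is set in the environment

cfg = AbstractiveSummarizerConfig(
    refined_field="text",
    template="Query: ${query}\nText: ${content}",
    chat_prompt=ChatPrompt(
        system="You are a skillful summarizer. Please summarize the following text based on the given query.",
    ),
    generator_type="openai",
    openai_config=OpenAIGeneratorConfig(api_key=api_key, model_name="gpt-3.5-turbo"),
)
summarizer = AbstractiveSummarizer(cfg)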

dump(path)

Dump the dataclass to a YAML file.

dumps()

Dump the dataclass to a YAML string.

classmethod load(path)

Load the dataclass from a YAML file.

classmethod loads(s)

Load the dataclass from a YAML string.

class flexrag.context_refine.AbstractiveSummarizer(cfg)

Bases: RefinerBase

The AbstractiveSummarizer summarizes the contexts using a generator.

refine(**kwargs)

Refine the contexts.

Parameters:

contexts (list[RetrievedContext]) -- The retrieved contexts to refine.

Returns:

The refined contexts.

Return type:

list[RetrievedContext]
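
The effect of the substitute and refined_field options of AbstractiveSummarizerConfig can be sketched without the library. In this sketch a context is modeled as a plain dict and the generator is replaced by a toy first_sentence function; both, along with the apply_summary helper, are hypothetical stand-ins used only to show where the summary ends up.

```python
from typing import Callable


def apply_summary(
    context: dict,
    refined_field: str,
    summarize: Callable[[str], str],
    substitute: bool = True,
) -> dict:
    """Return a copy of context with the summary written per the substitute flag."""
    refined = dict(context)  # shallow copy so the input is left untouched
    summary = summarize(context[refined_field])
    # Overwrite the original field, or store the summary alongside it.
    target = refined_field if substitute else refined_field + "_summary"
    refined[target] = summary
    return refined


def first_sentence(text: str) -> str:
    """Toy summarizer standing in for a real generator."""
    return text.split(". ")[0].rstrip(".") + "."


ctx = {"title": "Paris", "text": "Paris is the capital of France. It lies on the Seine."}
replaced = apply_summary(ctx, "text", first_sentence, substitute=True)
kept = apply_summary(ctx, "text", first_sentence, substitute=False)
```

With substitute=True the "text" field holds the summary; with substitute=False the original text is preserved and the summary appears under "text_summary".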

class flexrag.context_refine.RecompExtractiveSummarizerConfig(encoder_type=None, cohere_config=<factory>, hf_config=<factory>, hf_clip_config=<factory>, jina_config=<factory>, ollama_config=<factory>, openai_config=<factory>, sentence_transformer_config=<factory>, preserved_sents=5, concatenate_contexts=False, substitute=False, refined_field=None)

The configuration for the RecompExtractiveSummarizer.

Parameters:
  • preserved_sents (int) -- The number of sentences to preserve. Defaults to 5.

  • concatenate_contexts (bool) -- Whether to concatenate the contexts into one text. Defaults to False.

  • substitute (bool) -- Whether to substitute the original text with the summary. Defaults to False.

  • refined_field (str) -- The field to refine. Required.

The RecompExtractiveSummarizer is motivated by RECOMP (https://arxiv.org/abs/2310.04408). For example, to load a summarizer trained on the HotpotQA dataset, you can run the following code:

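# The flexrag.models import path is assumed; adjust it to your FlexRAG version.
from flexrag.context_refine import (
    RecompExtractiveSummarizer,
    RecompExtractiveSummarizerConfig,
)
from flexrag.models import HFEncoderConfig

cfg = RecompExtractiveSummarizerConfig(
    encoder_type="hf",
    hf_config=HFEncoderConfig(
        model_path="fangyuan/hotpotqa_extractive_compressor",
    ),
    preserved_sents=5,
    refined_field="text",
)
summarizer = RecompExtractiveSummarizer(cfg)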

dump(path)

Dump the dataclass to a YAML file.

dumps()

Dump the dataclass to a YAML string.

classmethod load(path)

Load the dataclass from a YAML file.

classmethod loads(s)

Load the dataclass from a YAML string.

class flexrag.context_refine.RecompExtractiveSummarizer(cfg)

Bases: RefinerBase

The RecompExtractiveSummarizer summarizes the contexts using an encoder.

refine(**kwargs)

Refine the contexts.

Parameters:

contexts (list[RetrievedContext]) -- The retrieved contexts to refine.

Returns:

The refined contexts.

Return type:

list[RetrievedContext]
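
The general idea behind encoder-based extractive summarization can be sketched as follows: split the text into sentences, score each sentence against the query, and keep the preserved_sents best ones. This is a toy illustration under stated assumptions, not the library's implementation. The bag-of-words embed function stands in for a trained dense encoder, the regex sentence splitter is simplistic, and returning the kept sentences in their original order is a choice made for this example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'encoder'; a real setup would use a dense encoder."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def extract(query: str, text: str, preserved_sents: int = 5) -> str:
    """Keep the sentences most similar to the query, in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    q = embed(query)
    scored = [(cosine(q, embed(s)), i) for i, s in enumerate(sentences)]
    # Highest score first; ties are broken by earlier position.
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:preserved_sents]
    keep = sorted(i for _, i in top)
    return " ".join(sentences[i] for i in keep)


text = (
    "Paris is the capital of France. Bananas are yellow. "
    "France is in Europe. The sky is blue."
)
summary = extract("What is the capital of France?", text, preserved_sents=2)
```

Here preserved_sents plays the same role as the configuration parameter of the same name: it caps how many sentences survive.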