Retrievers#

Retrievers are used to retrieve data from the local knowledge base or the web.

The Retriever Interface#

RetrieverBase is the base class for all retrievers, including its subclasses EditableRetriever and WebRetrieverBase.

RetrieverConfig is the general configuration for all registered retrievers. You can load any retriever by specifying its name in the retriever_type field of the configuration. For example, to load the pre-built FlexRetriever, you can use the following configuration:

from flexrag.retriever import RetrieverConfig, RETRIEVERS, FlexRetrieverConfig

config = RetrieverConfig(
    retriever_type='flex',
    flex_config=FlexRetrieverConfig(
        retriever_path='<path_to_retriever>',
    )
)
retriever = RETRIEVERS.load(config)
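The pattern above (a type name plus a matching per-type sub-configuration, resolved by a registry's load call) can be illustrated with a small self-contained sketch. This is not FlexRAG's implementation: Registry, DEMO_RETRIEVERS, DemoFlexRetriever, and the Demo* config classes are hypothetical names invented here, and the lookup rule (read `<name>_config` from the general config) is an assumption inferred from the example above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


class Registry:
    """Hypothetical registry mapping a type name to a class (illustration only)."""

    def __init__(self, type_field: str) -> None:
        self._type_field = type_field  # e.g. "retriever_type"
        self._factories: Dict[str, Callable] = {}

    def register(self, name: str) -> Callable:
        # Decorator that stores the class under the given short name.
        def decorator(cls):
            self._factories[name] = cls
            return cls

        return decorator

    def load(self, config):
        # Read the selected type name, then hand the matching
        # "<name>_config" sub-configuration to the registered class.
        name = getattr(config, self._type_field)
        if name not in self._factories:
            raise KeyError(f"unknown type {name!r}; registered: {sorted(self._factories)}")
        return self._factories[name](getattr(config, f"{name}_config"))


DEMO_RETRIEVERS = Registry(type_field="retriever_type")


@dataclass
class DemoFlexConfig:
    retriever_path: str = ""


@DEMO_RETRIEVERS.register("flex")
class DemoFlexRetriever:
    def __init__(self, cfg: DemoFlexConfig) -> None:
        self.path = cfg.retriever_path


@dataclass
class DemoRetrieverConfig:
    retriever_type: str = "flex"
    flex_config: DemoFlexConfig = field(default_factory=DemoFlexConfig)


demo = DEMO_RETRIEVERS.load(
    DemoRetrieverConfig(retriever_type="flex", flex_config=DemoFlexConfig("/tmp/demo"))
)
```

One consequence of this design is that switching implementations only requires changing the type name and filling in the corresponding sub-configuration; the calling code stays the same.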

Editable Retriever#

Retriever Index#

RetrieverIndex is used in FlexRetriever to store and retrieve dense embeddings.
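To make "store and retrieve dense embeddings" concrete, here is a minimal brute-force sketch of what such an index does conceptually. TinyDenseIndex and its add/search methods are hypothetical names for illustration; they are not part of FlexRAG, and a real index would typically use an optimized backend rather than a linear scan.

```python
from typing import List, Tuple


class TinyDenseIndex:
    """Hypothetical brute-force index; a stand-in for a real RetrieverIndex."""

    def __init__(self) -> None:
        self._vectors: List[List[float]] = []

    def add(self, vectors: List[List[float]]) -> None:
        # Store the embeddings; a vector's position is its document id.
        self._vectors.extend(vectors)

    def search(self, query: List[float], top_k: int = 1) -> List[Tuple[int, float]]:
        # Score every stored vector by inner product with the query,
        # then return the top_k (id, score) pairs, best first.
        scores = [
            (idx, sum(q * v for q, v in zip(query, vec)))
            for idx, vec in enumerate(self._vectors)
        ]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:top_k]


index = TinyDenseIndex()
index.add([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
hits = index.search([1.0, 0.2], top_k=2)  # ids of the two closest vectors
```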

RetrieverIndexConfig is the general configuration for all registered RetrieverIndexes. You can load any RetrieverIndex by specifying the index_type in the configuration. For example, to load the BM25Index, you can use the following configuration:

from flexrag.retriever.index import RetrieverIndexConfig, RETRIEVER_INDEX, BM25IndexConfig

config = RetrieverIndexConfig(
    index_type='bm25',
    bm25_config=BM25IndexConfig(
        index_path='<path_to_index>',
    )
)
index = RETRIEVER_INDEX.load(config)

Web Retriever#

WebRetriever is used to retrieve data from the web. Unlike EditableRetriever, web retrievers can be used without building a knowledge base, because they retrieve data through web search engines.

FlexRAG provides two simple web retrievers, SimpleWebRetriever and WikipediaRetriever.

Web Seeker#

WebSeeker is used to search the web for resources relevant to a given query. These resources can be found by walking through a given set of web pages, by using a search engine, and so on. FlexRAG provides several web seekers built on existing search engines.

WebSeekerConfig is the general configuration for all registered WebSeekers. You can load any WebSeeker by specifying the web_seeker_type in the configuration. For example, to load the DuckDuckGoEngine, you can use the following configuration:

from flexrag.retriever.web_retrievers import WebSeekerConfig, WEB_SEEKERS

config = WebSeekerConfig(
    web_seeker_type='ddg',
)
seeker = WEB_SEEKERS.load(config)

SearchEngine is a type of WebSeeker that finds web resources by leveraging existing search engines. SearchEngineConfig is the general configuration for all registered SearchEngines. You can load any SearchEngine by specifying the search_engine_type in the configuration. For example, to load the DuckDuckGoEngine, you can use the following configuration:

from flexrag.retriever.web_retrievers import SearchEngineConfig, SEARCH_ENGINES

config = SearchEngineConfig(
    search_engine_type='ddg',
)
seeker = SEARCH_ENGINES.load(config)

Web Downloader#

The web downloader is used to download data from the web.

Web Reader#

The web reader is used to convert web data into an LLM-friendly format.

WebReaderConfig is the general configuration for all registered WebReaders. You can load any WebReader by specifying the web_reader_type in the configuration. For example, to load the JinaReader, you can use the following configuration:

from flexrag.retriever.web_retrievers import WebReaderConfig, WEB_READERS

config = WebReaderConfig(
    web_reader_type='jina_reader',
)
reader = WEB_READERS.load(config)
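The three web components described in this section can be pictured as a pipeline: a seeker finds candidate URLs, a downloader fetches them, and a reader turns the raw pages into plain text. The sketch below is a rough illustration of that flow using only the Python standard library. The functions seek, download, and read, and the in-memory FAKE_WEB dictionary, are hypothetical stand-ins; they do not reflect FlexRAG's actual classes or signatures, and no network access is performed.

```python
from html.parser import HTMLParser
from typing import Dict, List


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def seek(query: str) -> List[str]:
    """Hypothetical seeker: returns candidate URLs (hard-coded for this demo)."""
    return ["https://example.com/a"]


def download(urls: List[str], pages: Dict[str, str]) -> List[str]:
    """Hypothetical downloader: reads from an in-memory dict, not the network."""
    return [pages[url] for url in urls if url in pages]


def read(html: str) -> str:
    """Hypothetical reader: strips markup and keeps the visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


FAKE_WEB = {
    "https://example.com/a": "<html><head><style>p{}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b>.</p><script>x()</script></body></html>"
}
docs = [read(page) for page in download(seek("demo query"), FAKE_WEB)]
```

Keeping the three stages separate means each one can be swapped independently, for example replacing the search backend without touching how pages are converted to text.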