Generators#
- class flexrag.models.GeneratorBase[source]#
- async async_chat(prompts, generation_config=GenerationConfig(do_sample=True, sample_num=1, temperature=1.0, max_new_tokens=512, top_p=0.9, top_k=50, eos_token_id=None, stop_str=[]))[source]#
The async version of chat.
- async async_generate(prefixes, generation_config=GenerationConfig(do_sample=True, sample_num=1, temperature=1.0, max_new_tokens=512, top_p=0.9, top_k=50, eos_token_id=None, stop_str=[]))[source]#
The async version of generate.
- chat(prompts, generation_config=GenerationConfig(do_sample=True, sample_num=1, temperature=1.0, max_new_tokens=512, top_p=0.9, top_k=50, eos_token_id=None, stop_str=[]))[source]#
Chat with the model using model templates.
- Parameters:
prompts (list[ChatPrompt] | list[list[dict]] | ChatPrompt | list[dict]) – A batch of ChatPrompts, or a single ChatPrompt.
generation_config (GenerationConfig) – The generation configuration. Defaults to GenerationConfig().
- Returns:
A batch of chat responses.
- Return type:
list[list[str]]
- generate(prefixes, generation_config=GenerationConfig(do_sample=True, sample_num=1, temperature=1.0, max_new_tokens=512, top_p=0.9, top_k=50, eos_token_id=None, stop_str=[]))[source]#
Generate text with the model using the given prefixes.
- Parameters:
prefixes (list[str] | str) – A batch of prefixes, or a single prefix.
generation_config (GenerationConfig) – The generation configuration. Defaults to GenerationConfig().
- Returns:
A batch of generated text.
- Return type:
list[list[str]]
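The nested return type can be read as one inner list per input prefix. The sketch below uses a hypothetical stand-in, EchoGenerator, which is not part of flexrag and calls no model. It assumes that each inner list holds sample_num entries and that a bare string is treated as a batch of one; neither is confirmed by this page.

```python
from dataclasses import dataclass


@dataclass
class SketchGenerationConfig:
    """Hypothetical stand-in carrying two of the documented fields."""
    sample_num: int = 1
    max_new_tokens: int = 512


class EchoGenerator:
    """Hypothetical stand-in for a generator; it only echoes its input."""

    def generate(self, prefixes, generation_config=None):
        cfg = generation_config or SketchGenerationConfig()
        # Assumption: a bare string is treated as a batch of one prefix.
        if isinstance(prefixes, str):
            prefixes = [prefixes]
        # One inner list per prefix, one entry per requested sample.
        return [
            [f"{prefix} <sample {i}>" for i in range(cfg.sample_num)]
            for prefix in prefixes
        ]


outputs = EchoGenerator().generate(
    ["Hello", "Bonjour"], SketchGenerationConfig(sample_num=2)
)
# Take the first sample for every prefix in the batch.
first_sample_per_prefix = [samples[0] for samples in outputs]
```

Indexing as outputs[i][j] then means "sample j for input i" under these assumptions.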
- class flexrag.models.GenerationConfig(do_sample=True, sample_num=1, temperature=1.0, max_new_tokens=512, top_p=0.9, top_k=50, eos_token_id=None, stop_str=<factory>)[source]#
Configuration for text generation.
- Parameters:
do_sample (bool) – Whether to use sampling for generation. Defaults to True.
sample_num (int) – The number of samples to generate. Defaults to 1.
temperature (float) – The temperature of the sampling distribution. Defaults to 1.0.
max_new_tokens (int) – The maximum number of tokens to generate. Defaults to 512.
top_p (float) – The cumulative probability for nucleus sampling. Defaults to 0.9.
top_k (int) – The number of tokens to consider for top-k sampling. Defaults to 50.
eos_token_id (Optional[int]) – The token id of the end-of-sequence token. Defaults to None.
stop_str (list[str]) – A list of strings to stop generation. Defaults to [].
- dump(path)#
Dump the dataclass to a YAML file.
- dumps()#
Dump the dataclass to a YAML string.
- classmethod load(path)#
Load the dataclass from a YAML file.
- classmethod loads(s)#
Load the dataclass from a YAML string.
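The stop_str=<factory> in the signature is how a dataclass displays a default produced by a default_factory, so every instance gets its own empty list. The sketch below mirrors the documented fields and defaults in a hypothetical stand-in class (SketchGenerationConfig is not the flexrag class) and shows overriding only the fields that differ.

```python
from dataclasses import asdict, dataclass, field, replace
from typing import Optional


@dataclass
class SketchGenerationConfig:
    """Hypothetical mirror of the documented fields and defaults."""
    do_sample: bool = True
    sample_num: int = 1
    temperature: float = 1.0
    max_new_tokens: int = 512
    top_p: float = 0.9
    top_k: int = 50
    eos_token_id: Optional[int] = None
    stop_str: list[str] = field(default_factory=list)


default_cfg = SketchGenerationConfig()
# Override only what differs from the defaults: here, sampling is
# disabled and a stop string is added.
no_sampling_cfg = replace(default_cfg, do_sample=False, stop_str=["\n\n"])

# The factory gives every instance its own list, so mutating one config
# does not leak into another.
other_cfg = SketchGenerationConfig()
other_cfg.stop_str.append("###")

# A plain dict view of the config, convenient for logging or inspection.
as_dict = asdict(no_sampling_cfg)
```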
- class flexrag.models.GeneratorConfig(generator_type=None, anthropic_config=<factory>, hf_config=<factory>, hf_vlm_config=<factory>, ollama_config=<factory>, openai_config=<factory>, vllm_config=<factory>)#
Configuration class for generator (name: GeneratorConfig, default: None).
- Parameters:
generator_type (str) – The generator type to use.
anthropic_config (AnthropicGeneratorConfig) – The config for AnthropicGenerator.
hf_config (HFGeneratorConfig) – The config for HFGenerator.
hf_vlm_config (HFVLMGeneratorConfig) – The config for HFVLMGenerator.
ollama_config (OllamaGeneratorConfig) – The config for OllamaGenerator.
openai_config (OpenAIGeneratorConfig) – The config for OpenAIGenerator.
vllm_config (VLLMGeneratorConfig) – The config for VLLMGenerator.
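The configuration carries one sub-config per backend plus a generator_type selector. This page does not say how the selector maps onto a sub-config. The sketch below assumes the type name equals the field name without its "_config" suffix (for example "ollama" for ollama_config), which is a guess for illustration only. It uses plain dicts in a hypothetical stand-in class.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchGeneratorConfig:
    """Hypothetical stand-in; sub-configs are plain dicts for brevity."""
    generator_type: Optional[str] = None
    hf_config: dict = field(default_factory=dict)
    ollama_config: dict = field(default_factory=dict)
    openai_config: dict = field(default_factory=dict)


def select_sub_config(cfg: SketchGeneratorConfig) -> dict:
    """Return the sub-config named by generator_type.

    Assumption: the type name is the field name without "_config".
    """
    if cfg.generator_type is None:
        raise ValueError("generator_type is not set")
    field_name = f"{cfg.generator_type}_config"
    if not hasattr(cfg, field_name):
        raise ValueError(f"unknown generator_type: {cfg.generator_type!r}")
    return getattr(cfg, field_name)


cfg = SketchGeneratorConfig(
    generator_type="ollama",
    ollama_config={"model_name": "example-model", "num_ctx": 4096},
)
selected = select_sub_config(cfg)
```

Only the selected sub-config needs to be filled in under this reading; the others keep their factory defaults.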
Local Generators#
- class flexrag.models.HFModelConfig(model_path=None, tokenizer_path=None, trust_remote_code=False, device_id=<factory>, load_dtype='auto')[source]#
The base configuration for Hugging Face models, including HFGenerator, HFVLMGenerator, HFEncoder, and HFClipEncoder.
- Parameters:
model_path (str) – The path to the model. Required.
tokenizer_path (Optional[str]) – The path to the tokenizer. If None, model_path is used. Default is None.
trust_remote_code (bool) – Whether to trust remote code. Default is False.
device_id (list[int]) – The device ids to use. An empty list ([]) selects the CPU. Default is [].
load_dtype (str) – The dtype to load the model. Default is “auto”. Available choices are “bfloat16”, “bf16”, “float32”, “fp32”, “float16”, “fp16”, “half”, “8bit”, “4bit”, and “auto”.
- dump(path)#
Dump the dataclass to a YAML file.
- dumps()#
Dump the dataclass to a YAML string.
- classmethod load(path)#
Load the dataclass from a YAML file.
- classmethod loads(s)#
Load the dataclass from a YAML string.
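Several load_dtype choices look like short and long spellings of the same dtype, and device_id uses an empty list to mean CPU. The sketch below validates a choice and folds the short spellings into the long ones. The alias table is an assumption based on common naming, and both helper names and the "gpu:N" labels are hypothetical.

```python
# Assumption: the short spellings are aliases of the long ones.
DTYPE_ALIASES = {
    "bf16": "bfloat16",
    "fp32": "float32",
    "fp16": "float16",
    "half": "float16",
}

# The choices listed in the load_dtype description.
DOCUMENTED_CHOICES = {
    "bfloat16", "bf16", "float32", "fp32", "float16", "fp16",
    "half", "8bit", "4bit", "auto",
}


def normalize_load_dtype(name: str) -> str:
    """Reject unknown choices and map short spellings to long ones."""
    if name not in DOCUMENTED_CHOICES:
        raise ValueError(f"unsupported load_dtype: {name!r}")
    return DTYPE_ALIASES.get(name, name)


def describe_devices(device_id: list[int]) -> str:
    """An empty list means CPU, per the device_id description."""
    if not device_id:
        return "cpu"
    return ", ".join(f"gpu:{i}" for i in device_id)
```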
- class flexrag.models.HFGeneratorConfig(model_path=None, tokenizer_path=None, trust_remote_code=False, device_id=<factory>, load_dtype='auto', pipeline_parallel=False, use_minference=False, model_type='causal_lm')[source]#
Bases:
HFModelConfig
Configuration for HFGenerator.
- Parameters:
pipeline_parallel (bool) – Whether to use pipeline parallel. Default is False.
use_minference (bool) – Whether to use minference for long sequence inference. Default is False.
model_type (str) – The type of the model. Default is “causal_lm”. Available choices are “causal_lm”, “seq2seq”.
- dump(path)#
Dump the dataclass to a YAML file.
- dumps()#
Dump the dataclass to a YAML string.
- classmethod load(path)#
Load the dataclass from a YAML file.
- classmethod loads(s)#
Load the dataclass from a YAML string.
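The constructor signature lists the HFModelConfig fields before the three fields documented here because a dataclass subclass inherits its parent's fields. The sketch below reproduces that layout with hypothetical stand-in classes (the Sketch* names are not flexrag classes); "path/to/model" is a placeholder.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class SketchHFModelConfig:
    """Hypothetical mirror of the base configuration fields."""
    model_path: Optional[str] = None
    tokenizer_path: Optional[str] = None
    trust_remote_code: bool = False
    device_id: list[int] = field(default_factory=list)
    load_dtype: str = "auto"


@dataclass
class SketchHFGeneratorConfig(SketchHFModelConfig):
    """Adds the generator-specific fields on top of the base ones."""
    pipeline_parallel: bool = False
    use_minference: bool = False
    model_type: str = "causal_lm"


cfg = SketchHFGeneratorConfig(
    model_path="path/to/model",  # placeholder path
    device_id=[0],
    model_type="seq2seq",
)
# Parent fields come first, followed by the subclass fields.
field_names = [f.name for f in fields(cfg)]
```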
- class flexrag.models.HFGenerator(cfg)[source]#
Bases:
GeneratorBase
- class flexrag.models.OllamaGeneratorConfig(model_name=None, base_url='http://localhost:11434/', verbose=False, num_ctx=4096, allow_parallel=True)[source]#
Configuration for the OllamaGenerator.
- Parameters:
model_name (str) – The name of the model to use. Required.
base_url (str) – The base URL of the Ollama server. Default is ‘http://localhost:11434/’.
verbose (bool) – Whether to show verbose logs. Default is False.
num_ctx (int) – The number of context tokens to use. Default is 4096.
allow_parallel (bool) – Whether to allow parallel generation. Default is True.
- dump(path)#
Dump the dataclass to a YAML file.
- dumps()#
Dump the dataclass to a YAML string.
- classmethod load(path)#
Load the dataclass from a YAML file.
- classmethod loads(s)#
Load the dataclass from a YAML string.
- class flexrag.models.OllamaGenerator(cfg)[source]#
Bases:
GeneratorBase
- class flexrag.models.VLLMGeneratorConfig(model_path=None, gpu_memory_utilization=0.85, max_model_len=16384, tensor_parallel=1, load_dtype='auto', use_minference=False, trust_remote_code=False)[source]#
Configuration for VLLMGenerator.
- Parameters:
model_path (str) – Path to the model. Required.
gpu_memory_utilization (float) – Fraction of GPU memory to use. Defaults to 0.85.
max_model_len (int) – Maximum context length of the model, in tokens. Defaults to 16384.
tensor_parallel (int) – The tensor parallel size. Defaults to 1.
load_dtype (str) – The dtype to load the model. Defaults to “auto”. Available options are “auto”, “float32”, “float16”, “bfloat16”.
use_minference (bool) – Whether to use minference for Long Sequence Inference. Defaults to False.
trust_remote_code (bool) – Whether to trust remote code. Defaults to False.
- dump(path)#
Dump the dataclass to a YAML file.
- dumps()#
Dump the dataclass to a YAML string.
- classmethod load(path)#
Load the dataclass from a YAML file.
- classmethod loads(s)#
Load the dataclass from a YAML string.
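As a worked example of the gpu_memory_utilization fraction, the sketch below computes a memory budget for hypothetical hardware. It assumes the fraction applies to each GPU separately and that tensor_parallel equals the number of GPUs used. The helper name and the 24 GiB figure are made up for illustration.

```python
def per_gpu_budget_gib(
    total_gib: float, gpu_memory_utilization: float = 0.85
) -> float:
    """Memory budget on one GPU, given its total memory in GiB."""
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    return total_gib * gpu_memory_utilization


# Hypothetical hardware: two 24 GiB GPUs with tensor_parallel=2.
tensor_parallel = 2
budget_each = per_gpu_budget_gib(24.0)        # about 20.4 GiB per GPU
budget_total = budget_each * tensor_parallel  # about 40.8 GiB overall
```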
- class flexrag.models.VLLMGenerator(cfg)[source]#
Bases:
GeneratorBase
Online Generators#
- class flexrag.models.AnthropicGeneratorConfig(model_name=None, base_url=None, api_key='EMPTY', verbose=False, proxy=None, allow_parallel=True)[source]#
Configuration for AnthropicGenerator.
- Parameters:
model_name (str) – The name of the model. Required.
base_url (Optional[str]) – The base url of the API. Defaults to None.
api_key (str) – The API key. Defaults to os.environ.get(“ANTHROPIC_API_KEY”, “EMPTY”).
verbose (bool) – Whether to output verbose logs. Defaults to False.
proxy (Optional[str]) – The proxy to use. Defaults to None.
allow_parallel (bool) – Whether to allow parallel generation. Defaults to True.
- dump(path)#
Dump the dataclass to a YAML file.
- dumps()#
Dump the dataclass to a YAML string.
- classmethod load(path)#
Load the dataclass from a YAML file.
- classmethod loads(s)#
Load the dataclass from a YAML string.
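The documented api_key default reads an environment variable and falls back to the placeholder "EMPTY". The sketch below reproduces that lookup with the standard library only. The helper names are hypothetical, and treating "EMPTY" as "no key configured" is an assumption.

```python
import os


def resolve_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Return the key from the environment, or the "EMPTY" placeholder."""
    return os.environ.get(env_var, "EMPTY")


def has_real_key(api_key: str) -> bool:
    """Assumption: the placeholder means no key was configured."""
    return api_key != "EMPTY"


api_key = resolve_api_key()
if not has_real_key(api_key):
    # Report early instead of letting a request fail later.
    print("ANTHROPIC_API_KEY is not set; using the placeholder key")
```

Exporting the variable before starting the process lets the key stay out of YAML files written with dump.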
- class flexrag.models.AnthropicGenerator(cfg)[source]#
Bases:
GeneratorBase
- class flexrag.models.OpenAIConfig(is_azure=False, model_name=None, base_url=None, api_key='EMPTY', api_version='2024-07-01-preview', verbose=False, proxy=None)[source]#
Bases:
object
The base configuration for the OpenAI client.
- Parameters:
is_azure (bool) – Whether the model is hosted on Azure. Default is False.
model_name (str) – The name of the model to use.
base_url (Optional[str]) – The base URL of the OpenAI API. Default is None.
api_key (str) – The API key for OpenAI. Default is os.environ.get(“OPENAI_API_KEY”, “EMPTY”).
api_version (str) – The API version to use. Default is “2024-07-01-preview”.
verbose (bool) – Whether to show verbose logs. Default is False.
proxy (Optional[str]) – The proxy to use for the HTTP client. Default is None.
- dump(path)#
Dump the dataclass to a YAML file.
- dumps()#
Dump the dataclass to a YAML string.
- classmethod load(path)#
Load the dataclass from a YAML file.
- classmethod loads(s)#
Load the dataclass from a YAML string.
- class flexrag.models.OpenAIGeneratorConfig(is_azure=False, model_name=None, base_url=None, api_key='EMPTY', api_version='2024-07-01-preview', verbose=False, proxy=None, allow_parallel=True)[source]#
Bases:
OpenAIConfig
Configuration for OpenAIGenerator.
- Parameters:
allow_parallel (bool) – Whether to allow parallel generation. Default is True.
- dump(path)#
Dump the dataclass to a YAML file.
- dumps()#
Dump the dataclass to a YAML string.
- classmethod load(path)#
Load the dataclass from a YAML file.
- classmethod loads(s)#
Load the dataclass from a YAML string.
- class flexrag.models.OpenAIGenerator(cfg)[source]#
Bases:
GeneratorBase
Visual Language Model Generators#
- class flexrag.models.VLMGeneratorBase[source]#
Bases:
GeneratorBase
- async async_chat(prompts, generation_config=GenerationConfig(do_sample=True, sample_num=1, temperature=1.0, max_new_tokens=512, top_p=0.9, top_k=50, eos_token_id=None, stop_str=[]))[source]#
The async version of chat.
- async async_generate(prefixes, images, generation_config=GenerationConfig(do_sample=True, sample_num=1, temperature=1.0, max_new_tokens=512, top_p=0.9, top_k=50, eos_token_id=None, stop_str=[]))[source]#
The async version of generate.
- chat(prompts, generation_config=GenerationConfig(do_sample=True, sample_num=1, temperature=1.0, max_new_tokens=512, top_p=0.9, top_k=50, eos_token_id=None, stop_str=[]))[source]#
Chat with the model using model templates.
- Parameters:
prompts (list[MultiModelChatPrompt] | list[list[dict]] | MultiModelChatPrompt | list[dict]) – A batch of MultiModelChatPrompts, or a single MultiModelChatPrompt.
generation_config (GenerationConfig) – The generation configuration. Defaults to GenerationConfig().
- Returns:
A batch of chat responses.
- Return type:
list[list[str]]
- generate(prefixes, images, generation_config=GenerationConfig(do_sample=True, sample_num=1, temperature=1.0, max_new_tokens=512, top_p=0.9, top_k=50, eos_token_id=None, stop_str=[]))[source]#
Generate text with the model using the given prefixes and images.
- Parameters:
prefixes (list[str]) – A batch of prefixes.
images (list[Image]) – A batch of images.
generation_config (GenerationConfig) – The generation configuration. Defaults to GenerationConfig().
- Returns:
A batch of generated text.
- Return type:
list[list[str]]
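Because generate takes a batch of prefixes and a batch of images, the two lists presumably line up by position. The sketch below checks that pairing before a call. The helper is hypothetical, the one-image-per-prefix rule is an assumption not stated on this page, and the images are placeholder strings rather than real Image objects.

```python
def pair_prefixes_with_images(prefixes: list, images: list) -> list:
    """Pair each prefix with the image at the same index.

    Assumption: prefixes[i] is generated against images[i].
    """
    if len(prefixes) != len(images):
        raise ValueError(
            f"got {len(prefixes)} prefixes but {len(images)} images"
        )
    return list(zip(prefixes, images))


# Placeholder strings stand in for loaded images.
pairs = pair_prefixes_with_images(
    ["Describe the picture:", "What color is the car?"],
    ["<image 0>", "<image 1>"],
)
```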
- class flexrag.models.HFVLMGeneratorConfig(model_path=None, tokenizer_path=None, trust_remote_code=False, device_id=<factory>, load_dtype='auto', pipeline_parallel=False)[source]#
Bases:
HFModelConfig
Configuration for HFVLMGenerator.
- Parameters:
pipeline_parallel (bool) – Whether to use pipeline parallel. Default is False.
- dump(path)#
Dump the dataclass to a YAML file.
- dumps()#
Dump the dataclass to a YAML string.
- classmethod load(path)#
Load the dataclass from a YAML file.
- classmethod loads(s)#
Load the dataclass from a YAML string.
- class flexrag.models.HFVLMGenerator(cfg)[source]#
Bases:
VLMGeneratorBase