Document Parser

This module provides classes and functions for parsing a formatted document (such as a PDF or Word file) into a structured format.

class flexrag.document_parser.Document(source_file_path, title=None, text=None, screenshots=<factory>, images=<factory>)

A document parsed by a DocumentParser.

dump(path)

Dump the dataclass to a YAML file.

dumps()

Dump the dataclass to a YAML string.

classmethod load(path)

Load the dataclass from a YAML file.

classmethod loads(s)

Load the dataclass from a YAML string.

class flexrag.document_parser.DocumentParserBase

abstract parse(document_path)

Parse the document at the given path.

Parameters:

document_path (str) -- The path to the document to parse.

Returns:

The parsed document.

Return type:

Document
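As an illustration of the abstract parse(document_path) contract, here is a minimal sketch of a custom parser. It is not part of FlexRAG: Document and DocumentParserBase below are local stand-ins that mirror the signatures documented on this page so the sketch runs on its own, and PlainTextParser is a hypothetical name. In real code the two base names would presumably be imported from flexrag.document_parser instead.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class Document:
    """Local stand-in mirroring the documented Document fields."""

    source_file_path: str
    title: Optional[str] = None
    text: Optional[str] = None
    screenshots: list = field(default_factory=list)
    images: list = field(default_factory=list)


class DocumentParserBase(ABC):
    """Local stand-in for the abstract base: subclasses implement parse()."""

    @abstractmethod
    def parse(self, document_path: str) -> Document:
        """Parse the document at the given path."""


class PlainTextParser(DocumentParserBase):
    """Hypothetical parser for UTF-8 text files.

    The first non-empty line is used as the title; the whole file is the text.
    """

    def parse(self, document_path: str) -> Document:
        text = Path(document_path).read_text(encoding="utf-8")
        # Pick the first line that is not blank, or None for an empty file.
        title = next(
            (line.strip() for line in text.splitlines() if line.strip()), None
        )
        return Document(source_file_path=document_path, title=title, text=text)
```

Because parse is abstract, instantiating DocumentParserBase directly raises TypeError; only subclasses that implement parse can be created.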

class flexrag.document_parser.DoclingConfig(do_ocr=False, do_table_structure=True, generate_page_images=False, generate_picture_images=False)

dump(path)

Dump the dataclass to a YAML file.

dumps()

Dump the dataclass to a YAML string.

classmethod load(path)

Load the dataclass from a YAML file.

classmethod loads(s)

Load the dataclass from a YAML string.

class flexrag.document_parser.DoclingParser(config)

Bases: DocumentParserBase
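A short usage sketch, built only from the signatures listed on this page: it constructs a DoclingConfig, passes it to DoclingParser, and calls the inherited parse method. The helper name parse_with_docling is hypothetical, and enabling do_ocr is just an example setting; this page does not describe what each option does beyond its name.

```python
def parse_with_docling(document_path: str):
    """Parse one file with DoclingParser and return the resulting Document."""
    # Imported lazily so this sketch can be loaded without FlexRAG installed.
    from flexrag.document_parser import DoclingConfig, DoclingParser

    config = DoclingConfig(
        do_ocr=True,  # documented default is False
        do_table_structure=True,  # documented default is True
        generate_page_images=False,
        generate_picture_images=False,
    )
    parser = DoclingParser(config)
    return parser.parse(document_path)
```

Calling parse_with_docling on a real file path (for example a local PDF) should return a Document whose title and text attributes can then be inspected, assuming FlexRAG and its Docling dependency are installed.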

class flexrag.document_parser.MarkItDownParser

Bases: DocumentParserBase