🗂️ Indexing

Overview

The index module handles the indexing and post-processing of data extracted from multimodal documents.

It builds a vector store index on Milvus and supports hybrid retrieval, which combines dense and sparse retrieval.
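To make the idea of hybrid retrieval concrete, here is a minimal sketch of one common way to combine dense and sparse results: normalize each score set, then take a weighted sum. This is an illustration only, not the module's actual implementation or the Milvus API. The function names (`min_max_normalize`, `hybrid_scores`), the `dense_weight` parameter, and the sample scores are all hypothetical.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_scores(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    dense_weight: float = 0.5,
) -> Dict[str, float]:
    """Weighted sum of normalized dense and sparse scores (hypothetical helper).

    A document missing from one result set contributes 0.0 for that side.
    """
    dense_n = min_max_normalize(dense)
    sparse_n = min_max_normalize(sparse)
    doc_ids = set(dense_n) | set(sparse_n)
    return {
        doc_id: dense_weight * dense_n.get(doc_id, 0.0)
        + (1.0 - dense_weight) * sparse_n.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


# Made-up scores: dense similarities and sparse (keyword-style) scores
# live on different scales, which is why they are normalized first.
dense = {"doc_a": 0.9, "doc_b": 0.5, "doc_c": 0.1}
sparse = {"doc_b": 12.0, "doc_c": 3.0, "doc_d": 6.0}

fused = hybrid_scores(dense, sparse, dense_weight=0.5)
ranking: List[str] = sorted(fused, key=fused.get, reverse=True)
print(ranking)
```

In this toy example, `doc_b` ranks first because it scores well on both sides, while `doc_a` and `doc_d` each appear in only one result set. Other fusion strategies, such as reciprocal rank fusion, are equally plausible; the weighted sum is used here only because it is easy to follow.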

Different parts of the indexing pipeline can be customized through an inference indexing configuration file.

The indexing workflow takes processed documents and turns them into searchable artifacts that can later be used for retrieval and RAG pipelines.

In practice, this means:

Here is a minimal example to index processed documents.

Start from the example configuration file: examples/index/config.yaml.

Adjust it to match your setup and indexing needs.

Once the configuration file is ready, launch the indexing pipeline with:

python3 -m mmore index --config_file /path/to/config.yaml
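If you prefer to launch the same command from a Python script, for example as one step in a larger workflow, a thin wrapper along these lines may help. It is a sketch under stated assumptions: `build_index_command` and `run_index` are hypothetical helper names, not part of mmore, and the only thing taken from the documentation is the command line shown above.

```python
import subprocess
from pathlib import Path
from typing import List


def build_index_command(config_file: str, python: str = "python3") -> List[str]:
    """Return the argument list for the documented indexing command."""
    return [python, "-m", "mmore", "index", "--config_file", config_file]


def run_index(config_file: str) -> None:
    """Check that the config file exists, then launch the indexing pipeline.

    Hypothetical helper: it only shells out to the documented command.
    """
    if not Path(config_file).is_file():
        raise FileNotFoundError(f"Config file not found: {config_file}")
    # check=True raises CalledProcessError if the command exits with a non-zero status.
    subprocess.run(build_index_command(config_file), check=True)
```

Calling `run_index("/path/to/config.yaml")` fails early with a clear error when the path is wrong, instead of waiting for the pipeline itself to report the missing file.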

The indexing step assumes that your documents have already been processed.

If you have not done that yet, start with Process.