πŸ—‚οΈ IndexingΒΆ

Overview

The index module handles the indexing and post-processing of data extracted from multimodal documents.

It builds a Milvus-backed vector store and supports hybrid retrieval, which combines dense and sparse retrieval.
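To make the idea of hybrid retrieval concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a dense ranking and a sparse ranking into a single result list. This is an illustration only: the page does not state which fusion strategy the index module uses, and the function name, the document IDs, and the constant `k=60` are assumptions made for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document receives 1 / (k + rank) from every ranking it appears in;
    documents are returned by descending total score (ties broken by ID).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense search and a sparse search.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is the practical benefit of combining the two retrieval modes.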

Different parts of the indexing pipeline can be customized through an inference indexing configuration file.

💡 TL;DR

The indexing workflow takes processed documents and turns them into searchable artifacts that can later be used for retrieval and RAG pipelines.

In practice, this means:

  • loading processed document data

  • generating dense and sparse representations

  • storing them in a Milvus-based vector store

  • preparing the collection for hybrid retrieval
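The steps above can be sketched in plain Python. This is a toy illustration of the data flow, not the module's implementation: the hashed "dense" vector stands in for a real embedding model, the term counts stand in for a real sparse encoder, and a Python list stands in for the Milvus collection. All function names and the document fields (`id`, `text`) are assumptions made for the example.

```python
import hashlib
import math
from collections import Counter

DIM = 8  # toy dimension; real embedding models use far more


def dense_vector(text, dim=DIM):
    """Toy dense representation: hash each token into a bucket, then L2-normalize."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sparse_vector(text):
    """Toy sparse representation: raw term counts keyed by token."""
    return dict(Counter(text.lower().split()))


def build_index(documents):
    """Attach both representations to each document and collect the entries."""
    return [
        {
            "id": doc["id"],
            "text": doc["text"],
            "dense": dense_vector(doc["text"]),
            "sparse": sparse_vector(doc["text"]),
        }
        for doc in documents
    ]


# Hypothetical processed documents.
docs = [
    {"id": "doc_1", "text": "Milvus stores dense and sparse vectors"},
    {"id": "doc_2", "text": "Hybrid retrieval combines both signals"},
]
index = build_index(docs)
print(index[0]["id"], len(index[0]["dense"]), index[0]["sparse"])
```

Keeping both representations on the same entry is what allows a single collection to serve dense and sparse queries later.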

💻 Minimal Example

Here is a minimal example to index processed documents.

1. Create a config file

Start from the example configuration file: examples/index/config.yaml.

Adjust it to match your setup and indexing needs.

2. Run the indexing command

Once the configuration file is ready, launch the indexing pipeline with:

python3 -m mmore index --config_file /path/to/config.yaml
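If you prefer to launch indexing from a Python script, the same command can be wrapped with `subprocess`. This is a minimal sketch: the helper names `build_index_command` and `run_indexing` are hypothetical and not part of mmore, and the sketch uses `sys.executable` in place of `python3` so that the call runs in the current interpreter's environment.

```python
import subprocess
import sys
from pathlib import Path


def build_index_command(config_file):
    """Return the argument list equivalent to the indexing command shown above."""
    return [sys.executable, "-m", "mmore", "index", "--config_file", str(config_file)]


def run_indexing(config_file):
    """Fail early if the config file is missing, then run indexing and return its exit code."""
    if not Path(config_file).is_file():
        raise FileNotFoundError(f"Config file not found: {config_file}")
    return subprocess.run(build_index_command(config_file), check=False).returncode


# Example usage (assumes mmore is installed and the path points to your config):
# exit_code = run_indexing("/path/to/config.yaml")
```

Checking for the config file before spawning the process gives a clearer error than a failure inside the pipeline itself.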

Notes

The indexing step assumes that your documents have already been processed.

If you have not done that yet, start with Process.

See also