Indexing
Overview
The index module handles indexing and post-processing of data extracted from multimodal documents.
It builds a vector store on top of Milvus and supports hybrid retrieval, which combines dense and sparse retrieval.
Different parts of the indexing pipeline can be customized through the indexing configuration file.
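Hybrid retrieval has to merge a dense ranking and a sparse ranking into a single result list. This page does not say which fusion strategy mmore uses, so the sketch below is only an illustration using reciprocal rank fusion (RRF), one common choice that Milvus also offers as a reranker for hybrid search. The chunk IDs and the constant `k=60` are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. Higher fused scores are ranked first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); ties are broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical top hits from a dense (embedding) search and a sparse
# (term-weight) search over the same collection.
dense_hits = ["chunk-3", "chunk-1", "chunk-7"]
sparse_hits = ["chunk-1", "chunk-9", "chunk-3"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Here `chunk-1` ends up first: it is not the top dense hit, but it ranks highly in both lists, which is exactly the behavior hybrid retrieval aims for.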
TL;DR
The indexing workflow takes processed documents and turns them into searchable artifacts that retrieval and RAG pipelines can later query.
In practice, this means:
- loading processed document data
- generating dense and sparse representations
- storing them in a Milvus-based vector store
- preparing the collection for hybrid retrieval
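To make these steps concrete, here is a toy, dependency-free sketch. It is not mmore's implementation: the sparse representation is plain term frequency (real systems typically use BM25 or a learned sparse model), the dense representation uses feature hashing as a stand-in for an embedding model, and a Python dict stands in for the Milvus collection. All function names and the `DIM` value are hypothetical.

```python
import math
import re
import zlib
from collections import Counter

DIM = 16  # hypothetical dense size; real embedding models use far more dimensions


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def sparse_vector(text: str) -> dict[str, float]:
    """Sparse representation: term -> weight (raw term frequency here)."""
    return {term: float(count) for term, count in Counter(tokenize(text)).items()}


def dense_vector(text: str, dim: int = DIM) -> list[float]:
    """Dense stand-in: hash each token into one of `dim` buckets, then L2-normalize.

    zlib.crc32 is used instead of the built-in hash(), which is randomized per process.
    """
    vec = [0.0] * dim
    for token in tokenize(text):
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(chunks: dict[str, str]) -> dict[str, dict]:
    """Compute both representations per chunk and keep them side by side,
    so a later hybrid query can search either one."""
    return {
        chunk_id: {"text": text, "dense": dense_vector(text), "sparse": sparse_vector(text)}
        for chunk_id, text in chunks.items()
    }


# Hypothetical processed chunks (in mmore these would come from the Process step).
processed = {
    "doc1-0": "Milvus stores dense and sparse vectors.",
    "doc1-1": "Hybrid retrieval combines dense and sparse scores.",
}
index = build_index(processed)
print(index["doc1-1"]["sparse"])
```

Storing both vectors under the same chunk ID mirrors the idea of a collection with one dense and one sparse field per entry, which is what lets a single query be answered by both retrieval modes.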
Minimal Example
Here is a minimal example of indexing documents that have already been processed.
1. Create a config file
Start from the example configuration file: examples/index/config.yaml.
Adjust it to match your setup and indexing needs.
2. Run the indexing command
Once the configuration file is ready, launch the indexing pipeline with:
python3 -m mmore index --config_file /path/to/config.yaml
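If you prefer to launch indexing from a Python script, for example right after a processing run, one option is to wrap the same command with the standard library's `subprocess` module. Only the `-m mmore index --config_file ...` invocation comes from this page; the helper names are hypothetical. The sketch uses `sys.executable` instead of a literal `python3` so the command runs with the interpreter (and environment) where mmore is assumed to be installed.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def build_index_command(config_file: str | Path) -> list[str]:
    """Build the argv for the indexing command documented above."""
    return [sys.executable, "-m", "mmore", "index", "--config_file", str(config_file)]


def run_indexing(config_file: str | Path) -> None:
    """Hypothetical helper: verify the config exists, then run indexing."""
    path = Path(config_file)
    if not path.is_file():
        raise FileNotFoundError(f"Indexing config not found: {path}")
    # check=True raises CalledProcessError if indexing exits with a non-zero status.
    subprocess.run(build_index_command(path), check=True)
```

Usage: `run_indexing("/path/to/config.yaml")`. Checking the file up front gives a clearer error than a failure from inside the indexing run.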
Notes
The indexing step assumes that your documents have already been processed.
If they have not, run the Process step first.