Quickstart
Overview
This page helps you get MMORE running quickly with a minimal workflow.
The goal is not to cover every configuration option, but to give you a first successful setup and a clear mental model of the main steps.
What this quickstart covers
In a typical MMORE workflow, you will:
install the project and its dependencies
prepare a small document collection
process the collection
build an index
run retrieval or a simple RAG workflow
Before you start
Make sure you have already read Installation.
You should also confirm that:
your environment is activated
project dependencies are installed
you are starting with a small test collection
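The first two checks above can be scripted. The following is a minimal sketch using only the Python standard library; it is not part of MMORE. The import name `mmore` is an assumption, so replace it with whatever your installation actually exposes. Note that the `sys.prefix` comparison detects `venv`/`virtualenv` environments and is not a reliable test for conda environments.

```python
import importlib.util
import sys


def in_virtual_env() -> bool:
    """Return True when the interpreter runs inside a venv/virtualenv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_env())
    # "mmore" as the import name is an assumption; adjust as needed.
    print("missing packages:", missing_packages(["mmore"]))
```

An empty list for missing packages together with an active environment is a reasonable signal that you can proceed.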
Minimal workflow
The exact commands depend on your repository entry points, but the overall workflow is as follows.
1. Prepare a small collection
Start with a small and simple document set before moving to large-scale or distributed workloads.
For example, create a folder containing a few representative documents:
sample_data/
├── doc1.pdf
├── doc2.pdf
├── doc3.html
└── doc4.md
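Before processing, it can help to confirm that the folder contains what you expect. The sketch below is a standalone helper, not an MMORE function. The folder name `sample_data` and the extension set are taken from the example tree above and are assumptions about your collection.

```python
from collections import Counter
from pathlib import Path

# Assumption: only the extensions shown in the example tree are of interest.
SUPPORTED = {".pdf", ".html", ".md"}


def discover(root):
    """Return the sorted files under root whose extension is in SUPPORTED."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )


def summarize(paths):
    """Count the discovered files per lowercase extension."""
    return Counter(p.suffix.lower() for p in paths)


if __name__ == "__main__":
    root = Path("sample_data")  # hypothetical location of the test collection
    files = discover(root) if root.is_dir() else []
    print(f"{len(files)} supported files found under {root}")
    for suffix, count in sorted(summarize(files).items()):
        print(f"  {suffix}: {count}")
```

If the count is zero or lower than expected, fix the input path before running any later step.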
2. Run document processing
Processing transforms raw documents into a form that MMORE can index and retrieve from.
Depending on your setup, this step may include:
parsing files
extracting text and metadata
chunking content
preparing multimodal representations
See Processing pipeline for the detailed logic.
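To make the chunking step more concrete, here is a minimal sketch of fixed-size word chunking with overlap. It illustrates the general idea only and is not MMORE's chunker; the function name `chunk_text` and its parameters are hypothetical.

```python
def chunk_text(text, max_words=50, overlap=10):
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(120))
    for i, chunk in enumerate(chunk_text(sample)):
        print(i, len(chunk.split()), "words")
```

Real pipelines often chunk by tokens, sentences, or document structure instead; word windows are used here only because they need no dependencies.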
3. Build an index
Once documents are processed, create an index so they can be searched efficiently.
This step usually includes:
selecting the indexing backend or strategy
generating representations for chunks or documents
storing the resulting index artifacts
See Indexing for the full indexing workflow.
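The three steps above can be illustrated with a toy example. The sketch below swaps in plain term counts for real embeddings and a JSON file for a real index backend, so it shows the shape of the workflow rather than how MMORE indexes data. All function names and the output path are hypothetical.

```python
import json
import re
import tempfile
from collections import Counter
from pathlib import Path


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(chunks):
    """Map each chunk id to its term counts (a toy stand-in for embeddings)."""
    return {cid: dict(Counter(tokenize(text))) for cid, text in chunks.items()}


def save_index(index, path):
    """Write the index as JSON, creating parent directories if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(index), encoding="utf-8")
    return path


def load_index(path):
    """Reload an index previously written by save_index."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    chunks = {
        "doc1#0": "Solar panels convert sunlight into electricity.",
        "doc2#0": "Wind turbines generate power from moving air.",
    }
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_index(build_index(chunks), Path(tmp) / "index" / "toy.json")
        print("index written to", saved)
        print("chunks indexed:", len(load_index(saved)))
```

Reloading the artifact right after writing it is a cheap way to confirm that the index was stored where you expect.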
4. Run retrieval
After indexing, you can test retrieval on a few example queries.
At this stage, you want to verify simple things:
does the system return relevant documents?
are the retrieved chunks meaningful?
is the ranking roughly coherent?
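These questions can be turned into a small smoke test: write a few queries for which you know the expected chunk, then measure how often it appears in the top results. The sketch below assumes a `search(query)` callable that returns chunk ids in ranked order. `toy_search` and the corpus are stand-ins for illustration; in practice you would wrap your own retrieval call.

```python
def hit_rate_at_k(search, cases, k=3):
    """Fraction of (query, expected_id) cases with expected_id in the top k."""
    if not cases:
        return 0.0
    hits = 0
    for query, expected_id in cases:
        if expected_id in search(query)[:k]:
            hits += 1
    return hits / len(cases)


# Toy corpus and search function, for illustration only.
CHUNKS = {
    "doc1#0": "solar panels convert sunlight into electricity",
    "doc2#0": "wind turbines generate power from moving air",
    "doc3#0": "batteries store electricity for later use",
}


def toy_search(query):
    """Rank chunk ids by the number of words shared with the query."""
    query_words = set(query.lower().split())
    scores = {cid: len(query_words & set(text.split())) for cid, text in CHUNKS.items()}
    # Highest score first; ties are broken by chunk id for a stable order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


CASES = [
    ("how do solar panels work", "doc1#0"),
    ("wind turbines", "doc2#0"),
    ("store electricity", "doc3#0"),
]

if __name__ == "__main__":
    print("hit rate at 1:", hit_rate_at_k(toy_search, CASES, k=1))  # prints 1.0
```

A handful of such cases is enough for a first run; the aim is to catch an empty or mismatched index, not to benchmark quality.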
5. Move to RAG if needed
If your workflow includes generation, retrieval results can then be passed into a RAG pipeline.
See RAG for how retrieval and generation are combined.
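As a rough illustration of how retrieval results feed generation, the sketch below assembles a prompt from retrieved chunks under a character budget. It makes no model call and does not reflect MMORE's RAG implementation; `build_prompt`, the prompt wording, and `max_chars` are hypothetical choices.

```python
def build_prompt(question, chunks, max_chars=2000):
    """Assemble a grounded prompt from (chunk_id, text) pairs, best match first."""
    context_parts = []
    used = 0
    for chunk_id, text in chunks:
        entry = f"[{chunk_id}] {text}"
        if used + len(entry) > max_chars:
            break  # stop before exceeding the context budget
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    retrieved = [
        ("doc3#0", "Batteries store electricity for later use."),
        ("doc1#0", "Solar panels convert sunlight into electricity."),
    ]
    print(build_prompt("What stores electricity?", retrieved))
```

Keeping chunk ids in the prompt makes it easier to trace a generated answer back to the retrieved content when you inspect results.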
Example end-to-end flow
Conceptually, a first MMORE run looks like this:
Raw documents
↓
Processing
↓
Structured outputs / chunks / metadata
↓
Indexing
↓
Retrieval
↓
Optional RAG generation
Recommended first checks
After your first run, verify the following:
documents were correctly discovered and parsed
processed outputs were actually generated
the index was created where expected
simple test queries return results
retrieved content looks coherent and relevant
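The file-level checks in this list are easy to automate. The following sketch reports whether each expected output directory exists and contains at least one entry. The directory names are hypothetical; point them at wherever your configuration writes processed outputs and index artifacts.

```python
from pathlib import Path


def check_outputs(expected):
    """Return {name: status} where status is 'missing', 'empty', or 'ok'."""
    report = {}
    for name, directory in expected.items():
        path = Path(directory)
        if not path.is_dir():
            report[name] = "missing"
        elif not any(path.iterdir()):
            report[name] = "empty"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    # Hypothetical locations; adjust to match your own configuration.
    expected = {
        "processed outputs": "outputs/processed",
        "index artifacts": "outputs/index",
    }
    for name, status in check_outputs(expected).items():
        print(f"{name}: {status}")
```

A "missing" or "empty" status usually points to a wrong path or an earlier step that did not finish.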
Common mistakes
Warning
Do not start with a large or noisy collection.
When debugging a document-backed pipeline, a very small dataset is much easier to inspect and validate.
Typical first-run problems include:
wrong environment or missing dependencies
input paths that do not point to the expected collection
outputs written to a different directory than expected
indexing performed on incomplete processed data
retrieval tested before the index is fully built
Where to go next
After this page, the best next steps are:
Architecture to understand the big picture
Processing pipeline for ingestion and transformations
Indexing for indexing details
RAG for retrieval-augmented generation