💻 Developer Documentation

Welcome to the MMORE developer documentation!
This guide will help you set up your development environment and contribute to the project.

🛠️ Development setup

System dependencies

Before installing MMORE for development, ensure you have the required system dependencies installed.

Linux (Ubuntu/Debian)

sudo apt update
sudo apt install -y ffmpeg libsm6 libxext6 chromium-browser libnss3 \
  libgconf-2-4 libxi6 libxrandr2 libxcomposite1 libxcursor1 libxdamage1 \
  libxfixes3 libxrender1 libasound2 libatk1.0-0 libgtk-3-0 libreoffice \
  libpango-1.0-0 libpangoft2-1.0-0 weasyprint

Note

On Ubuntu 24.04, replace libasound2 with libasound2t64.

You may also need to add the Ubuntu 20.04 focal repository to access some packages, for example by creating /etc/apt/sources.list.d/mmore.list with:

deb http://cz.archive.ubuntu.com/ubuntu focal main universe

macOS

brew update
brew install ffmpeg chromium gtk+3 pango cairo \
  gobject-introspection libffi pkg-config libx11 libxi \
  libxrandr libxcomposite libxcursor libxdamage libxext \
  libxrender libasound2 atk libreoffice weasyprint

If weasyprint fails to find GTK or Cairo, also run:

brew install cairo pango gdk-pixbuf libffi
uv pip install weasyprint
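
On either platform, it can help to confirm that the command-line tools installed above are actually on your PATH before continuing. The following is a minimal sketch and not part of MMORE: the helper name `missing_tools` and the executable names in `ASSUMED_TOOLS` are assumptions (Chromium may be installed as `chromium` or `chromium-browser`, and LibreOffice usually exposes `soffice`), so adjust them to your system.

```python
import shutil

# Assumed executable names; each entry lists acceptable alternatives.
ASSUMED_TOOLS = {
    "ffmpeg": ("ffmpeg",),
    "chromium": ("chromium", "chromium-browser"),
    "libreoffice": ("soffice", "libreoffice"),
    "weasyprint": ("weasyprint",),
}


def missing_tools(tools=ASSUMED_TOOLS, which=shutil.which) -> list[str]:
    """Return the tool names for which no candidate executable is on PATH."""
    return [
        name
        for name, candidates in tools.items()
        if not any(which(candidate) for candidate in candidates)
    ]


# An empty list means every checked tool was found.
print("Not found on PATH:", missing_tools())
```

This only checks executables, not shared libraries such as libpango or libgtk, so a clean result does not guarantee that every system dependency is present.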

Installing MMORE for development

1. Clone the repository

git clone https://github.com/swiss-ai/mmore.git
cd mmore

2. Create a virtual environment and install dependencies

uv venv .venv
source .venv/bin/activate
uv pip install -e ".[all,cpu,dev]"

Note

For GPU (CUDA 12.6), replace cpu with cu126, for example:

uv pip install -e ".[all,cu126,dev]"

Note

For a partial install, replace all with only the stages you need, for example:

uv pip install -e ".[rag,cpu,dev]"

Available stages are: process, index, rag, and api.
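
To make the pattern explicit, the install target is a list of extras: one or more stages (or all), one hardware extra, and optionally dev. The sketch below assembles that string. The helper `install_target` is hypothetical and exists only for illustration; the stage and hardware names are the ones mentioned above.

```python
STAGES = ("process", "index", "rag", "api")  # stages listed in this guide
ACCELERATORS = ("cpu", "cu126")              # hardware extras mentioned in this guide


def install_target(stages=("all",), accelerator="cpu", dev=True) -> str:
    """Build the editable-install target, for example '.[rag,cpu,dev]'."""
    for stage in stages:
        if stage != "all" and stage not in STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
    if accelerator not in ACCELERATORS:
        raise ValueError(f"unknown accelerator: {accelerator!r}")
    extras = [*stages, accelerator]
    if dev:
        extras.append("dev")
    return ".[" + ",".join(extras) + "]"


# Prints the command for a GPU install of the rag and index stages only.
print("uv pip install -e", f'"{install_target(["rag", "index"], "cu126")}"')
```

The quotes around the target matter in most shells, because unquoted square brackets can be interpreted as a glob pattern.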

Important

This package requires many large dependencies and a dependency override, so it should be installed with uv rather than plain pip.

See the uv guide for more information.

🧹 Code quality tools¢

MMORE uses several tools to maintain code quality and consistency.

Pre-commit hooksΒΆ

We use pre-commit to automatically run code formatters and linters before each commit.

Setup

1. Install pre-commit

uv pip install pre-commit

2. Set up the git hook scripts

pre-commit install

3. Run the checks manually

Optional but recommended before your first commit.

pre-commit run --all-files

Configured Hooks

The pre-commit configuration runs ruff, a linter and code formatter, to keep the code style consistent.

Type Checking

We use pyright for static type checking.
Please ensure your pull requests are type-checked before submission.

To run type checking manually:

pyright

🀝 Contributing Guidelines¢

We welcome contributions! Here’s how you can help:

Reporting IssuesΒΆ

  • Bug reports: open an issue with a clear description, steps to reproduce, and expected vs. actual behavior

  • Feature requests: open an issue describing the feature, its use case, and potential implementation approach

  • Check the Issues page for ongoing work

Code Contributions

  1. Fork the repository and create a new branch for your feature/fix

  2. Write clear, documented code following the existing style

  3. Add tests if applicable

  4. Ensure all pre-commit hooks pass

  5. Run type checking with pyright

  6. Submit a Pull Request with a clear description
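
Steps 3 to 5 can be chained in a small local script so that nothing is forgotten before opening a Pull Request. This is a sketch only: the helper `run_checks` is hypothetical and not shipped with MMORE, and it simply invokes the commands shown elsewhere in this guide.

```python
import subprocess

# Commands taken from this guide, run in order from the repository root.
CHECKS = [
    ["pre-commit", "run", "--all-files"],
    ["pyright"],
    ["pytest", "tests/"],
]


def run_checks(checks=CHECKS, run=subprocess.run) -> list[str]:
    """Run every check and return the commands that failed, as strings."""
    failed = []
    for cmd in checks:
        print("$", " ".join(cmd))
        try:
            ok = run(cmd).returncode == 0
        except FileNotFoundError:
            ok = False  # the tool is not installed or not on PATH
        if not ok:
            failed.append(" ".join(cmd))
    return failed
```

Calling `run_checks()` returns an empty list when everything passes. All checks run even if an earlier one fails, so a single pass reports every problem at once.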

🗂️ Project Structure

mmore/
├── mmore/
│   ├── process/          # Document processing pipeline
│   │   ├── processors/   # Individual file type processors
│   │   └── …
│   ├── postprocess/      # Post-processing utilities
│   ├── index/            # Indexing and vector DB
│   ├── rag/              # RAG implementation
│   └── type/             # Type definitions and data models
├── docs/                 # Documentation
├── examples/             # Example configurations and data
├── tests/                # Test suite
├── .pre-commit-config.yaml
├── pyproject.toml
└── README.md

Key Modules

  • mmore.process: Handles extraction from various file formats

  • mmore.index: Manages hybrid dense+sparse indexing with Milvus

  • mmore.rag: RAG system with LangChain integration

  • mmore.type: Core data structures like MultimodalSample

🧪 Testing

Running tests in the terminal

pytest tests/

GPU tests

Tests requiring a CUDA GPU are marked @pytest.mark.gpu and skipped by default. Pass --gpu to run them:

pytest --gpu          # full suite, including GPU tests
pytest --gpu -m gpu   # only the GPU-marked tests

To mark a new GPU-only test:

import pytest

@pytest.mark.gpu
def test_something_on_gpu():
    ...
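
For readers curious how an opt-in flag like --gpu can work, a common pytest pattern is a conftest.py that registers the option and skips marked tests unless it is passed. The following is a sketch of that general technique using the option and marker names from above. It is an assumption about the mechanism, and the project's actual conftest.py may differ.

```python
import pytest


def pytest_addoption(parser):
    # Register the opt-in command-line flag.
    parser.addoption(
        "--gpu", action="store_true", default=False,
        help="run tests marked with @pytest.mark.gpu",
    )


def pytest_configure(config):
    # Declare the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line("markers", "gpu: test requires a CUDA GPU")


def pytest_collection_modifyitems(config, items):
    # With --gpu, leave the collected tests untouched.
    if config.getoption("--gpu"):
        return
    # Otherwise attach a skip marker to every GPU-marked test.
    skip_gpu = pytest.mark.skip(reason="needs --gpu to run")
    for item in items:
        if "gpu" in item.keywords:
            item.add_marker(skip_gpu)
```

With this layout, unmarked tests always run, and GPU-marked tests appear as skipped rather than silently missing from the report.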

Writing tests

  • Place tests in the tests/ directory

  • Use descriptive test names

  • Cover edge cases and error conditions

  • Mock external dependencies when appropriate

  • Mark GPU-only tests with @pytest.mark.gpu (see above)
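
As an illustration of these guidelines, the sketch below tests a small function with a mocked dependency and an explicit edge case. The function `summarize_lengths` and its `fetch` argument are hypothetical and not part of MMORE; in a real test the function under test would be imported from the package.

```python
from unittest.mock import Mock


def summarize_lengths(fetch, urls):
    """Hypothetical function under test: map each URL to the length of its text."""
    return {url: len(fetch(url)) for url in urls}


def test_summarize_lengths_calls_fetch_once_per_url():
    # The external fetcher is mocked, so no network access is needed.
    fetch = Mock(side_effect=["hello", ""])
    result = summarize_lengths(fetch, ["a", "b"])
    assert result == {"a": 5, "b": 0}
    assert fetch.call_count == 2


def test_summarize_lengths_with_no_urls_returns_empty_dict():
    # Edge case: empty input should not touch the dependency at all.
    fetch = Mock()
    assert summarize_lengths(fetch, []) == {}
    fetch.assert_not_called()
```

Saved under tests/ in a file whose name starts with test_, both functions are collected by pytest automatically.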

🔀 Pull Request Process

  1. Update documentation if you're adding new features

  2. Add examples for new functionality

  3. Ensure all tests pass and pre-commit hooks succeed

  4. Update the changelog if applicable

  5. Request review from maintainers

PR checklist

  • [ ] Code follows project style guidelines

  • [ ] Pre-commit hooks pass (pre-commit run --all-files)

  • [ ] Type checking passes (pyright)

  • [ ] Tests are added or updated as needed

  • [ ] Documentation is updated

  • [ ] Examples are provided for new features

  • [ ] Commit messages are clear and descriptive

💡 Development tips

Working with uv

  • Use uv pip instead of pip for all package installations

  • The project uses dependency overrides that are handled automatically by uv

  • See the uv tutorial for more details

❓ QuestionsΒΆ

If you have questions about contributing, feel free to:

  • Open a discussion on GitHub

  • Reach out to the maintainers

  • Check existing issues for similar questions

Thank you for contributing to MMORE! 🎉