MCIF documentation

MCIF is a comprehensive benchmark for evaluating multimodal, crosslingual instruction-following systems. It covers 3 modalities (text, speech, and video), 4 languages (English, German, Italian, and Chinese), and 13 tasks organized into 4 macro-tasks.

See the Installation section for how to install the project and the Usage section for how to use the repository.

Credits

The library is released as open source under the Apache 2.0 License. If you use this library, please cite:

@misc{papi2025mcifmultimodalcrosslingualinstructionfollowing,
    title={{MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks}},
    author={Sara Papi and Maike Züfle and Marco Gaido and Beatrice Savoldi and Danni Liu and Ioannis Douros and Luisa Bentivogli and Jan Niehues},
    year={2025},
    eprint={2507.19634},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2507.19634},
}