Latest submissions

Pre-submission / Working document
05/14/2024
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs
Pierre Colombo, Kevin El Haddad, Céline Hudelot, Nicolas Boizard
Pre-submission / Working document
05/14/2024
CroissantLLM: A Truly Bilingual French-English Language Model
Manuel Faysse, Patrick Fernandes, Nuno Guerreiro, Antonio Loison, Duarte Alves, Caio Corro, Nicolas Boizard, Joao Alves, Ricardo Rei, Pedro Raphaël Martins, Antoni Casademunt, François Yvon, André Martins, Gautier Viaud, Céline Hudelot, Pierre Colombo
Pre-submission / Working document
05/14/2024
TOWER: An Open Multilingual Large Language Model for Translation-Related Tasks
Pierre Colombo, Duarte Alves, José Pombal, Nuno Guerreiro, Pedro Martins, Joao Alves, Amin Farajian, Ben Peters, Ricardo Rei, Patrick Fernandes, Sveta Agrawal, José De, André Martins
Pre-submission / Working document
05/14/2024
Towards Trustworthy Reranking: A Simple yet Effective Abstention Mechanism
Pierre Colombo, Hippolyte Gisserot-Boukhlef, Manuel Faysse, Emmanuel Malherbe, Céline Hudelot
Pre-submission / Working document
05/14/2024
SaulLM-7B: A pioneering Large Language Model for Law
Pierre Colombo, Michael Desa, Telmo Pires, Malik Boudiaf, Dominic Culver, Rui Melo, Caio Corro, André Martins, Fabrizio Esposito, Vera Raposo, Sofia Morgado
Pre-submission / Working document
05/14/2024
Towards More Robust NLP System Evaluation: Handling Missing Scores in Benchmarks
Pierre Colombo, Anas Himmi, Ekhine Irurozki, Nathan Noiry, Stéphan Clémençon
Pre-submission / Working document
05/13/2024
Transductive Learning for Textual Few-Shot Classification in API-based Embedding Models
Pierre Colombo, Victor Pellegrain, Malik Boudiaf, Victor Storchan, Myriam Tami, Ismail Ben Ayed, Céline Hudelot, Pablo Piantanida
Conference paper
05/13/2024
