Daily MT Picks

July 12, 2025

Machine Translation Digest for Jul 07 2025

Here is today's selection of cs.CL papers exploring advancements in language models and multilingual processing. The papers delve into domain adaptation strategies, pun generation techniques, and the semantic understanding of large language models, while also addressing challenges in historical text transcription and multilingual reasoning.


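O_FT@EvalLLM2025 : étude comparative de choix de données et de stratégies d'apprentissage pour l'adaptation de modèles de langue à un domaine (O_FT@EvalLLM2025: a comparative study of data choices and training strategies for adapting language models to a domain)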

This paper presents the work carried out by the O_FT team, a joint team from Orange and Ouest-France, on adapting language models to the defense domain as part of the EvalLLM2025 challenge. This work focused on adapting the Mistral-7B-Instruct-v0.3 model using classical techniques of continued pre-training and instruction-tuning. The core of our efforts is based on collecting, generating, and selecting data for these two stages as well as for model evaluation. Experiments show that our adapted models have better domain-specific knowledge and improved domain-specific task processing skills, along with comparable (or even superior) performance on general knowledge and skills. Considering the carbon footprint of our adaptations, this work demonstrates the feasibility of domain adaptation for relatively small models.


A Survey of Pun Generation: Datasets, Evaluations and Methodologies

Pun generation seeks to creatively modify linguistic elements in text to produce humour or evoke double meanings. It also aims to preserve coherence and contextual appropriateness, making it useful in creative writing and entertainment across various media and contexts. Although pun generation has received considerable attention in computational linguistics, there is currently no dedicated survey that systematically reviews this specific area. To bridge this gap, this paper provides a comprehensive review of pun generation datasets and methods across different stages, including conventional approaches, deep learning techniques, and pre-trained language models. Additionally, we summarise both automated and human evaluation metrics used to assess the quality of pun generation. Finally, we discuss the research challenges and propose promising directions for future work.


Learn Globally, Speak Locally: Bridging the Gaps in Multilingual Reasoning

Large Language Models (LLMs) have achieved strong performance in domains like mathematics, factual QA, and code generation, yet their multilingual reasoning capabilities in these tasks remain underdeveloped. Especially for low-resource languages such as Swahili or Thai, LLMs can often misinterpret prompts or default to reasoning in English. This implicit bias toward high-resource languages undermines factual accuracy, interpretability, and trust. Current multilingual benchmarks focus only on final answers, overlooking whether models actually reason in the target language. To address this gap, we introduce GeoFact-X, a geography-based multilingual factual reasoning benchmark with annotated reasoning traces in five languages: English, Hindi, Japanese, Swahili, and Thai. We further propose BRIDGE, a novel training method that guides supervised fine-tuning and test-time reinforcement learning with a language-consistency reward to align reasoning with the input language. Finally, we develop an automatic evaluation protocol using LLM-as-a-judge to assess answer correctness and the quality and language consistency of reasoning traces, enabling nuanced and scalable analysis beyond surface-level metrics. Our results show that BRIDGE significantly enhances multilingual reasoning fidelity, demonstrating that reasoning-aware multilingual reinforcement learning is crucial for robust cross-lingual generalization. https://jd730.github.io/projects/GeoFact-X_BRIDGE
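The abstract does not say how the language-consistency reward is computed, so the following is only a toy illustration of the general idea, not the BRIDGE method. It assumes a simple heuristic: score a reasoning trace by the fraction of its letters that belong to the script expected for the input language. The function name `language_consistency_reward` and the `EXPECTED_SCRIPTS` table are hypothetical.

```python
import unicodedata

# Hypothetical table: language code -> Unicode character-name prefixes expected
# in text written in that language. English and Swahili both use the Latin
# script, so this heuristic cannot tell them apart.
EXPECTED_SCRIPTS = {
    "en": ("LATIN",),
    "sw": ("LATIN",),
    "hi": ("DEVANAGARI",),
    "ja": ("HIRAGANA", "KATAKANA", "CJK"),
    "th": ("THAI",),
}


def language_consistency_reward(reasoning: str, lang: str) -> float:
    """Return the fraction of letters in `reasoning` written in the script of `lang`.

    Assumption: script membership is read off the Unicode character name,
    which is a rough proxy and not a real language identifier.
    """
    prefixes = EXPECTED_SCRIPTS[lang]
    # Only alphabetic characters count; digits, spaces, and punctuation are ignored.
    letters = [ch for ch in reasoning if ch.isalpha()]
    if not letters:
        return 0.0
    matches = sum(
        1 for ch in letters if unicodedata.name(ch, "").startswith(prefixes)
    )
    return matches / len(letters)
```

A trace written entirely in English scores 1.0 against "en" and 0.0 against "th", while a trace that mixes scripts lands in between. A real system would more plausibly use a language-identification model, since a script check cannot separate languages that share an alphabet.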


Transcribing Spanish Texts from the Past: Experiments with Transkribus, Tesseract and Granite

This article presents the experiments and results obtained by the GRESEL team in the IberLEF 2025 shared task PastReader: Transcribing Texts from the Past. Three types of experiments were conducted with the dual aim of participating in the task and enabling comparisons across different approaches. These included the use of a web-based OCR service, a traditional OCR engine, and a compact multimodal model. All experiments were run on consumer-grade hardware, which, despite lacking high-performance computing capacity, provided sufficient storage and stability. The results, while satisfactory, leave room for further improvement. Future work will focus on exploring new techniques and ideas using the Spanish-language dataset provided by the shared task, in collaboration with Biblioteca Nacional de España (BNE).


On the Semantics of Large Language Models

Large Language Models (LLMs) such as ChatGPT demonstrated the potential to replicate human language abilities through technology, ranging from text generation to engaging in conversations. However, it remains controversial to what extent these systems truly understand language. We examine this issue by narrowing the question down to the semantics of LLMs at the word and sentence level. By examining the inner workings of LLMs and their generated representation of language and by drawing on classical semantic theories by Frege and Russell, we get a more nuanced picture of the potential semantic capabilities of LLMs.

Curated by yukajii.com