Machine Translation Digest for Oct 12 2025
Here is today's selection of cs.CL papers exploring innovative approaches in machine translation and language model optimization. Common themes include advancements in multilingual neural machine translation and the enhancement of large language models for specialized tasks such as semantic knowledge editing and structured data extraction.
Happiness is Sharing a Vocabulary: A Study of Transliteration Methods
Transliteration has emerged as a promising means to bridge the gap between various languages in multilingual NLP, showing promising results especially for languages using non-Latin scripts. We investigate the degree to which shared script, overlapping token vocabularies, and shared phonology contribute to performance of multilingual models. To this end, we conduct controlled experiments using three kinds of transliteration (romanization, phonemic transcription, and substitution ciphers) as well as orthography. We evaluate each model on two downstream tasks -- named entity recognition (NER) and natural language inference (NLI) -- and find that romanization significantly outperforms other input types in 7 out of 8 evaluation settings, largely consistent with our hypothesis that it is the most effective approach. We further analyze how each factor contributed to the success, and suggest that having longer (subword) tokens shared with pre-trained languages leads to better utilization of the model.
Bhasha-Rupantarika: Algorithm-Hardware Co-design approach for Multilingual Neural Machine Translation
This paper introduces Bhasha-Rupantarika, a light and efficient multilingual translation system tailored through algorithm-hardware codesign for resource-limited settings. The method investigates model deployment at sub-octet precision levels (FP8, INT8, INT4, and FP4), with experimental results indicating a 4.1x reduction in model size (FP4) and a 4.2x speedup in inference speed, which correlates with an increased throughput of 66 tokens/s (improvement by 4.8x). This underscores the importance of ultra-low precision quantization for real-time deployment in IoT devices using FPGA accelerators, achieving performance on par with expectations. Our evaluation covers bidirectional translation between Indian and international languages, showcasing its adaptability in low-resource linguistic contexts. The FPGA deployment demonstrated a 1.96x reduction in LUTs and a 1.65x decrease in FFs, resulting in a 2.2x enhancement in throughput compared to OPU and a 4.6x enhancement compared to HPTA. Overall, the evaluation provides a viable solution based on quantisation-aware translation along with hardware efficiency suitable for deployable multilingual AI systems. The entire codes [https://github.com/mukullokhande99/Bhasha-Rupantarika/] and dataset for reproducibility are publicly available, facilitating rapid integration and further development by researchers.
STEAM: A Semantic-Level Knowledge Editing Framework for Large Language Models
Large Language Models store extensive factual knowledge acquired during large-scale pre-training. However, this knowledge is inherently static, reflecting only the state of the world at the time of training. Knowledge editing has emerged as a promising solution for updating outdated or incorrect facts without full retraining. However, most existing locate-and-edit methods primarily focus on token-level likelihood optimization without addressing semantic coherence. Our analysis reveals that such edited knowledge is often encoded as isolated residual streams in the model's latent space, distinct from pre-existing knowledge and bypassing natural reasoning process. To address this, we propose \textsc{Steam}, a semantic-level knowledge editing framework that enhances integration of updated knowledge into the model's knowledge structure. \textsc{Steam} first identifies target representations as semantic anchors for the updated factual association, then guides the internal representation of the edited fact towards these anchors through an alignment loss during optimization. Experimental results demonstrate that \textsc{Steam} improves model's ability to reason with edited knowledge and enhances semantic coherence, underscoring the importance of latent-space alignment for reliable and coherent knowledge editing. The code is available at https://github.com/GY-Jeong/STEAM.
Sarcasm Detection Using Deep Convolutional Neural Networks: A Modular Deep Learning Framework
Sarcasm is a nuanced and often misinterpreted form of communication, especially in text, where tone and body language are absent. This paper proposes a modular deep learning framework for sarcasm detection, leveraging Deep Convolutional Neural Networks (DCNNs) and contextual models such as BERT to analyze linguistic, emotional, and contextual cues. The system integrates sentiment analysis, contextual embeddings, linguistic feature extraction, and emotion detection through a multi-layer architecture. While the model is in the conceptual stage, it demonstrates feasibility for real-world applications such as chatbots and social media analysis.
Assessing Large Language Models for Structured Medical Order Extraction
Medical order extraction is essential for structuring actionable clinical information, supporting decision-making, and enabling downstream applications such as documentation and workflow automation. Orders may be embedded in diverse sources, including electronic health records, discharge summaries, and multi-turn doctor-patient dialogues, and can span categories such as medications, laboratory tests, imaging studies, and follow-up actions. The MEDIQA-OE 2025 shared task focuses on extracting structured medical orders from extended conversational transcripts, requiring the identification of order type, description, reason, and provenance. We present the MasonNLP submission, which ranked 5th among 17 participating teams with 105 total submissions. Our approach uses a general-purpose, instruction-tuned LLaMA-4 17B model without domain-specific fine-tuning, guided by a single in-context example. This few-shot configuration achieved an average F1 score of 37.76, with notable improvements in reason and provenance accuracy. These results demonstrate that large, non-domain-specific LLMs, when paired with effective prompt engineering, can serve as strong, scalable baselines for specialized clinical NLP tasks.
| Curated by yukajii.com |