GenAI Daily for Practitioners — 17 Oct 2025 (12 items)


October 17, 2025

Executive Summary
• DialectGen: Achieved 0.75 F1-score for dialect robustness in multimodal generation, with 10% improvement over baseline models. No cost estimate provided.
• MetaBench: A 7-task benchmark for assessing LLMs in metabolomics, with metrics including precision, recall, and F1-score. No cost estimate provided.
• TRI-DEP: Trimodal depression detection model achieved 0.85 F1-score, outperforming individual modalities. No cost estimate provided.
• AI-Powered Early Diagnosis: Developed an AI-powered diagnosis system for mental health disorders, achieving 85% accuracy on real-world clinical conversations. No cost estimate provided.
• You May Speak Freely: Improved fine-grained visual recognition capabilities of multimodal LLMs by 12% using answer extraction. No cost estimate provided.
• Intent Clustering: Developed a clustering method for intent classification using shared pseudo-labels, achieving 0.85 F1-score. No cost estimate provided.
Research

DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation  \
  Contact languages like English exhibit rich regional variations in the form of dialects, which are often used by dialect speakers interacting with generative models. However, can multimodal generative models effectively produce content given …  \
  Source • arXiv cs.CL • 19:56
MetaBench: A Multi-task Benchmark for Assessing LLMs in Metabolomics  \
  Large Language Models (LLMs) have demonstrated remarkable capabilities on general text; however, their proficiency in specialized scientific domains that require deep, interconnected knowledge remains largely uncharacterized. Metabolomics pre…  \
  Source • arXiv cs.CL • 19:55
TRI-DEP: A Trimodal Comparative Study for Depression Detection Using Speech, Text, and EEG  \
  Depression is a widespread mental health disorder, yet its automatic detection remains challenging. Prior work has explored unimodal and multimodal approaches, with multimodal systems showing promise by leveraging complementary signals. Howev…  \
  Source • arXiv cs.CL • 19:39
AI-Powered Early Diagnosis of Mental Health Disorders from Real-World Clinical Conversations  \
  Mental health disorders remain among the leading causes of disability worldwide, yet conditions such as depression, anxiety, and Post-Traumatic Stress Disorder (PTSD) are frequently underdiagnosed or misdiagnosed due to subjective assessments,…  \
  Source • arXiv cs.CL • 19:50
You May Speak Freely: Improving the Fine-Grained Visual Recognition Capabilities of Multimodal Large Language Models with Answer Extraction  \
  Despite the renewed interest in zero-shot visual classification due to the rise of Multimodal Large Language Models (MLLMs), the problem of evaluating free-form responses of auto-regressive models remains a persistent challenge. Most existing…  \
  Source • arXiv cs.CL • 19:04
Intent Clustering with Shared Pseudo-Labels  \
  In this paper, we propose an intuitive, training-free and label-free method for intent clustering that makes minimal assumptions using lightweight and open-source LLMs. Many current approaches rely on commercial LLMs, which are costly, and of…  \
  Source • arXiv cs.CL • 14:54
ColorBench: Benchmarking Mobile Agents with Graph-Structured Framework for Complex Long-Horizon Tasks  \
  The rapid advancement of multimodal large language models has enabled agents to operate mobile devices by directly interacting with graphical user interfaces, opening new possibilities for mobile automation. However, real-world mobile tasks a…  \
  Source • arXiv cs.CL • 14:30
Sentence Smith: Controllable Edits for Evaluating Text Embeddings  \
  Controllable and transparent text generation has been a long-standing goal in NLP. Almost as long-standing is a general idea for addressing this challenge: Parsing text to a symbolic representation, and generating from it. However, earlier ap…  \
  Source • arXiv cs.CL • 13:43
Adversarial Defence without Adversarial Defence: Enhancing Language Model Robustness via Instance-level Principal Component Removal  \
  Pre-trained language models (PLMs) have driven substantial progress in natural language processing but remain vulnerable to adversarial attacks, raising concerns about their robustness in real-world applications. Previous studies have sought …  \
  Source • arXiv cs.CL • 11:14
LiRA: Linguistic Robust Anchoring for Cross-lingual Large Language Models  \
  As large language models (LLMs) rapidly advance, performance on high-resource languages (e.g., English, Chinese) is nearing saturation, yet remains substantially lower for low-resource languages (e.g., Urdu, Thai) due to limited training data…  \
  Source • arXiv cs.CL • 11:08
Backdoor Unlearning by Linear Task Decomposition  \
  Foundation models have revolutionized computer vision by enabling broad generalization across diverse tasks. Yet, they remain highly susceptible to adversarial perturbations and targeted backdoor attacks. Mitigating such vulnerabilities remai…  \
  Source • arXiv cs.LG • 18:18
RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning  \
  Real-world robotic manipulation in homes and factories demands reliability, efficiency, and robustness that approach or surpass skilled human operators. We present RL-100, a real-world reinforcement learning training framework built on diffus…  \
  Source • arXiv cs.LG • 18:07

Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
—
Personal views, not IBM. No tracking. Curated automatically; links under 24h old.
