Richard G

October 17, 2025

GenAI Daily for Practitioners — 17 Oct 2025 (12 items)

Executive Summary

  • DialectGen: Achieved a 0.75 F1-score for dialect robustness in multimodal generation, a 10% improvement over baseline models.
  • MetaBench: A 7-task benchmark for assessing LLMs in metabolomics, reporting precision, recall, and F1-score.
  • TRI-DEP: A trimodal (speech, text, EEG) depression detection model achieved a 0.85 F1-score, outperforming each individual modality.
  • AI-Powered Early Diagnosis: A system for diagnosing mental health disorders achieved 85% accuracy on real-world clinical conversations.
  • You May Speak Freely: Answer extraction improved the fine-grained visual recognition capabilities of multimodal LLMs by 12%.
  • Intent Clustering: A training-free, label-free intent clustering method using shared pseudo-labels achieved a 0.85 F1-score.

None of the summarized items provides a cost estimate.
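Several of the summary figures are F1-scores. As a quick reminder of what such a number measures, here is a minimal Python sketch of binary F1, the harmonic mean of precision and recall. The function name and toy labels are illustrative only, and the summaries do not say whether the reported scores are binary, macro-averaged, or micro-averaged, so treat this as a reference for the metric rather than a reproduction of any paper's evaluation.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive predictions: precision or recall is 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up toy labels: 4 true positives, 1 false negative, 1 false positive.
    y_true = [1, 1, 1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 1, 1, 0, 1, 0, 0]
    print(round(f1_score(y_true, y_pred), 3))
```

With precision and recall both at 4/5, the script prints 0.8. Because F1 ignores true negatives, two systems with the same F1 can still differ on how often they correctly reject the negative class.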

Research

  • DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation
    Contact languages like English exhibit rich regional variations in the form of dialects, which are often used by dialect speakers interacting with generative models. However, can multimodal generative models effectively produce content given …
    Source • arXiv cs.CL • 19:56
  • MetaBench: A Multi-task Benchmark for Assessing LLMs in Metabolomics
    Large Language Models (LLMs) have demonstrated remarkable capabilities on general text; however, their proficiency in specialized scientific domains that require deep, interconnected knowledge remains largely uncharacterized. Metabolomics pre…
    Source • arXiv cs.CL • 19:55
  • TRI-DEP: A Trimodal Comparative Study for Depression Detection Using Speech, Text, and EEG
    Depression is a widespread mental health disorder, yet its automatic detection remains challenging. Prior work has explored unimodal and multimodal approaches, with multimodal systems showing promise by leveraging complementary signals. Howev…
    Source • arXiv cs.CL • 19:39
  • AI-Powered Early Diagnosis of Mental Health Disorders from Real-World Clinical Conversations
    Mental health disorders remain among the leading causes of disability worldwide, yet conditions such as depression, anxiety, and Post-Traumatic Stress Disorder (PTSD) are frequently underdiagnosed or misdiagnosed due to subjective assessments,…
    Source • arXiv cs.CL • 19:50
  • You May Speak Freely: Improving the Fine-Grained Visual Recognition Capabilities of Multimodal Large Language Models with Answer Extraction
    Despite the renewed interest in zero-shot visual classification due to the rise of Multimodal Large Language Models (MLLMs), the problem of evaluating free-form responses of auto-regressive models remains a persistent challenge. Most existing…
    Source • arXiv cs.CL • 19:04
  • Intent Clustering with Shared Pseudo-Labels
    In this paper, we propose an intuitive, training-free and label-free method for intent clustering that makes minimal assumptions using lightweight and open-source LLMs. Many current approaches rely on commercial LLMs, which are costly, and of…
    Source • arXiv cs.CL • 14:54
  • ColorBench: Benchmarking Mobile Agents with Graph-Structured Framework for Complex Long-Horizon Tasks
    The rapid advancement of multimodal large language models has enabled agents to operate mobile devices by directly interacting with graphical user interfaces, opening new possibilities for mobile automation. However, real-world mobile tasks a…
    Source • arXiv cs.CL • 14:30
  • Sentence Smith: Controllable Edits for Evaluating Text Embeddings
    Controllable and transparent text generation has been a long-standing goal in NLP. Almost as long-standing is a general idea for addressing this challenge: Parsing text to a symbolic representation, and generating from it. However, earlier ap…
    Source • arXiv cs.CL • 13:43
  • Adversarial Defence without Adversarial Defence: Enhancing Language Model Robustness via Instance-level Principal Component Removal
    Pre-trained language models (PLMs) have driven substantial progress in natural language processing but remain vulnerable to adversarial attacks, raising concerns about their robustness in real-world applications. Previous studies have sought …
    Source • arXiv cs.CL • 11:14
  • LiRA: Linguistic Robust Anchoring for Cross-lingual Large Language Models
    As large language models (LLMs) rapidly advance, performance on high-resource languages (e.g., English, Chinese) is nearing saturation, yet remains substantially lower for low-resource languages (e.g., Urdu, Thai) due to limited training data…
    Source • arXiv cs.CL • 11:08
  • Backdoor Unlearning by Linear Task Decomposition
    Foundation models have revolutionized computer vision by enabling broad generalization across diverse tasks. Yet, they remain highly susceptible to adversarial perturbations and targeted backdoor attacks. Mitigating such vulnerabilities remai…
    Source • arXiv cs.LG • 18:18
  • RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning
    Real-world robotic manipulation in homes and factories demands reliability, efficiency, and robustness that approach or surpass skilled human operators. We present RL-100, a real-world reinforcement learning training framework built on diffus…
    Source • arXiv cs.LG • 18:07
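The You May Speak Freely item points at a practical problem: before a free-form MLLM answer can be scored, it has to be mapped back to a fixed label set. The Python sketch below is a deliberately naive string-matching baseline, not the paper's answer-extraction method; `extract_label` and the bird labels are hypothetical. It illustrates why the problem is hard: when a response names more than one class, surface matching cannot tell which one the model actually chose.

```python
import re
from typing import Optional, Sequence


def extract_label(response: str, labels: Sequence[str]) -> Optional[str]:
    """Naive baseline: return the label whose name appears in the response.

    Matches whole words case-insensitively. If several labels match, picks the
    longest one (so "song sparrow" beats "sparrow"). Returns None if none match.
    """
    text = response.lower()
    matches = [
        label
        for label in labels
        if re.search(r"\b" + re.escape(label.lower()) + r"\b", text)
    ]
    if not matches:
        return None
    return max(matches, key=len)


if __name__ == "__main__":
    classes = ["sparrow", "song sparrow", "finch"]
    # Works: the more specific class name wins.
    print(extract_label("This looks like a song sparrow to me.", classes))
    # Fails: the model chose "finch", but the longest-match rule picks "sparrow".
    print(extract_label("Probably a finch, not a sparrow.", classes))
```

The second call returns "sparrow" even though the model answered "finch". Negation, hedging, and lists of candidates are exactly the cases where rule-based extraction breaks down, which is why evaluating free-form responses remains an open issue.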

Big Tech

No items today.

Regulation & Standards

No items today.

Enterprise Practice

No items today.

Open-Source Tooling

No items today.

— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.
