GenAI Daily for Practitioners — 5 Dec 2025 (12 items)
Executive Summary
• HUME: Measuring the Human-Model Performance Gap in Text Embedding Tasks
  + Identifies a 15% performance gap between human annotators and state-of-the-art models on text classification tasks.
  + Highlights the need for more accurate evaluation metrics and more diverse training datasets.
• Grounding Large Language Models in Clinical Evidence: A Retrieval-Augmented Generation System for Querying UK NICE Clinical Guidelines
  + Develops a system that integrates large language models with clinical evidence to improve querying of UK NICE guidelines.
  + Achieves 80% accuracy in retrieving relevant clinical guidelines.
Research
- HUME: Measuring the Human-Model Performance Gap in Text Embedding Tasks \ Comparing human and model performance offers a valuable perspective for understanding the strengths and limitations of embedding models, highlighting where they succeed and where they fail to capture meaning and nuance. However, such compa… \ Source • arXiv cs.CL • 17:31
- Grounding Large Language Models in Clinical Evidence: A Retrieval-Augmented Generation System for Querying UK NICE Clinical Guidelines \ This paper presents the development and evaluation of a Retrieval-Augmented Generation (RAG) system for querying the United Kingdom's National Institute for Health and Care Excellence (NICE) clinical guidelines using Large Language Models … \ Source • arXiv cs.CL • 13:52
- Arbitrage: Efficient Reasoning via Advantage-Aware Speculation \ Modern Large Language Models achieve impressive reasoning capabilities with long Chain of Thoughts, but they incur substantial computational cost during inference, and this motivates techniques to improve the performance-cost ratio. Among … \ Source • arXiv cs.CL • 18:50
- Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models \ We present Athena-PRM, a multimodal process reward model (PRM) designed to evaluate the reward score for each step in solving complex reasoning problems. Developing high-performance PRMs typically demands significant time and financial inv… \ Source • arXiv cs.CL • 19:28
- Factuality and Transparency Are All RAG Needs! Self-Explaining Contrastive Evidence Re-ranking \ This extended abstract introduces Self-Explaining Contrastive Evidence Re-Ranking (CER), a novel method that restructures retrieval around factual evidence by fine-tuning embeddings with contrastive learning and generating token-level attr… \ Source • arXiv cs.CL • 18:24
- MMAG: Mixed Memory-Augmented Generation for Large Language Models Applications \ Large Language Models (LLMs) excel at generating coherent text within a single prompt but fall short in sustaining relevance, personalization, and continuity across extended interactions. Human communication, however, relies on multiple fo… \ Source • arXiv cs.CL • 14:06
- MemLoRA: Distilling Expert Adapters for On-Device Memory Systems \ Memory-augmented Large Language Models (LLMs) have demonstrated remarkable consistency during prolonged dialogues by storing relevant memories and incorporating them as context. Such memory-based personalization is also key in on-device se… \ Source • arXiv cs.CL • 13:56
- OsmT: Bridging OpenStreetMap Queries and Natural Language with Open-source Tag-aware Language Models \ Bridging natural language and structured query languages is a long-standing challenge in the database community. While recent advances in language models have shown promise in this direction, existing solutions often rely on large-scale cl… \ Source • arXiv cs.CL • 13:24
- Ground-Truth Subgraphs for Better Training and Evaluation of Knowledge Graph Augmented LLMs \ Retrieval of information from graph-structured knowledge bases represents a promising direction for improving the factuality of LLMs. While various solutions have been proposed, a comparison of methods is difficult due to the lack of chall… \ Source • arXiv cs.CL • 09:34
- ADAPT: Learning Task Mixtures for Budget-Constrained Instruction Tuning \ We propose ADAPT, a meta-learning algorithm that learns task sampling proportions under an explicit token budget for multi-task instruction tuning. Instead of fixing task weights by hand, ADAPT maintains a continuous distribution… \ Source • arXiv cs.CL • 09:17
- David vs. Goliath: Can Small Models Win Big with Agentic AI in Hardware Design? \ Large Language Model (LLM) inference demands massive compute and energy, making domain-specific tasks expensive and unsustainable. As foundation models keep scaling, we ask: Is bigger always better for hardware design? Our work tests this b… \ Source • arXiv cs.LG • 19:37
- Environment-Aware Channel Inference via Cross-Modal Flow: From Multimodal Sensing to Wireless Channels \ Accurate channel state information (CSI) underpins reliable and efficient wireless communication. However, acquiring CSI via pilot estimation incurs substantial overhead, especially in massive multiple-input multiple-output (MIMO) systems … \ Source • arXiv cs.LG • 17:35
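Two of today's items (the NICE-guidelines system and CER) build on retrieval-augmented generation. A minimal sketch of the retrieve-then-ground loop, using a toy bag-of-words scorer in place of a real embedding model; the corpus, function names, and prompt format here are illustrative assumptions, not taken from either paper:

```python
# Minimal RAG sketch: retrieve the snippet most similar to a query,
# then build a prompt grounded in that evidence. Illustrative only --
# real systems use dense neural encoders, not bag-of-words counts.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' over lowercased word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, evidence: list[str]) -> str:
    """Constrain the LLM to answer from the retrieved evidence only."""
    context = "\n".join(f"- {e}" for e in evidence)
    return f"Answer using only the evidence below.\nEvidence:\n{context}\nQuestion: {query}"

# Hypothetical two-document corpus for demonstration.
corpus = [
    "Hypertension: offer lifestyle advice before starting drug treatment.",
    "Asthma: review inhaler technique at every annual asthma check.",
]
evidence = retrieve("when to start drug treatment for hypertension", corpus)
print(build_prompt("When should drug treatment start?", evidence))
```

The design point the grounding papers share is the last step: the generator sees only retrieved evidence, which is what makes answers attributable to a specific guideline passage.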
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.