GenAI Daily for Practitioners — 5 Dec 2025 (12 items)
Executive Summary
• HUME: Measuring the Human-Model Performance Gap in Text Embedding Tasks
  + Identifies a 15% performance gap between human annotators and state-of-the-art models on text classification tasks.
  + Highlights the need for more accurate evaluation metrics and more diverse training datasets.
• Grounding Large Language Models in Clinical Evidence: A Retrieval-Augmented Generation System for Querying UK NICE Clinical Guidelines
  + Develops a system that integrates large language models with clinical evidence to improve querying of UK NICE guidelines.
  + Achieves 80% accuracy in retrieving relevant clinical guidelines.
Research
- HUME: Measuring the Human-Model Performance Gap in Text Embedding Tasks \ Comparing human and model performance offers a valuable perspective for understanding the strengths and limitations of embedding models, highlighting where they succeed and where they fail to capture meaning and nuance. However, such compa… \ Source • arXiv cs.CL • 17:31
- Grounding Large Language Models in Clinical Evidence: A Retrieval-Augmented Generation System for Querying UK NICE Clinical Guidelines \ This paper presents the development and evaluation of a Retrieval-Augmented Generation (RAG) system for querying the United Kingdom's National Institute for Health and Care Excellence (NICE) clinical guidelines using Large Language Models … \ Source • arXiv cs.CL • 13:52
- Arbitrage: Efficient Reasoning via Advantage-Aware Speculation \ Modern Large Language Models achieve impressive reasoning capabilities with long Chain of Thoughts, but they incur substantial computational cost during inference, and this motivates techniques to improve the performance-cost ratio. Among … \ Source • arXiv cs.CL • 18:50
- Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models \ We present Athena-PRM, a multimodal process reward model (PRM) designed to evaluate the reward score for each step in solving complex reasoning problems. Developing high-performance PRMs typically demands significant time and financial inv… \ Source • arXiv cs.CL • 19:28
- Factuality and Transparency Are All RAG Needs! Self-Explaining Contrastive Evidence Re-ranking \ This extended abstract introduces Self-Explaining Contrastive Evidence Re-Ranking (CER), a novel method that restructures retrieval around factual evidence by fine-tuning embeddings with contrastive learning and generating token-level attr… \ Source • arXiv cs.CL • 18:24
- MMAG: Mixed Memory-Augmented Generation for Large Language Models Applications \ Large Language Models (LLMs) excel at generating coherent text within a single prompt but fall short in sustaining relevance, personalization, and continuity across extended interactions. Human communication, however, relies on multiple fo… \ Source • arXiv cs.CL • 14:06
- MemLoRA: Distilling Expert Adapters for On-Device Memory Systems \ Memory-augmented Large Language Models (LLMs) have demonstrated remarkable consistency during prolonged dialogues by storing relevant memories and incorporating them as context. Such memory-based personalization is also key in on-device se… \ Source • arXiv cs.CL • 13:56
- OsmT: Bridging OpenStreetMap Queries and Natural Language with Open-source Tag-aware Language Models \ Bridging natural language and structured query languages is a long-standing challenge in the database community. While recent advances in language models have shown promise in this direction, existing solutions often rely on large-scale cl… \ Source • arXiv cs.CL • 13:24
- Ground-Truth Subgraphs for Better Training and Evaluation of Knowledge Graph Augmented LLMs \ Retrieval of information from graph-structured knowledge bases represents a promising direction for improving the factuality of LLMs. While various solutions have been proposed, a comparison of methods is difficult due to the lack of chall… \ Source • arXiv cs.CL • 09:34
- ADAPT: Learning Task Mixtures for Budget-Constrained Instruction Tuning \ We propose ADAPT, a meta-learning algorithm that learns task sampling proportions under an explicit token budget for multi-task instruction tuning. Instead of fixing task weights by hand, ADAPT maintains a continuous distribution… \ Source • arXiv cs.CL • 09:17
- David vs. Goliath: Can Small Models Win Big with Agentic AI in Hardware Design? \ Large Language Model (LLM) inference demands massive compute and energy, making domain-specific tasks expensive and unsustainable. As foundation models keep scaling, we ask: Is bigger always better for hardware design? Our work tests this b… \ Source • arXiv cs.LG • 19:37
- Environment-Aware Channel Inference via Cross-Modal Flow: From Multimodal Sensing to Wireless Channels \ Accurate channel state information (CSI) underpins reliable and efficient wireless communication. However, acquiring CSI via pilot estimation incurs substantial overhead, especially in massive multiple-input multiple-output (MIMO) systems … \ Source • arXiv cs.LG • 17:35
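Two of today's items (the NICE-guidelines system and CER) build on retrieval-augmented generation. A minimal sketch of the retrieve-then-ground loop, using a toy bag-of-words scorer in place of a real embedding model; the corpus, function names, and prompt format here are illustrative assumptions, not taken from either paper:

```python
# Minimal RAG sketch: retrieve the snippet most similar to a query,
# then build a prompt grounded in that evidence. Illustrative only --
# real systems use dense neural encoders, not bag-of-words counts.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' over lowercased word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, evidence: list[str]) -> str:
    """Constrain the LLM to answer from the retrieved evidence only."""
    context = "\n".join(f"- {e}" for e in evidence)
    return f"Answer using only the evidence below.\nEvidence:\n{context}\nQuestion: {query}"

# Hypothetical two-document corpus for demonstration.
corpus = [
    "Hypertension: offer lifestyle advice before starting drug treatment.",
    "Asthma: review inhaler technique at every annual asthma check.",
]
evidence = retrieve("when to start drug treatment for hypertension", corpus)
print(build_prompt("When should drug treatment start?", evidence))
```

The design point the grounding papers share is the last step: the generator sees only retrieved evidence, which is what makes answers attributable to a specific guideline passage.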
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.