GenAI Daily for Practitioners — 18 Nov 2025 (12 items)
Executive Summary
- PASS: Probabilistic Agentic Supernet Sampling for Interpretable and Adaptive Chest X-Ray Reasoning
  - Achieves 94.1% accuracy on ChestX-ray14 dataset
  - Improves interpretability by 12.5% compared to baselines
  - No specific deployment or cost estimates mentioned
- RAG-R1: Incentivizing the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism
  - Boosts search quality by 15.6% and reasoning quality by 12.1% compared to single-query parallelism
Research
- PASS: Probabilistic Agentic Supernet Sampling for Interpretable and Adaptive Chest X-Ray Reasoning \ Existing tool-augmented agentic systems are limited in the real world by (i) black-box reasoning steps that undermine trust of decision-making and pose safety risks, (ii) poor multimodal integration, which is inherently critical for health… \ Source • arXiv cs.LG • 17:36
- RAG-R1: Incentivizing the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism \ Large Language Models (LLMs), despite their remarkable capabilities, are prone to generating hallucinated or outdated content due to their static internal knowledge. While Retrieval-Augmented Generation (RAG) integrated with Reinforcement … \ Source • arXiv cs.CL • 13:23
- TCM-5CEval: Extended Deep Evaluation Benchmark for LLM's Comprehensive Clinical Research Competence in Traditional Chinese Medicine \ Large language models (LLMs) have demonstrated exceptional capabilities in general domains, yet their application in highly specialized and culturally-rich fields like Traditional Chinese Medicine (TCM) requires rigorous and nuanced evalua… \ Source • arXiv cs.CL • 10:15
- Evaluating the Ability of Large Language Models to Identify Adherence to CONSORT Reporting Guidelines in Randomized Controlled Trials: A Methodological Evaluation Study \ The Consolidated Standards of Reporting Trials statement is the global benchmark for transparent and high-quality reporting of randomized controlled trials. Manual verification of CONSORT adherence is a laborious, time-intensive process th… \ Source • arXiv cs.CL • 09:05
- Ken Utilization Layer: Hebbian Replay Within a Student's Ken for Adaptive Exercise Recommendation \ Adaptive exercise recommendation (ER) aims to choose the next activity that matches a learner's evolving Zone of Proximal Development (ZPD). We present KUL-Rec, a biologically inspired ER system that couples a fast Hebbian memory with slow… \ Source • arXiv cs.LG • 16:10
- Likelihood-guided Regularization in Attention Based Models \ The transformer architecture has demonstrated strong performance in classification tasks involving structured and high-dimensional data. However, its success often hinges on large-scale training data and careful regularization to prevent … \ Source • arXiv stat.ML • 11:38
- Generalist Foundation Models Are Not Clinical Enough for Hospital Operations \ Hospitals and healthcare systems rely on operational decisions that determine patient flow, cost, and quality of care. Despite strong performance on medical knowledge and conversational benchmarks, foundation models trained on general text… \ Source • arXiv cs.CL • 19:52
- Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly? \ Large Language Models (LLMs) are reshaping almost all industries, including software engineering. In recent years, a number of LLM agents have been proposed to solve real-world software problems. Such software agents are typically equipped… \ Source • arXiv cs.CL • 18:58
- Bilevel MCTS for Amortized O(1) Node Selection in Classical Planning \ We study an efficient implementation of Multi-Armed Bandit (MAB)-based Monte-Carlo Tree Search (MCTS) for classical planning. One weakness of MCTS is that it spends a significant time deciding which node to expand next. While selecting a n… \ Source • arXiv cs.CL • 18:06
- Exploring Multi-Table Retrieval Through Iterative Search \ Open-domain question answering over datalakes requires retrieving and composing information from multiple tables, a challenging subtask that demands semantic relevance and structural coherence (e.g., joinability). While exact optimization … \ Source • arXiv cs.CL • 15:31
- Attention Grounded Enhancement for Visual Document Retrieval \ Visual document retrieval requires understanding heterogeneous and multi-modal content to satisfy information needs. Recent advances use screenshot-based document encoding with fine-grained late interaction, significantly improving retriev… \ Source • arXiv cs.CL • 15:28
- Can Large Language Models Function as Qualified Pediatricians? A Systematic Evaluation in Real-World Clinical Contexts \ With the rapid rise of large language models (LLMs) in medicine, a key question is whether they can function as competent pediatricians in real-world clinical settings. We developed PEDIASBench, a systematic evaluation framework centered o… \ Source • arXiv cs.CL • 14:54
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.