Richard G

September 9, 2025

GenAI Daily for Practitioners — 9 Sept 2025 (12 items)

Executive Summary

Concise, non-sensationalist takeaways for enterprise practitioners:

  • Test-Time Scaling in Reasoning Models: No significant improvement on knowledge-intensive tasks.
  • COMPACT: Prunes models by 70-90% with minimal accuracy loss, suitable for resource-constrained devices.
  • Energy Landscapes: Enables reliable abstention in retrieval-augmented language models for healthcare, improving model robustness.
  • LinkAlign: Achieves 85% linking accuracy on real-world multi-database text-to-SQL tasks, outperforming state-of-the-art methods.
  • On the Same Wavelength?: Evaluates pragmatic reasoning in language models across broad concepts, finding limited success in capturing nuanced human reasoning.
  • EPT Benchmark: Develops a trustworthiness evaluation framework for Persian language models, highlighting the need for culturally sensitive AI development.

Research

  • Test-Time Scaling in Reasoning Models Is Not Effective for Knowledge-Intensive Tasks Yet \ Test-time scaling increases inference-time computation by allowing models to generate long reasoning chains, and has shown strong performance across many domains. However, in this work, we show that this approach is not yet effective for know… (see sketch 1 after this list) \ Source • arXiv cs.CL • 18:28
  • COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens \ Making LLMs more efficient in memory, latency, and serving cost is crucial for edge deployment, interactive applications, and sustainable inference at scale. Pruning is a key technique toward this goal. However, prior pruning methods are limi… \ Source • arXiv cs.CL • 18:07
  • Energy Landscapes Enable Reliable Abstention in Retrieval-Augmented Large Language Models for Healthcare \ Reliable abstention is critical for retrieval-augmented generation (RAG) systems, particularly in safety-critical domains such as women's health, where incorrect answers can lead to harm. We present an energy-based model (EBM) that learns a s… (see sketch 2 after this list) \ Source • arXiv cs.CL • 16:04
  • LinkAlign: Scalable Schema Linking for Real-World Large-Scale Multi-Database Text-to-SQL \ Schema linking is a critical bottleneck in applying existing Text-to-SQL models to real-world, large-scale, multi-database environments. Through error analysis, we identify two major challenges in schema linking: (1) Database Retrieval: accur… \ Source • arXiv cs.CL • 09:16
  • On the Same Wavelength? Evaluating Pragmatic Reasoning in Language Models across Broad Concepts \ Language use is shaped by pragmatics -- i.e., reasoning about communicative goals and norms in context. As language models (LMs) are increasingly used as conversational agents, it becomes ever more important to understand their pragmatic reas… \ Source • arXiv cs.CL • 19:59
  • EPT Benchmark: Evaluation of Persian Trustworthiness in Large Language Models \ Large Language Models (LLMs), trained on extensive datasets using advanced deep learning architectures, have demonstrated remarkable performance across a wide range of language tasks, becoming a cornerstone of modern AI technologies. However,… \ Source • arXiv cs.CL • 18:08
  • Efficient Dynamic Clustering-Based Document Compression for Retrieval-Augmented-Generation \ Retrieval-Augmented Generation (RAG) has emerged as a widely adopted approach for knowledge injection during large language model (LLM) inference in recent years. However, due to their limited ability to exploit fine-grained inter-document re… (see sketch 3 after this list) \ Source • arXiv cs.CL • 17:51
  • A Comparative Benchmark of Large Language Models for Labelling Wind Turbine Maintenance Logs \ Effective Operation and Maintenance (O&M) is critical to reducing the Levelised Cost of Energy (LCOE) from wind power, yet the unstructured, free-text nature of turbine maintenance logs presents a significant barrier to automated analysis… \ Source • arXiv cs.CL • 17:48
  • Domain-Aware RAG: MoL-Enhanced RL for Efficient Training and Scalable Retrieval \ Retrieval-Augmented Generation (RAG) systems rely heavily on the retrieval stage, particularly the coarse-ranking process. Existing coarse-ranking optimization approaches often struggle to balance domain-specific knowledge learning with query… \ Source • arXiv cs.CL • 15:04
  • Dynamically Adaptive Reasoning via LLM-Guided MCTS for Efficient and Context-Aware KGQA \ Knowledge Graph Question Answering (KGQA) aims to interpret natural language queries and perform structured reasoning over knowledge graphs by leveraging their relational and semantic structures to retrieve accurate answers. Recent KGQA metho… \ Source • arXiv cs.CL • 14:44
  • MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs \ Multimodal large language models (MLLMs) excel at 2D visual understanding but remain limited in their ability to reason about 3D space. In this work, we leverage large-scale high-quality 3D scene data with open-set annotations to introduce 1)… \ Source • arXiv cs.CL • 11:39
  • OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation \ As the general capabilities of large language models (LLMs) improve and agent applications become more widespread, the underlying deception risks urgently require systematic evaluation and effective oversight. Unlike existing evaluation which… \ Source • arXiv cs.CL • 11:05
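
Sketch 1 (test-time scaling): a minimal, illustrative take on the generic idea behind test-time scaling, i.e. spending more inference compute by sampling several reasoning chains and majority-voting the answers (self-consistency). This is not the paper's exact setup; generate() and extract_answer() are placeholder names you would wire to your own model call and answer parsing.

    # Generic test-time scaling via self-consistency sampling (illustrative only).
    from collections import Counter

    def generate(prompt: str, temperature: float = 0.8) -> str:
        """Placeholder: call your LLM and return the full reasoning chain plus answer."""
        raise NotImplementedError

    def extract_answer(completion: str) -> str:
        """Placeholder: parse the final answer out of a reasoning chain."""
        return completion.strip().splitlines()[-1]

    def self_consistency(question: str, n_samples: int = 8) -> str:
        # More samples means more inference-time compute; per the item above,
        # this appears to help less on knowledge-intensive tasks.
        answers = [extract_answer(generate("Think step by step.\n\n" + question))
                   for _ in range(n_samples)]
        return Counter(answers).most_common(1)[0][0]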
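
Sketch 2 (energy-based abstention in RAG): the generic gate-then-answer pattern of scoring a query plus its retrieved passages with a learned energy function and abstaining when the score exceeds a calibrated threshold. energy_model(), answer_with_llm(), and the threshold value are placeholders, not the paper's implementation.

    # Generic abstain-or-answer gate for a RAG pipeline (illustrative only).
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class RAGResult:
        answer: Optional[str]
        abstained: bool

    def energy_model(query: str, passages: List[str]) -> float:
        """Placeholder for a learned EBM; lower energy = evidence supports answering."""
        raise NotImplementedError

    def answer_with_llm(query: str, passages: List[str]) -> str:
        """Placeholder for the usual RAG generation step."""
        raise NotImplementedError

    def rag_with_abstention(query: str, passages: List[str],
                            energy_threshold: float = 0.0) -> RAGResult:
        if energy_model(query, passages) > energy_threshold:
            # Evidence looks unreliable: abstain rather than risk a harmful answer.
            return RAGResult(answer=None, abstained=True)
        return RAGResult(answer=answer_with_llm(query, passages), abstained=False)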
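
Sketch 3 (clustering-based context compression): a plain k-means stand-in for the general idea of clustering retrieved chunks by embedding and keeping one representative per cluster to shrink the prompt. The paper's dynamic clustering and selection differ; this only shows the shape of the technique (requires numpy and scikit-learn).

    # Keep one representative chunk per embedding cluster (illustrative only).
    import numpy as np
    from sklearn.cluster import KMeans

    def compress_context(chunks: list, embeddings: np.ndarray,
                         n_clusters: int = 5) -> list:
        """Return the chunk nearest each cluster centroid."""
        n_clusters = min(n_clusters, len(chunks))
        km = KMeans(n_clusters=n_clusters, random_state=0).fit(embeddings)
        kept = []
        for c in range(n_clusters):
            members = np.where(km.labels_ == c)[0]
            dists = np.linalg.norm(embeddings[members] - km.cluster_centers_[c], axis=1)
            kept.append(chunks[members[np.argmin(dists)]])
        return kept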

Big Tech

No items today.

Regulation & Standards

No items today.

Enterprise Practice

No items today.

Open-Source Tooling

No items today.

— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.
