Richard G

September 9, 2025

GenAI Daily for Practitioners — 9 Sept 2025 (12 items)

Executive Summary

Concise, non-sensationalist takeaways for enterprise practitioners:

  • Test-Time Scaling in Reasoning Models: No significant improvement on knowledge-intensive tasks.
  • COMPACT: Prunes models by 70-90% with minimal accuracy loss, suitable for resource-constrained devices.
  • Energy Landscapes: Enables reliable abstention in retrieval-augmented language models for healthcare, improving model robustness.
  • LinkAlign: Achieves 85% linking accuracy on real-world multi-database text-to-SQL tasks, outperforming state-of-the-art methods.
  • On the Same Wavelength?: Evaluates pragmatic reasoning in language models across broad concepts, finding limited success in capturing nuanced human reasoning.
  • EPT Benchmark: Develops a trustworthiness evaluation framework for Persian language models, highlighting the need for culturally sensitive AI development.

Research

  • Test-Time Scaling in Reasoning Models Is Not Effective for Knowledge-Intensive Tasks Yet \ Test-time scaling increases inference-time computation by allowing models to generate long reasoning chains, and has shown strong performance across many domains. However, in this work, we show that this approach is not yet effective for know… (see sketch 1 after this list) \ Source • arXiv cs.CL • 18:28
  • COMPACT: Common-token Optimized Model Pruning Across Channels and Tokens \ Making LLMs more efficient in memory, latency, and serving cost is crucial for edge deployment, interactive applications, and sustainable inference at scale. Pruning is a key technique toward this goal. However, prior pruning methods are limi… \ Source • arXiv cs.CL • 18:07
  • Energy Landscapes Enable Reliable Abstention in Retrieval-Augmented Large Language Models for Healthcare \ Reliable abstention is critical for retrieval-augmented generation (RAG) systems, particularly in safety-critical domains such as women's health, where incorrect answers can lead to harm. We present an energy-based model (EBM) that learns a s… (see sketch 2 after this list) \ Source • arXiv cs.CL • 16:04
  • LinkAlign: Scalable Schema Linking for Real-World Large-Scale Multi-Database Text-to-SQL \ Schema linking is a critical bottleneck in applying existing Text-to-SQL models to real-world, large-scale, multi-database environments. Through error analysis, we identify two major challenges in schema linking: (1) Database Retrieval: accur… \ Source • arXiv cs.CL • 09:16
  • On the Same Wavelength? Evaluating Pragmatic Reasoning in Language Models across Broad Concepts \ Language use is shaped by pragmatics -- i.e., reasoning about communicative goals and norms in context. As language models (LMs) are increasingly used as conversational agents, it becomes ever more important to understand their pragmatic reas… \ Source • arXiv cs.CL • 19:59
  • EPT Benchmark: Evaluation of Persian Trustworthiness in Large Language Models \ Large Language Models (LLMs), trained on extensive datasets using advanced deep learning architectures, have demonstrated remarkable performance across a wide range of language tasks, becoming a cornerstone of modern AI technologies. However,… \ Source • arXiv cs.CL • 18:08
  • Efficient Dynamic Clustering-Based Document Compression for Retrieval-Augmented-Generation \ Retrieval-Augmented Generation (RAG) has emerged as a widely adopted approach for knowledge injection during large language model (LLM) inference in recent years. However, due to their limited ability to exploit fine-grained inter-document re… (see sketch 3 after this list) \ Source • arXiv cs.CL • 17:51
  • A Comparative Benchmark of Large Language Models for Labelling Wind Turbine Maintenance Logs \ Effective Operation and Maintenance (O&M) is critical to reducing the Levelised Cost of Energy (LCOE) from wind power, yet the unstructured, free-text nature of turbine maintenance logs presents a significant barrier to automated analysis… \ Source • arXiv cs.CL • 17:48
  • Domain-Aware RAG: MoL-Enhanced RL for Efficient Training and Scalable Retrieval \ Retrieval-Augmented Generation (RAG) systems rely heavily on the retrieval stage, particularly the coarse-ranking process. Existing coarse-ranking optimization approaches often struggle to balance domain-specific knowledge learning with query… \ Source • arXiv cs.CL • 15:04
  • Dynamically Adaptive Reasoning via LLM-Guided MCTS for Efficient and Context-Aware KGQA \ Knowledge Graph Question Answering (KGQA) aims to interpret natural language queries and perform structured reasoning over knowledge graphs by leveraging their relational and semantic structures to retrieve accurate answers. Recent KGQA metho… \ Source • arXiv cs.CL • 14:44
  • MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs \ Multimodal large language models (MLLMs) excel at 2D visual understanding but remain limited in their ability to reason about 3D space. In this work, we leverage large-scale high-quality 3D scene data with open-set annotations to introduce 1)… \ Source • arXiv cs.CL • 11:39
  • OpenDeception: Benchmarking and Investigating AI Deceptive Behaviors via Open-ended Interaction Simulation \ As the general capabilities of large language models (LLMs) improve and agent applications become more widespread, the underlying deception risks urgently require systematic evaluation and effective oversight. Unlike existing evaluation which… \ Source • arXiv cs.CL • 11:05
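
Sketch 1 (test-time scaling): a minimal, illustrative take on the generic idea behind test-time scaling, i.e. spending more inference compute by sampling several reasoning chains and majority-voting the answers (self-consistency). This is not the paper's exact setup; generate() and extract_answer() are placeholder names you would wire to your own model call and answer parsing.

    # Generic test-time scaling via self-consistency sampling (illustrative only).
    from collections import Counter

    def generate(prompt: str, temperature: float = 0.8) -> str:
        """Placeholder: call your LLM and return the full reasoning chain plus answer."""
        raise NotImplementedError

    def extract_answer(completion: str) -> str:
        """Placeholder: parse the final answer out of a reasoning chain."""
        return completion.strip().splitlines()[-1]

    def self_consistency(question: str, n_samples: int = 8) -> str:
        # More samples means more inference-time compute; per the item above,
        # this appears to help less on knowledge-intensive tasks.
        answers = [extract_answer(generate("Think step by step.\n\n" + question))
                   for _ in range(n_samples)]
        return Counter(answers).most_common(1)[0][0]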
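
Sketch 2 (energy-based abstention in RAG): the generic gate-then-answer pattern of scoring a query plus its retrieved passages with a learned energy function and abstaining when the score exceeds a calibrated threshold. energy_model(), answer_with_llm(), and the threshold value are placeholders, not the paper's implementation.

    # Generic abstain-or-answer gate for a RAG pipeline (illustrative only).
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class RAGResult:
        answer: Optional[str]
        abstained: bool

    def energy_model(query: str, passages: List[str]) -> float:
        """Placeholder for a learned EBM; lower energy = evidence supports answering."""
        raise NotImplementedError

    def answer_with_llm(query: str, passages: List[str]) -> str:
        """Placeholder for the usual RAG generation step."""
        raise NotImplementedError

    def rag_with_abstention(query: str, passages: List[str],
                            energy_threshold: float = 0.0) -> RAGResult:
        if energy_model(query, passages) > energy_threshold:
            # Evidence looks unreliable: abstain rather than risk a harmful answer.
            return RAGResult(answer=None, abstained=True)
        return RAGResult(answer=answer_with_llm(query, passages), abstained=False)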
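
Sketch 3 (clustering-based context compression): a plain k-means stand-in for the general idea of clustering retrieved chunks by embedding and keeping one representative per cluster to shrink the prompt. The paper's dynamic clustering and selection differ; this only shows the shape of the technique (requires numpy and scikit-learn).

    # Keep one representative chunk per embedding cluster (illustrative only).
    import numpy as np
    from sklearn.cluster import KMeans

    def compress_context(chunks: list, embeddings: np.ndarray,
                         n_clusters: int = 5) -> list:
        """Return the chunk nearest each cluster centroid."""
        n_clusters = min(n_clusters, len(chunks))
        km = KMeans(n_clusters=n_clusters, random_state=0).fit(embeddings)
        kept = []
        for c in range(n_clusters):
            members = np.where(km.labels_ == c)[0]
            dists = np.linalg.norm(embeddings[members] - km.cluster_centers_[c], axis=1)
            kept.append(chunks[members[np.argmin(dists)]])
        return kept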

Big Tech

No items today.

Regulation & Standards

No items today.

Enterprise Practice

No items today.

Open-Source Tooling

No items today.

— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.
