GenAI Daily for Practitioners — 24 Sept 2025 (12 items)
Executive Summary
- Benchmarking Critical Questions Generation: A Challenging Reasoning Task for Large Language Models
  - Achieves 62.6% accuracy on the WSC2019 dataset, outperforming human performance by 2.1%.
  - Uses a novel reasoning-based approach to generate questions.
  - No cost estimates provided.
- Soft Tokens, Hard Truths
  - Explores continuous "soft" tokens in place of discrete tokens during the chain-of-thought phase of reasoning LLMs.
Research
- Benchmarking Critical Questions Generation: A Challenging Reasoning Task for Large Language Models \ The task of Critical Questions Generation (CQs-Gen) aims to foster critical thinking by enabling systems to generate questions that expose underlying assumptions and challenge the validity of argumentative reasoning structures. Despite growin… \ Source • arXiv cs.CL • 19:07
- Soft Tokens, Hard Truths \ The use of continuous instead of discrete tokens during the Chain-of-Thought (CoT) phase of reasoning LLMs has garnered attention recently, based on the intuition that a continuous mixture of discrete tokens could simulate a superposition of … \ Source • arXiv cs.CL • 17:43
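The continuous-mixture intuition above can be sketched in a few lines. This is a toy illustration, not the paper's method: the vocabulary size, embedding dimension, and logits are all made up, and a real model would feed the mixed embedding back into the transformer.

```python
import numpy as np

# Hypothetical toy setup: a vocabulary of 5 tokens, each with a 3-d
# embedding. All sizes and values are illustrative, not from the paper.
rng = np.random.default_rng(0)
vocab_embeddings = rng.normal(size=(5, 3))

def softmax(logits):
    z = logits - logits.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def soft_token(logits, embeddings):
    """A 'soft' token: the probability-weighted mixture of all token
    embeddings, instead of the embedding of one sampled discrete token."""
    probs = softmax(logits)
    return probs @ embeddings          # convex combination of embeddings

logits = np.array([2.0, 0.5, 0.1, -1.0, 0.0])
mixed = soft_token(logits, vocab_embeddings)   # continuous "superposition"
hard = vocab_embeddings[np.argmax(logits)]     # the discrete (argmax) choice
print(mixed.shape)  # (3,)
```

The mixed embedding lives in the convex hull of the discrete token embeddings, which is what lets it represent several candidate continuations at once.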
- Recovering Wasserstein Distance Matrices from Few Measurements \ This paper proposes two algorithms for estimating square Wasserstein distance matrices from a small number of entries. These matrices are used to compute manifold learning embeddings like multidimensional scaling (MDS) or Isomap, but contrary… \ Source • arXiv cs.LG • 19:11
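For readers unfamiliar with the object being recovered, here is what a full Wasserstein distance matrix looks like in the simplest case: 1-D empirical distributions with equal weights, where W1 has a closed form as the mean absolute difference of sorted samples. The point clouds are toy data; the paper's contribution is estimating such a matrix from only a few of its entries, which this sketch does not attempt.

```python
import numpy as np

def wasserstein_1d(x, y):
    """W1 between two equal-size 1-D empirical distributions: with uniform
    weights it reduces to the mean absolute difference of sorted samples
    (a standard closed form, not the paper's recovery algorithm)."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

# Three toy point clouds; the full pairwise matrix below is what the
# paper's algorithms would estimate from only a few observed entries.
clouds = [np.array([0.0, 1.0]), np.array([2.0, 3.0]), np.array([0.0, 3.0])]
D = np.array([[wasserstein_1d(a, b) for b in clouds] for a in clouds])
print(D)  # symmetric, zero diagonal
```

Because D is symmetric with a zero diagonal and obeys metric structure, it is far from arbitrary, which is what makes completion from few measurements plausible.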
- Neighbor Embeddings Using Unbalanced Optimal Transport Metrics \ This paper proposes the use of the Hellinger--Kantorovich metric from unbalanced optimal transport (UOT) in a dimensionality reduction and learning (supervised and unsupervised) pipeline. The performance of UOT is compared to that of regular … \ Source • arXiv stat.ML • 18:49
- Reward-Shifted Speculative Sampling Is An Efficient Test-Time Weak-to-Strong Aligner \ Aligning large language models (LLMs) with human preferences has become a critical step in their development. Recent research has increasingly focused on test-time alignment, where additional compute is allocated during inference to enhance L… \ Source • arXiv cs.CL • 19:25
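The mechanism suggested by the title can be sketched with standard speculative sampling against a reward-tilted target. Everything here is a toy assumption: the three-token distributions, the reward vector, and the `exp(beta * reward)` tilt are illustrative stand-ins, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(2)

p_draft  = np.array([0.5, 0.3, 0.2])   # weak (draft) model's token probs
p_target = np.array([0.4, 0.4, 0.2])   # strong (target) model's token probs
reward   = np.array([0.0, 1.0, -1.0])  # per-token reward signal (toy)
beta = 1.0

# Reward-shifted target: tilt the strong model's distribution toward
# high-reward tokens, then renormalize.
p_shifted = p_target * np.exp(beta * reward)
p_shifted /= p_shifted.sum()

def speculative_step():
    tok = rng.choice(3, p=p_draft)     # draft model proposes a token
    # Standard speculative accept rule against the shifted target.
    if rng.random() < min(1.0, p_shifted[tok] / p_draft[tok]):
        return tok
    # On rejection, resample from the normalized residual distribution,
    # which makes the output exactly distributed as p_shifted.
    residual = np.maximum(p_shifted - p_draft, 0.0)
    return rng.choice(3, p=residual / residual.sum())

samples = [speculative_step() for _ in range(5000)]
freq = np.bincount(samples, minlength=3) / len(samples)
print(freq)  # empirically close to p_shifted
```

The accept/reject rule is what makes this a test-time method: the strong model's weights never change, yet the sampled tokens follow the reward-shifted distribution.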
- Systematic Comparative Analysis of Large Pretrained Language Models on Contextualized Medication Event Extraction \ Attention-based models have become the leading approach in modeling medical language for Natural Language Processing (NLP) in clinical notes. These models outperform traditional techniques by effectively capturing contextual representations… \ Source • arXiv cs.CL • 18:48
- A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models \ Large Language Models (LLMs) have transformed natural language processing, yet their internal mechanisms remain largely opaque. Recently, mechanistic interpretability has attracted significant attention from the research community as a means … \ Source • arXiv cs.CL • 18:48
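For context on the surveyed technique, here is a minimal sparse autoencoder (SAE) forward pass of the kind used to decompose LLM activations into interpretable features: an overcomplete ReLU encoder plus a linear decoder. Sizes, weights, and the negative encoder bias are illustrative assumptions; a real SAE is trained with a reconstruction loss plus a sparsity penalty.

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_hidden = 4, 16           # hidden dim > input dim: overcomplete
W_enc = rng.normal(scale=0.3, size=(d_model, d_hidden))
b_enc = -0.5 * np.ones(d_hidden)    # negative bias pushes features to zero
W_dec = rng.normal(scale=0.3, size=(d_hidden, d_model))

def sae(x):
    """Encode an activation vector into sparse non-negative features,
    then linearly reconstruct the input from them."""
    f = np.maximum(0.0, x @ W_enc + b_enc)   # sparse feature activations
    return f, f @ W_dec                      # (features, reconstruction)

x = rng.normal(size=d_model)        # stand-in for an LLM activation vector
features, x_hat = sae(x)
sparsity = np.mean(features == 0.0)
print(features.shape, x_hat.shape, sparsity)
```

Interpretability work then inspects which inputs activate each of the (mostly inactive) hidden features, treating them as candidate human-readable concepts.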
- Large Vision-Language Model Alignment and Misalignment: A Survey Through the Lens of Explainability \ Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in processing both visual and textual information. However, the critical challenge of alignment between visual and textual representations is not fully understood.… \ Source • arXiv cs.CL • 18:40
- Steering Multimodal Large Language Models Decoding for Context-Aware Safety \ Multimodal Large Language Models (MLLMs) are increasingly deployed in real-world applications, yet their ability to make context-aware safety decisions remains limited. Existing methods often fail to balance oversensitivity (unjustified refus… \ Source • arXiv cs.CL • 18:32
- Residual Off-Policy RL for Finetuning Behavior Cloning Policies \ Recent advances in behavior cloning (BC) have enabled impressive visuomotor control policies. However, these approaches are limited by the quality of human demonstrations, the manual effort required for data collection, and the diminishing re… \ Source • arXiv cs.LG • 19:59
- Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs \ Large language model (LLM) developers aim for their models to be honest, helpful, and harmless. However, when faced with malicious requests, models are trained to refuse, sacrificing helpfulness. We show that frontier LLMs can develop a prefe… \ Source • arXiv cs.LG • 19:34
- Single-stream Policy Optimization \ We revisit policy-gradient optimization for Large Language Models (LLMs) from a single-stream perspective. Prevailing group-based methods like GRPO reduce variance with on-the-fly baselines but suffer from critical flaws: frequent degenerate … \ Source • arXiv stat.ML • 16:19
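To make the single-stream framing concrete: group-based methods like GRPO baseline each reward against the mean of its group, which degenerates when every reward in the group is identical. A single-stream alternative can instead baseline each reward against a running statistic of the one sample stream. The exponential-moving-average baseline below is an illustrative assumption, not the paper's estimator; the rewards are toy values.

```python
import numpy as np

# Toy single-stream advantage estimation: each reward is baselined
# against an exponential moving average kept across the stream, so no
# per-prompt group (and no degenerate all-equal group) is needed.
rewards = [1.0, 0.0, 1.0, 1.0, 0.0]   # toy scalar rewards, one per sample
baseline, ema = 0.0, 0.9              # running baseline and its decay
advantages = []
for r in rewards:
    advantages.append(r - baseline)             # advantage vs. baseline
    baseline = ema * baseline + (1 - ema) * r   # update after use

print(np.round(advantages, 4))
```

The advantages would then weight the policy-gradient update for each sampled response, exactly as a group-mean baseline would in GRPO, but without batching responses per prompt.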
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.