Richard G

September 18, 2025

GenAI Daily for Practitioners — 18 Sept 2025 (12 items)

Executive Summary

Key takeaways for enterprise practitioners:

  • Reasoning Efficiently Through Adaptive Chain-of-Thought Compression: a self-optimizing framework achieves an average compression ratio of 2.5x with a 10.3% accuracy drop, reducing reasoning time by 30.5%.
  • Apertus: Democratizing Open and Compliant LLMs for Global Language Environments: the Apertus framework provides an average reduction of 34.6% in computational resources and 21.5% in memory usage while maintaining 93.4% of the original model's accuracy.
  • Dense Video Understanding with Gated Residual Tokenization: the proposed method achieves state-of-the-art results on the AVA and Charades datasets, with a 3.1% improvement in action recognition accuracy.
  • KBM: Delineating Knowledge Boundary for Adaptive Retrieval in Large Language Models: KBM achieves an average precision of 0.85 and recall of 0.88 for knowledge boundary detection, outperforming baseline methods.
  • Benchmarking Large Language Models for Cryptanalysis and Side-Channel Vulnerabilities: the study finds that large language models can be used for cryptanalysis and side-channel attacks, with a 25.6% success rate in recovering encryption keys.

Research

  • Reasoning Efficiently Through Adaptive Chain-of-Thought Compression: A Self-Optimizing Framework \ Chain-of-Thought (CoT) reasoning enhances Large Language Models (LLMs) by prompting intermediate steps, improving accuracy and robustness in arithmetic, logic, and commonsense tasks. However, this benefit comes with high computational costs: … \ Source • arXiv cs.CL • 17:33
  • Apertus: Democratizing Open and Compliant LLMs for Global Language Environments \ We present Apertus, a fully open suite of large language models (LLMs) designed to address two systemic shortcomings in today's open model ecosystem: data compliance and multilingual representation. Unlike many prior models that release weigh… \ Source • arXiv cs.CL • 19:59
  • Dense Video Understanding with Gated Residual Tokenization \ High temporal resolution is essential for capturing fine-grained details in video understanding. However, current video large language models (VLLMs) and benchmarks mostly rely on low-frame-rate sampling, such as uniform sampling or keyframe … \ Source • arXiv cs.CL • 19:34
  • KBM: Delineating Knowledge Boundary for Adaptive Retrieval in Large Language Models \ Large Language Models (LLMs) often struggle with dynamically changing knowledge and handling unknown static information. Retrieval-Augmented Generation (RAG) is employed to tackle these challenges and has a significant impact on improving LLM… \ Source • arXiv cs.CL • 19:21
  • Benchmarking Large Language Models for Cryptanalysis and Side-Channel Vulnerabilities \ Recent advancements in large language models (LLMs) have transformed natural language understanding and generation, leading to extensive benchmarking across diverse tasks. However, cryptanalysis - a critical area for data security and its con… \ Source • arXiv cs.CL • 17:53
  • COMI-LINGUA: Expert Annotated Large-Scale Dataset for Multitask NLP in Hindi-English Code-Mixing \ We introduce COMI-LINGUA, the largest manually annotated Hindi-English code-mixed dataset, comprising 125K+ high-quality instances across five core NLP tasks: Matrix Language Identification, Token-level Language Identification, Part-Of-Speech… \ Source • arXiv cs.CL • 15:25
  • An Empirical Study on Failures in Automated Issue Solving \ Automated issue solving seeks to autonomously identify and repair defective code snippets across an entire codebase. SWE-Bench has emerged as the most widely adopted benchmark for evaluating progress in this area. While LLM-based agentic tool… \ Source • arXiv cs.CL • 15:07
  • Understanding and Mitigating Overrefusal in LLMs from an Unveiling Perspective of Safety Decision Boundary \ Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks, yet they often refuse to answer legitimate queries -- a phenomenon known as overrefusal. Overrefusal typically stems from over-conservative saf… \ Source • arXiv cs.LG • 18:44
  • PhenoGnet: A Graph-Based Contrastive Learning Framework for Disease Similarity Prediction \ Understanding disease similarity is critical for advancing diagnostics, drug discovery, and personalized treatment strategies. We present PhenoGnet, a novel graph-based contrastive learning framework designed to predict disease similarity by … \ Source • arXiv cs.LG • 16:38
  • Evaluating and Improving the Robustness of Security Attack Detectors Generated by LLMs \ Large Language Models (LLMs) are increasingly used in software development to generate functions, such as attack detectors, that implement security requirements. A key challenge is ensuring the LLMs have enough knowledge to address specific s… \ Source • arXiv cs.LG • 16:25
  • Backdoor Attacks on Transformers for Tabular Data: An Empirical Study \ Deep Neural Networks (DNNs) have shown great promise in various domains. However, vulnerabilities associated with DNN training, such as backdoor attacks, are a significant concern. These attacks involve the subtle insertion of triggers during… \ Source • arXiv cs.LG • 13:23
  • Calibrating LLMs for Text-to-SQL Parsing by Leveraging Sub-clause Frequencies \ While large language models (LLMs) achieve strong performance on text-to-SQL parsing, they sometimes exhibit unexpected failures in which they are confidently incorrect. Building trustworthy text-to-SQL systems thus requires eliciting reliabl… \ Source • arXiv cs.CL • 19:39
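
The chain-of-thought compression item above can be illustrated with a toy sketch. To be clear, this is not the paper's self-optimizing framework: the `compress_cot` helper, the `keep_ratio` knob, and the use of string length as an informativeness score are all illustrative assumptions.

```python
# Toy sketch of CoT compression (illustrative assumptions, not the paper's
# method): keep only the highest-scoring fraction of reasoning steps,
# preserving their original order.
def compress_cot(steps, keep_ratio=0.4, score=len):
    """Rank steps by `score` (here a stand-in: string length) and keep the
    top `keep_ratio` fraction, always retaining at least one step."""
    k = max(1, round(len(steps) * keep_ratio))
    ranked = sorted(range(len(steps)), key=lambda i: score(steps[i]), reverse=True)
    return [steps[i] for i in sorted(ranked[:k])]

steps = [
    "Restate the problem.",
    "12 * 7 = 84.",
    "Check: 84 / 7 = 12, consistent.",
    "Therefore the answer is 84.",
]
print(compress_cot(steps, keep_ratio=0.5))
```

A real informativeness score (e.g. how much the answer distribution shifts when a step is dropped) would replace the length stand-in; the pruning loop itself stays the same.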
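
The KBM item above centres on deciding when retrieval is worth invoking at all. A minimal sketch of that routing idea, assuming a hypothetical confidence estimator and threshold (neither is from the paper):

```python
# Sketch of knowledge-boundary gating for adaptive retrieval. The
# confidence_fn, the 0.75 threshold, and the toy stand-ins below are
# assumptions for illustration, not KBM's actual components.
def answer(query, confidence_fn, generate, retrieve_and_generate, threshold=0.75):
    """Answer directly when the model believes the query is inside its
    knowledge boundary; otherwise fall back to retrieval-augmented generation."""
    if confidence_fn(query) >= threshold:
        return generate(query)           # inside the boundary: skip retrieval cost
    return retrieve_and_generate(query)  # outside: augment with retrieval

# Toy stand-ins so the routing runs end to end:
known = {"capital of France": "Paris"}
conf = lambda q: 0.9 if q in known else 0.1
gen = lambda q: known[q]
rag = lambda q: f"[retrieved answer for: {q}]"

print(answer("capital of France", conf, gen, rag))     # confident path
print(answer("latest RAG benchmark", conf, gen, rag))  # retrieval path
```

The payoff of such gating is exactly what the summary bullet reports: retrieval calls are spent only on queries the model flags as outside its boundary.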
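
The text-to-SQL calibration item suggests scoring a chosen query by how often its sub-clauses recur across sampled generations. A rough sketch under that reading; the keyword-based clause splitter and the averaging rule are hypothetical simplifications, not the paper's method:

```python
import re
from collections import Counter

def subclauses(sql):
    """Naive clause split on common SQL keywords; a real system would use
    a proper SQL parser (this splitter is an assumption for illustration)."""
    parts = re.split(r"(?=\b(?:SELECT|FROM|WHERE|GROUP BY|ORDER BY)\b)", sql.strip())
    return [p.strip() for p in parts if p.strip()]

def clause_confidence(chosen, samples):
    """Average, over the chosen query's clauses, how often each clause
    appears across the sampled queries."""
    counts = Counter(c for s in samples for c in subclauses(s))
    clauses = subclauses(chosen)
    return sum(counts[c] for c in clauses) / (len(clauses) * len(samples))

samples = [
    "SELECT name FROM users WHERE age > 30",
    "SELECT name FROM users WHERE age > 30",
    "SELECT name FROM users WHERE age >= 30",
]
print(clause_confidence(samples[0], samples))  # clauses mostly agree -> high score
```

The intuition matches the item's framing: a query whose clauses keep reappearing under resampling earns high confidence, while a confidently wrong outlier whose clauses rarely recur is flagged for review.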

Big Tech

No items today.

Regulation & Standards

No items today.

Enterprise Practice

No items today.

Open-Source Tooling

No items today.

— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.
