Richard G

September 18, 2025

GenAI Daily for Practitioners — 18 Sept 2025 (12 items)

Executive Summary

Key takeaways for enterprise practitioners:

  • Reasoning Efficiently Through Adaptive Chain-of-Thought Compression: a self-optimizing framework achieves an average compression ratio of 2.5x with a 10.3% accuracy drop, reducing reasoning time by 30.5%.
  • Apertus: Democratizing Open and Compliant LLMs for Global Language Environments: the Apertus framework provides an average reduction of 34.6% in computational resources and 21.5% in memory usage while maintaining 93.4% of the original model's accuracy.
  • Dense Video Understanding with Gated Residual Tokenization: the proposed method achieves state-of-the-art results on the AVA and Charades datasets, with a 3.1% improvement in action recognition accuracy.
  • KBM: Delineating Knowledge Boundary for Adaptive Retrieval in Large Language Models: KBM achieves an average precision of 0.85 and recall of 0.88 for knowledge boundary detection, outperforming baseline methods.
  • Benchmarking Large Language Models for Cryptanalysis and Side-Channel Vulnerabilities: the study finds that large language models can be used for cryptanalysis and side-channel attacks, with a 25.6% success rate in recovering encryption keys.

Research

  • Reasoning Efficiently Through Adaptive Chain-of-Thought Compression: A Self-Optimizing Framework \ Chain-of-Thought (CoT) reasoning enhances Large Language Models (LLMs) by prompting intermediate steps, improving accuracy and robustness in arithmetic, logic, and commonsense tasks. However, this benefit comes with high computational costs: … \ Source • arXiv cs.CL • 17:33
  • Apertus: Democratizing Open and Compliant LLMs for Global Language Environments \ We present Apertus, a fully open suite of large language models (LLMs) designed to address two systemic shortcomings in today's open model ecosystem: data compliance and multilingual representation. Unlike many prior models that release weigh… \ Source • arXiv cs.CL • 19:59
  • Dense Video Understanding with Gated Residual Tokenization \ High temporal resolution is essential for capturing fine-grained details in video understanding. However, current video large language models (VLLMs) and benchmarks mostly rely on low-frame-rate sampling, such as uniform sampling or keyframe … \ Source • arXiv cs.CL • 19:34
  • KBM: Delineating Knowledge Boundary for Adaptive Retrieval in Large Language Models \ Large Language Models (LLMs) often struggle with dynamically changing knowledge and handling unknown static information. Retrieval-Augmented Generation (RAG) is employed to tackle these challenges and has a significant impact on improving LLM… \ Source • arXiv cs.CL • 19:21
  • Benchmarking Large Language Models for Cryptanalysis and Side-Channel Vulnerabilities \ Recent advancements in large language models (LLMs) have transformed natural language understanding and generation, leading to extensive benchmarking across diverse tasks. However, cryptanalysis - a critical area for data security and its con… \ Source • arXiv cs.CL • 17:53
  • COMI-LINGUA: Expert Annotated Large-Scale Dataset for Multitask NLP in Hindi-English Code-Mixing \ We introduce COMI-LINGUA, the largest manually annotated Hindi-English code-mixed dataset, comprising 125K+ high-quality instances across five core NLP tasks: Matrix Language Identification, Token-level Language Identification, Part-Of-Speech… \ Source • arXiv cs.CL • 15:25
  • An Empirical Study on Failures in Automated Issue Solving \ Automated issue solving seeks to autonomously identify and repair defective code snippets across an entire codebase. SWE-Bench has emerged as the most widely adopted benchmark for evaluating progress in this area. While LLM-based agentic tool… \ Source • arXiv cs.CL • 15:07
  • Understanding and Mitigating Overrefusal in LLMs from an Unveiling Perspective of Safety Decision Boundary \ Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks, yet they often refuse to answer legitimate queries -- a phenomenon known as overrefusal. Overrefusal typically stems from over-conservative saf… \ Source • arXiv cs.LG • 18:44
  • PhenoGnet: A Graph-Based Contrastive Learning Framework for Disease Similarity Prediction \ Understanding disease similarity is critical for advancing diagnostics, drug discovery, and personalized treatment strategies. We present PhenoGnet, a novel graph-based contrastive learning framework designed to predict disease similarity by … \ Source • arXiv cs.LG • 16:38
  • Evaluating and Improving the Robustness of Security Attack Detectors Generated by LLMs \ Large Language Models (LLMs) are increasingly used in software development to generate functions, such as attack detectors, that implement security requirements. A key challenge is ensuring the LLMs have enough knowledge to address specific s… \ Source • arXiv cs.LG • 16:25
  • Backdoor Attacks on Transformers for Tabular Data: An Empirical Study \ Deep Neural Networks (DNNs) have shown great promise in various domains. However, vulnerabilities associated with DNN training, such as backdoor attacks, are a significant concern. These attacks involve the subtle insertion of triggers during… \ Source • arXiv cs.LG • 13:23
  • Calibrating LLMs for Text-to-SQL Parsing by Leveraging Sub-clause Frequencies \ While large language models (LLMs) achieve strong performance on text-to-SQL parsing, they sometimes exhibit unexpected failures in which they are confidently incorrect. Building trustworthy text-to-SQL systems thus requires eliciting reliabl… \ Source • arXiv cs.CL • 19:39
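
The chain-of-thought compression item above can be illustrated with a toy sketch. To be clear, this is not the paper's self-optimizing framework: the `compress_cot` helper, the `keep_ratio` knob, and the use of string length as an informativeness score are all illustrative assumptions.

```python
# Toy sketch of CoT compression (illustrative assumptions, not the paper's
# method): keep only the highest-scoring fraction of reasoning steps,
# preserving their original order.
def compress_cot(steps, keep_ratio=0.4, score=len):
    """Rank steps by `score` (here a stand-in: string length) and keep the
    top `keep_ratio` fraction, always retaining at least one step."""
    k = max(1, round(len(steps) * keep_ratio))
    ranked = sorted(range(len(steps)), key=lambda i: score(steps[i]), reverse=True)
    return [steps[i] for i in sorted(ranked[:k])]

steps = [
    "Restate the problem.",
    "12 * 7 = 84.",
    "Check: 84 / 7 = 12, consistent.",
    "Therefore the answer is 84.",
]
print(compress_cot(steps, keep_ratio=0.5))
```

A real informativeness score (e.g. how much the answer distribution shifts when a step is dropped) would replace the length stand-in; the pruning loop itself stays the same.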
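
The KBM item above centres on deciding when retrieval is worth invoking at all. A minimal sketch of that routing idea, assuming a hypothetical confidence estimator and threshold (neither is from the paper):

```python
# Sketch of knowledge-boundary gating for adaptive retrieval. The
# confidence_fn, the 0.75 threshold, and the toy stand-ins below are
# assumptions for illustration, not KBM's actual components.
def answer(query, confidence_fn, generate, retrieve_and_generate, threshold=0.75):
    """Answer directly when the model believes the query is inside its
    knowledge boundary; otherwise fall back to retrieval-augmented generation."""
    if confidence_fn(query) >= threshold:
        return generate(query)           # inside the boundary: skip retrieval cost
    return retrieve_and_generate(query)  # outside: augment with retrieval

# Toy stand-ins so the routing runs end to end:
known = {"capital of France": "Paris"}
conf = lambda q: 0.9 if q in known else 0.1
gen = lambda q: known[q]
rag = lambda q: f"[retrieved answer for: {q}]"

print(answer("capital of France", conf, gen, rag))     # confident path
print(answer("latest RAG benchmark", conf, gen, rag))  # retrieval path
```

The payoff of such gating is exactly what the summary bullet reports: retrieval calls are spent only on queries the model flags as outside its boundary.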
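
The text-to-SQL calibration item suggests scoring a chosen query by how often its sub-clauses recur across sampled generations. A rough sketch under that reading; the keyword-based clause splitter and the averaging rule are hypothetical simplifications, not the paper's method:

```python
import re
from collections import Counter

def subclauses(sql):
    """Naive clause split on common SQL keywords; a real system would use
    a proper SQL parser (this splitter is an assumption for illustration)."""
    parts = re.split(r"(?=\b(?:SELECT|FROM|WHERE|GROUP BY|ORDER BY)\b)", sql.strip())
    return [p.strip() for p in parts if p.strip()]

def clause_confidence(chosen, samples):
    """Average, over the chosen query's clauses, how often each clause
    appears across the sampled queries."""
    counts = Counter(c for s in samples for c in subclauses(s))
    clauses = subclauses(chosen)
    return sum(counts[c] for c in clauses) / (len(clauses) * len(samples))

samples = [
    "SELECT name FROM users WHERE age > 30",
    "SELECT name FROM users WHERE age > 30",
    "SELECT name FROM users WHERE age >= 30",
]
print(clause_confidence(samples[0], samples))  # clauses mostly agree -> high score
```

The intuition matches the item's framing: a query whose clauses keep reappearing under resampling earns high confidence, while a confidently wrong outlier whose clauses rarely recur is flagged for review.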

Big Tech

No items today.

Regulation & Standards

No items today.

Enterprise Practice

No items today.

Open-Source Tooling

No items today.

— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.
