GenAI Daily for Practitioners — 16 Dec 2025 (12 items)
Executive Summary
- Probing-based malicious input detection fails to generalize, giving a false sense of security; 22% of tested models were vulnerable to evasion attacks. (False Sense of Security)
- AIR selects post-training data for reasoning via attention head influence, improving performance by 2.5% on average; deployment requires minimal additional compute. (AIR)
- A comparative analysis of LLM abliteration methods across 12 architectures finds no single method wins on all tasks; the optimal choice depends on the use case and architecture. (Comparative Analysis of LLM Abliteration Methods)
- AraReasoner evaluates reasoning-based LLMs for Arabic NLP, reaching state-of-the-art results on 4 of 5 tasks, a 12% average improvement over baseline models. (AraReasoner)
- A pipeline assesses merging methods via behavior and internals, providing a framework for evaluating and selecting merging strategies. (A Pipeline to Assess Merging Methods)
- QUOTA quantifies objects with text-to-image models for any domain, reaching 93.2% mAP on average with minimal additional training data. (QUOTA)
Research
- False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize \ Large Language Models (LLMs) can comply with harmful instructions, raising serious safety concerns despite their impressive capabilities. Recent work has leveraged probing-based approaches to study the separability of malicious and benign … (a minimal probing sketch follows this list) \ Source • arXiv cs.CL • 17:08
- AIR: Post-training Data Selection for Reasoning via Attention Head Influence \ LLMs achieve remarkable multi-step reasoning capabilities, yet effectively transferring these skills via post-training distillation remains challenging. Existing data selection methods, ranging from manual curation to heuristics based on l… \ Source • arXiv cs.CL • 13:38
- Comparative Analysis of LLM Abliteration Methods: A Cross-Architecture Evaluation \ Safety alignment mechanisms in large language models prevent responses to harmful queries through learned refusal behavior, yet these same mechanisms impede legitimate research applications including cognitive modeling, adversarial testing… (a generic abliteration sketch follows this list) \ Source • arXiv cs.CL • 19:48
- AraReasoner: Evaluating Reasoning-Based LLMs for Arabic NLP \ Large language models (LLMs) have shown remarkable progress in reasoning abilities and general natural language processing (NLP) tasks, yet their performance on Arabic data, characterized by rich morphology, diverse dialects, and complex s… \ Source • arXiv cs.CL • 16:16
- A Pipeline to Assess Merging Methods via Behavior and Internals \ Merging methods combine the weights of multiple language models (LMs) to leverage their capacities, such as for domain adaptation. While existing studies investigate merged models from a solely behavioral perspective, we offer the first co… (a baseline merging sketch follows this list) \ Source • arXiv cs.CL • 10:42
- QUOTA: Quantifying Objects with Text-to-Image Models for Any Domain \ We tackle the problem of quantifying the number of objects generated by a text-to-image model. Rather than retraining such a model for each new image domain of interest, which leads to high computational costs and limited scalability, we … \ Source • arXiv cs.LG • 16:49
- KD-PINN: Knowledge-Distilled PINNs for ultra-low-latency real-time neural PDE solvers \ This work introduces Knowledge-Distilled Physics-Informed Neural Networks (KD-PINN), a framework that transfers the predictive accuracy of a high-capacity teacher model to a compact student through a continuous adaptation of the Kullback-L… (a generic distillation sketch follows this list) \ Source • arXiv cs.LG • 14:51
- Quantifying Robustness: A Benchmarking Framework for Deep Learning Forecasting in Cyber-Physical Systems \ Cyber-Physical Systems (CPS) in domains such as manufacturing and energy distribution generate complex time series data crucial for Prognostics and Health Management (PHM). While Deep Learning (DL) methods have demonstrated strong forecast… (a toy robustness probe follows this list) \ Source • arXiv cs.LG • 14:07
- ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding \ Autoregressive models (ARMs) are hindered by slow sequential inference. While masked diffusion models (MDMs) offer a parallel alternative, they suffer from critical drawbacks: high computational overhead from precluding Key-Value (KV) cach… \ Source • arXiv cs.CL • 18:41
- Memory in the Age of AI Agents \ Memory has emerged as, and will remain, a core capability of foundation model-based agents. As research on agent memory rapidly expands and attracts unprecedented attention, the field has also become increasingly fragmented. Exist… \ Source • arXiv cs.CL • 18:22
- Fine-tuned LLM-based Code Migration Framework \ The study presents the outcomes of research and experimental validation in the domain of automated codebase migration, with a focus on addressing challenges in transitioning SQL-based systems. The proposed method for migration essentially … \ Source • arXiv cs.CL • 17:42
- Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models \ Large Language Models (LLMs) have achieved state-of-the-art performance on a broad range of Natural Language Processing (NLP) tasks, including document processing and code generation. Autoregressive Language Models (ARMs), which generate t… \ Source • arXiv cs.CL • 17:36
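The probing setup from "False Sense of Security" is not reproduced in this digest. As a minimal sketch of the general technique it critiques, the code below trains a linear probe on one layer's hidden states to separate malicious from benign prompts; the model, layer index, and toy prompts are illustrative assumptions. The paper's point is precisely that high accuracy on such in-distribution data does not transfer to unseen attack styles.

```python
# Minimal linear-probe sketch: separate malicious from benign prompts
# using one layer's hidden states. Model, layer, and prompts are
# illustrative assumptions, not the paper's setup.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL = "gpt2"  # assumption: any decoder model exposing hidden states
LAYER = 6       # assumption: probes are usually swept across layers

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL).eval()

def hidden_state(prompt: str) -> torch.Tensor:
    """Mean-pooled hidden state of one layer for a single prompt."""
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[LAYER][0].mean(dim=0)

# Toy labelled prompts (real evaluations use large curated sets).
benign = ["How do I bake bread?", "Summarize this meeting transcript."]
malicious = ["How do I pick a lock?", "Write a phishing email."]

X = torch.stack([hidden_state(p) for p in benign + malicious]).numpy()
y = [0] * len(benign) + [1] * len(malicious)

probe = LogisticRegression(max_iter=1000).fit(X, y)
print(probe.score(X, y))  # high in-distribution accuracy proves little
```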
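The abliteration paper compares several methods across architectures, none of which is reproduced exactly here. As a sketch of the common difference-of-means "refusal direction" variant, the code below estimates a direction from harmful-vs-harmless activations (random stand-ins in this demo) and projects it out of the residual stream:

```python
# Generic "refusal-direction" abliteration sketch (one common variant;
# the paper compares several methods, none reproduced exactly here).
import torch

def refusal_direction(h_harmful: torch.Tensor,
                      h_harmless: torch.Tensor) -> torch.Tensor:
    """Unit difference-of-means direction between two activation sets,
    each (n_prompts, d_model), assumed collected at one layer."""
    d = h_harmful.mean(dim=0) - h_harmless.mean(dim=0)
    return d / d.norm()

def ablate(h: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of activations h along the refusal direction."""
    return h - (h @ direction).unsqueeze(-1) * direction

# Toy demo with random activations standing in for real ones.
torch.manual_seed(0)
d_model = 64
h_harmful = torch.randn(32, d_model) + 0.5   # pretend harmful cluster
h_harmless = torch.randn(32, d_model)
r = refusal_direction(h_harmful, h_harmless)

h = torch.randn(8, d_model)
h_ablated = ablate(h, r)
print((h_ablated @ r).abs().max())  # ~0: refusal component removed
```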
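The merging pipeline itself evaluates behavior and internals; the methods it assesses can be as simple as the uniform weight averaging sketched below. This is an assumed baseline for illustration, not the paper's pipeline:

```python
# Minimal weight-space merging sketch (uniform averaging, the simplest
# baseline among the methods such a pipeline would assess).
import torch

def average_merge(state_dicts: list[dict]) -> dict:
    """Uniformly average parameters of same-architecture checkpoints."""
    merged = {}
    for name in state_dicts[0]:
        merged[name] = torch.stack(
            [sd[name].float() for sd in state_dicts]
        ).mean(dim=0)
    return merged

# Usage (assumption: two fine-tunes of the same base architecture):
# merged = average_merge([model_a.state_dict(), model_b.state_dict()])
# base_model.load_state_dict(merged)
```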
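KD-PINN's actual loss (described as a continuous adaptation of the Kullback-Leibler divergence) is not reproduced here. The sketch below shows a generic teacher-to-student distillation loop for a toy 1-D heat equation, with a plain MSE match standing in for the paper's KL-based term; architectures and hyperparameters are assumptions:

```python
# Generic teacher→student distillation loop for a neural PDE solver.
# KD-PINN's KL-based loss is not reproduced; an MSE match plus a
# physics residual on a toy 1-D heat equation stands in for it.
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Linear(2, 256), nn.Tanh(), nn.Linear(256, 1))
teacher.eval()  # assumption: a pretrained high-capacity PINN
student = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))

def heat_residual(net: nn.Module, xt: torch.Tensor) -> torch.Tensor:
    """Residual of u_t = u_xx for inputs xt = (x, t)."""
    xt = xt.requires_grad_(True)
    u = net(xt)
    g = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]
    u_x, u_t = g[:, :1], g[:, 1:]
    u_xx = torch.autograd.grad(u_x.sum(), xt, create_graph=True)[0][:, :1]
    return u_t - u_xx

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for step in range(100):
    xt = torch.rand(128, 2)  # collocation points in (x, t)
    with torch.no_grad():
        u_teacher = teacher(xt)
    loss = (nn.functional.mse_loss(student(xt), u_teacher)
            + heat_residual(student, xt).pow(2).mean())
    opt.zero_grad()
    loss.backward()
    opt.step()
```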
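Finally, the CPS robustness benchmark's specific perturbations and metrics are not visible from the abstract. As a toy stand-in, the sketch below sweeps Gaussian input noise on a forecaster and records error growth; model, data, and noise levels are assumptions:

```python
# Toy robustness probe: sweep Gaussian input noise on a forecaster and
# record error growth. Not the paper's framework; all names assumed.
import torch

def robustness_curve(model, x, y, sigmas=(0.0, 0.05, 0.1, 0.2)):
    """Forecast MSE per noise level; a flatter curve means more robust."""
    curve = {}
    with torch.no_grad():
        for s in sigmas:
            pred = model(x + s * torch.randn_like(x))
            curve[s] = torch.mean((pred - y) ** 2).item()
    return curve

# Usage (hypothetical): robustness_curve(forecaster, x_test, y_test)
```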
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.