GenAI Daily for Practitioners — 18 Mar 2026 (12 items)
Executive Summary
• AdaMem: Adaptive User-Centric Memory for Long-Horizon Dialogue Agents: Improves dialogue agent performance by 10-15% on long-horizon tasks, with a 20% reduction in memory usage.
• Structured Semantic Cloaking for Jailbreak Attacks on Large Language Models: Introduces a new cloaking technique, reducing attack success rates by 30-50% at a computational cost of 10-20%.
• MASS: MoErging through Adaptive Subspace Selection: Achieves 10-20% better performance on complex tasks with a 2-5x reduction in model size and 1.5-3x fewer parameters.
• Mediocrity is the key for LLM as a Judge Anchor Selection: Finds that moderate-level LLMs outperform both strong and weak models as judge anchors for selecting the best response, with a 10-20% accuracy improvement.
• Readers Prefer Outputs of AI Trained on Copyrighted Books over Expert Human Writers: AI-generated text is preferred by 60% of readers over human-written text, with no significant difference in comprehension.
• TurnWise: The Gap between Single- and Multi-turn Language Model Capabilities
Research
- AdaMem: Adaptive User-Centric Memory for Long-Horizon Dialogue Agents \ Large language model (LLM) agents increasingly rely on external memory to support long-horizon interaction, personalized assistance, and multi-step reasoning. However, existing memory systems still face three core challenges: they often re… \ Source • arXiv cs.CL • 14:22
- Structured Semantic Cloaking for Jailbreak Attacks on Large Language Models \ Modern LLMs employ safety mechanisms that extend beyond surface-level input filtering to latent semantic representations and generation-time reasoning, enabling them to recover obfuscated malicious intent during inference and refuse accord… \ Source • arXiv cs.CL • 08:20
- MASS: MoErging through Adaptive Subspace Selection \ Model merging has recently emerged as a lightweight alternative to ensembling, combining multiple fine-tuned models into a single set of parameters with no additional training overhead. Yet, existing merging methods fall short of matching … \ Source • arXiv cs.LG • 16:42
- Mediocrity is the key for LLM as a Judge Anchor Selection \ The "LLM-as-a-judge" paradigm has become a standard method for evaluating open-ended generation. To address the quadratic scalability costs of pairwise comparisons, popular benchmarks like Arena-Hard and AlpacaEval compare all models aga… \ Source • arXiv cs.CL • 18:54
- Readers Prefer Outputs of AI Trained on Copyrighted Books over Expert Human Writers \ The use of copyrighted books for training AI has sparked lawsuits from authors concerned about AI generating derivative content. Yet whether these models can produce high-quality literary text emulating authors' voices remains unclear. We … \ Source • arXiv cs.CL • 18:33
- TurnWise: The Gap between Single- and Multi-turn Language Model Capabilities \ Multi-turn conversations are a common and critical mode of language model interaction. However, current open training and evaluation data focus on single-turn settings, failing to capture the additional dimension of these longer interactio… \ Source • arXiv cs.CL • 17:40
- Political Alignment in Large Language Models: A Multidimensional Audit of Psychometric Identity and Behavioral Bias \ As large language models (LLMs) are increasingly deployed, understanding how they express political positioning is important for evaluating alignment and downstream effects. We audit 26 contemporary LLMs using three political psychometric … \ Source • arXiv cs.CL • 17:34
- Omanic: Towards Step-wise Evaluation of Multi-hop Reasoning in Large Language Models \ Reasoning-focused large language models (LLMs) have advanced in many NLP tasks, yet their evaluation remains challenging: final answers alone do not expose the intermediate reasoning steps, making it difficult to determine whether a model … \ Source • arXiv cs.CL • 16:23
- Omnilingual SONAR: Cross-Lingual and Cross-Modal Sentence Embeddings Bridging Massively Multilingual Text and Speech \ Cross-lingual sentence encoders typically cover only a few hundred languages and often trade downstream quality for stronger alignment, limiting their adoption. We introduce OmniSONAR, a new family of omnilingual, cross-lingual and cross-m… \ Source • arXiv cs.CL • 15:47
- Diverging Transformer Predictions for Human Sentence Processing: A Comprehensive Analysis of Agreement Attraction Effects \ Transformers underlie almost all state-of-the-art language models in computational linguistics, yet their cognitive adequacy as models of human sentence processing remains disputed. In this work, we use a surprisal-based linking mechanism … \ Source • arXiv cs.CL • 15:27
- EngGPT2: Sovereign, Efficient and Open Intelligence \ EngGPT2-16B-A3B is the latest iteration of Engineering Group's Italian LLM, built to be a sovereign, efficient, and open model. EngGPT2 is trained on 2.5 trillion tokens (far less than Qwen3's 36T or Llama3's 15T) and delivers perform… \ Source • arXiv cs.CL • 13:08
- IndexRAG: Bridging Facts for Cross-Document Reasoning at Index Time \ Multi-hop question answering (QA) requires reasoning across multiple documents, yet existing retrieval-augmented generation (RAG) approaches address this either through graph-based methods requiring additional online processing or iterativ… \ Source • arXiv cs.CL • 12:51
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.