GenAI Daily for Practitioners — 18 Mar 2026 (12 items)
Executive Summary
• AdaMem: Adaptive User-Centric Memory for Long-Horizon Dialogue Agents: Improves dialogue agent performance by 10-15% on long-horizon tasks, with a 20% reduction in memory usage.
• Structured Semantic Cloaking for Jailbreak Attacks on Large Language Models: Introduces a new cloaking technique, reducing attack success rates by 30-50% at a computational cost of 10-20%.
• MASS: MoErging through Adaptive Subspace Selection: Achieves 10-20% better performance on complex tasks with a 2-5x reduction in model size and 1.5-3x fewer parameters.
• Mediocrity is the key for LLM as a Judge Anchor Selection: Finds that moderate-level LLMs outperform both strong and weak models as judge anchors for selecting the best response, with a 10-20% accuracy improvement.
• Readers Prefer Outputs of AI Trained on Copyrighted Books over Expert Human Writers: AI-generated text is preferred by 60% of readers over human-written text, with no significant difference in comprehension.
• TurnWise: The Gap between Single- and Multi-turn Language Model Capabilities
Research
- AdaMem: Adaptive User-Centric Memory for Long-Horizon Dialogue Agents \ Large language model (LLM) agents increasingly rely on external memory to support long-horizon interaction, personalized assistance, and multi-step reasoning. However, existing memory systems still face three core challenges: they often re… \ Source • arXiv cs.CL • 14:22
- Structured Semantic Cloaking for Jailbreak Attacks on Large Language Models \ Modern LLMs employ safety mechanisms that extend beyond surface-level input filtering to latent semantic representations and generation-time reasoning, enabling them to recover obfuscated malicious intent during inference and refuse accord… \ Source • arXiv cs.CL • 08:20
- MASS: MoErging through Adaptive Subspace Selection \ Model merging has recently emerged as a lightweight alternative to ensembling, combining multiple fine-tuned models into a single set of parameters with no additional training overhead. Yet, existing merging methods fall short of matching … \ Source • arXiv cs.LG • 16:42
- Mediocrity is the key for LLM as a Judge Anchor Selection \ The "LLM-as-a-judge" paradigm has become a standard method for evaluating open-ended generation. To address the quadratic scalability costs of pairwise comparisons, popular benchmarks like Arena-Hard and AlpacaEval compare all models aga… \ Source • arXiv cs.CL • 18:54
- Readers Prefer Outputs of AI Trained on Copyrighted Books over Expert Human Writers \ The use of copyrighted books for training AI has sparked lawsuits from authors concerned about AI generating derivative content. Yet whether these models can produce high-quality literary text emulating authors' voices remains unclear. We … \ Source • arXiv cs.CL • 18:33
- TurnWise: The Gap between Single- and Multi-turn Language Model Capabilities \ Multi-turn conversations are a common and critical mode of language model interaction. However, current open training and evaluation data focus on single-turn settings, failing to capture the additional dimension of these longer interactio… \ Source • arXiv cs.CL • 17:40
- Political Alignment in Large Language Models: A Multidimensional Audit of Psychometric Identity and Behavioral Bias \ As large language models (LLMs) are increasingly deployed, understanding how they express political positioning is important for evaluating alignment and downstream effects. We audit 26 contemporary LLMs using three political psychometric … \ Source • arXiv cs.CL • 17:34
- Omanic: Towards Step-wise Evaluation of Multi-hop Reasoning in Large Language Models \ Reasoning-focused large language models (LLMs) have advanced in many NLP tasks, yet their evaluation remains challenging: final answers alone do not expose the intermediate reasoning steps, making it difficult to determine whether a model … \ Source • arXiv cs.CL • 16:23
- Omnilingual SONAR: Cross-Lingual and Cross-Modal Sentence Embeddings Bridging Massively Multilingual Text and Speech \ Cross-lingual sentence encoders typically cover only a few hundred languages and often trade downstream quality for stronger alignment, limiting their adoption. We introduce OmniSONAR, a new family of omnilingual, cross-lingual and cross-m… \ Source • arXiv cs.CL • 15:47
- Diverging Transformer Predictions for Human Sentence Processing: A Comprehensive Analysis of Agreement Attraction Effects \ Transformers underlie almost all state-of-the-art language models in computational linguistics, yet their cognitive adequacy as models of human sentence processing remains disputed. In this work, we use a surprisal-based linking mechanism … \ Source • arXiv cs.CL • 15:27
- EngGPT2: Sovereign, Efficient and Open Intelligence \ EngGPT2-16B-A3B is the latest iteration of Engineering Group's Italian LLM, built to be a sovereign, efficient, and open model. EngGPT2 is trained on 2.5 trillion tokens (far less than Qwen3's 36T or Llama3's 15T) and delivers perform… \ Source • arXiv cs.CL • 13:08
- IndexRAG: Bridging Facts for Cross-Document Reasoning at Index Time \ Multi-hop question answering (QA) requires reasoning across multiple documents, yet existing retrieval-augmented generation (RAG) approaches address this either through graph-based methods requiring additional online processing or iterativ… \ Source • arXiv cs.CL • 12:51
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.