GenAI Daily for Practitioners — 14 Apr 2026 (12 items)
Executive Summary
• Synthius-Mem reports 94.4% memory accuracy and 99.6% adversarial robustness on the LoCoMo benchmark with a brain-inspired, hallucination-resistant persona memory for LLM agents.
• AtlasKV shows that LLMs can be augmented with billion-scale knowledge graphs within 20GB of VRAM, enabling more informed and accurate responses.
• A "back to basics" study finds plain retrieval and generation sufficient for conversational agents to remember, reducing the need for complex memory architectures.
• A new method decomposes and reduces hidden measurement error in LLM evaluation pipelines, improving the reliability of model assessments.
• METER provides a benchmark for multi-level contextual causal reasoning in LLMs, a framework for understanding and improving model decision-making.
• An analysis of dual-encoder vision-language models argues that their compositional failures may stem from inference procedures rather than deficient representations.
Research
- Synthius-Mem: Brain-Inspired Hallucination-Resistant Persona Memory Achieving 94.4% Memory Accuracy and 99.6% Adversarial Robustness on LoCoMo \ Providing AI agents with reliable long-term memory that does not hallucinate remains an open problem. Current approaches to memory for LLM agents -- sliding windows, summarization, embedding-based RAG, and flat fact extraction -- each redu… \ Source • arXiv cs.CL • 16:47
- AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM \ Retrieval-augmented generation (RAG) has shown some success in augmenting large language models (LLMs) with external knowledge. However, as a non-parametric knowledge integration paradigm for LLMs, RAG methods heavily rely on external retr… \ Source • arXiv cs.CL • 19:45
- Back to Basics: Let Conversational Agents Remember with Just Retrieval and Generation \ Existing conversational memory systems rely on complex hierarchical summarization or reinforcement learning to manage long-term dialogue history, yet remain vulnerable to context dilution as conversations grow. In this work, we offer a dif… \ Source • arXiv cs.CL • 17:38
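The "just retrieval and generation" idea above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the bag-of-words "embedding", the `ConversationMemory` class, and the top-k parameter are all stand-ins for whatever encoder and storage a real system would use.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a sentence encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ConversationMemory:
    """Stores raw turns; recalls the top-k most similar for the next prompt."""

    def __init__(self, k: int = 2):
        self.turns: list[str] = []
        self.k = k

    def add(self, turn: str) -> None:
        self.turns.append(turn)  # no summarization, no hierarchy: keep raw turns

    def recall(self, query: str) -> list[str]:
        q = embed(query)
        ranked = sorted(self.turns, key=lambda t: cosine(embed(t), q), reverse=True)
        return ranked[:self.k]

    def build_prompt(self, query: str) -> str:
        # Retrieved history is simply prepended; generation does the rest.
        context = "\n".join(self.recall(query))
        return f"Relevant history:\n{context}\n\nUser: {query}"

mem = ConversationMemory(k=1)
mem.add("My dog is named Rex.")
mem.add("I work as a nurse.")
print(mem.recall("What is my dog called?"))
```

The point of the paper's framing, as the abstract suggests, is that retrieving raw turns at query time sidesteps the context dilution that summarization-based memories accumulate.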
- Decomposing and Reducing Hidden Measurement Error in LLM Evaluation Pipelines \ LLM evaluations drive which models get deployed, which safety standards get adopted, and which research conclusions get published. Yet these scores carry hidden uncertainty: rephrasing the prompt, switching the judge model, or changing the… \ Source • arXiv cs.CL • 16:58
- METER: Evaluating Multi-Level Contextual Causal Reasoning in Large Language Models \ Contextual causal reasoning is a critical yet challenging capability for Large Language Models (LLMs). Existing benchmarks, however, often evaluate this skill in fragmented settings, failing to ensure context consistency or cover the full … \ Source • arXiv cs.CL • 16:07
- Revisiting Compositionality in Dual-Encoder Vision-Language Models: The Role of Inference \ Dual-encoder Vision-Language Models (VLMs) such as CLIP are often characterized as bag-of-words systems due to their poor performance on compositional benchmarks. We argue that this limitation may stem less from deficient representations t… \ Source • arXiv cs.CL • 16:03
- Both Ends Count! Just How Good are LLM Agents at "Text-to-Big SQL"? \ Text-to-SQL and Big Data are both extensively benchmarked fields, yet there is limited research that evaluates them jointly. In the real world, Text-to-SQL systems are often embedded with Big Data workflows, such as large-scale data proces… \ Source • arXiv cs.CL • 15:29
- Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning \ We revisit retrieval-augmented generation (RAG) by embedding retrieval control directly into generation. Instead of treating retrieval as an external intervention, we express retrieval decisions within token-level decoding, enabling end-to… \ Source • arXiv cs.CL • 14:53
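The abstract's idea of expressing retrieval decisions inside token-level decoding can be sketched as follows. This is a hedged illustration, not the paper's method: the `<RETRIEVE>` token, the toy corpus, and the stand-in generator are all hypothetical.

```python
# A reserved control token the "model" can emit mid-generation.
RETRIEVE = "<RETRIEVE>"

CORPUS = {"capital france": "Paris is the capital of France."}

def retrieve(query: str) -> str:
    # Toy keyword lookup standing in for a real retriever.
    for key, passage in CORPUS.items():
        if all(word in query.lower() for word in key.split()):
            return passage
    return ""

def toy_step(context: str):
    # Stand-in for one LLM decoding step, conditioned on the context so far.
    if RETRIEVE not in context:
        return RETRIEVE                  # the model "decides" it needs evidence
    if "Paris" in context and "Answer:" not in context:
        return "Answer: Paris."
    return None                          # end of sequence

def decode(prompt: str) -> str:
    context, output = prompt, []
    for _ in range(8):                   # bounded decoding loop
        step = toy_step(context)
        if step is None:
            break
        if step == RETRIEVE:
            # Retrieval happens inside decoding, not as an external pipeline stage.
            context += " " + RETRIEVE + " " + retrieve(prompt)
        else:
            output.append(step)
            context += " " + step
    return " ".join(output)

print(decode("What is the capital of France?"))
```

The design point: because the retrieval decision is just another token, it can in principle be trained end-to-end with the rest of generation rather than scheduled by an external controller.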
- Enhancing Multimodal Large Language Models for Ancient Chinese Character Evolution Analysis via Glyph-Driven Fine-Tuning \ In recent years, rapid advances in Multimodal Large Language Models (MLLMs) have increasingly stimulated research on ancient Chinese scripts. As the evolution of written characters constitutes a fundamental pathway for understanding cultur… \ Source • arXiv cs.CL • 13:00
- Think Parallax: Solving Multi-Hop Problems via Multi-View Knowledge-Graph-Based Retrieval-Augmented Generation \ Large language models (LLMs) still struggle with multi-hop reasoning over knowledge-graphs (KGs), and we identify a previously overlooked structural reason for this difficulty: Transformer attention heads naturally specialize in distinct s… \ Source • arXiv cs.CL • 11:34
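Multi-hop reasoning over a KG, the setting this item targets, amounts to chaining relation lookups where each hop conditions on the previous answer. A minimal sketch under assumed names (the toy graph, relation labels, and `multi_hop` helper are all hypothetical, and this does not reproduce the paper's multi-view method):

```python
# Toy knowledge graph: (head entity, relation) -> tail entity.
KG = {
    ("Inception", "directed_by"): "Christopher Nolan",
    ("Christopher Nolan", "born_in"): "London",
}

def hop(entity: str, relation: str):
    # One retrieval step against the graph; None if the edge is absent.
    return KG.get((entity, relation))

def multi_hop(start: str, relations: list[str]):
    # Follow a chain of relations; each hop conditions on the previous answer.
    entity = start
    for rel in relations:
        nxt = hop(entity, rel)
        if nxt is None:
            return None                  # chain breaks: no answer
        entity = nxt
    return entity

# "Where was the director of Inception born?" decomposed into two hops.
print(multi_hop("Inception", ["directed_by", "born_in"]))
```

Single-shot retrieval tends to fail here because no single edge links "Inception" to "London"; the intermediate entity must be resolved first, which is the difficulty the abstract attributes to how attention heads specialize.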
- Ambivalence/Hesitancy Recognition in Videos for Personalized Digital Health Interventions \ Using behavioural science, health interventions focus on behaviour change by providing a framework to help patients acquire and maintain healthy habits that improve medical outcomes. In-person interventions are costly and difficult to scal… \ Source • arXiv cs.LG • 19:05
- Select Smarter, Not More: Prompt-Aware Evaluation Scheduling with Submodular Guarantees \ Automatic prompt optimization (APO) hinges on the quality of its evaluation signal, yet scoring every prompt candidate on the full training set is prohibitively expensive. Existing methods either fix a single evaluation subset before optim… \ Source • arXiv cs.LG • 13:31
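The submodular guarantee this item invokes is the classic one for greedy maximization of a monotone submodular set function, such as facility location. A sketch of the selection idea, assuming a precomputed similarity matrix (the matrix values and function names here are illustrative, not from the paper):

```python
def facility_location(selected: list[int], sim: list[list[float]]) -> float:
    # f(S) = sum over all items of their best similarity to any selected item.
    # Monotone and submodular, so greedy enjoys the (1 - 1/e) guarantee.
    if not selected:
        return 0.0
    return sum(max(sim[i][j] for j in selected) for i in range(len(sim)))

def greedy_select(sim: list[list[float]], budget: int) -> list[int]:
    # Repeatedly add the item with the largest marginal coverage gain.
    selected: list[int] = []
    for _ in range(budget):
        gains = {
            j: facility_location(selected + [j], sim) - facility_location(selected, sim)
            for j in range(len(sim)) if j not in selected
        }
        selected.append(max(gains, key=gains.get))
    return selected

# 4 evaluation examples: 0 and 1 are near-duplicates; 2 also covers 3 fairly well.
sim = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.1, 0.1, 1.0, 0.2],
    [0.0, 0.0, 0.5, 1.0],
]
print(greedy_select(sim, 2))
```

With budget 2, greedy skips the near-duplicate example and picks a set that covers all four, which is exactly why subset selection can score far fewer prompts than the full training set while preserving the evaluation signal.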
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.