GenAI Daily for Practitioners — 28 Aug 2025 (12 items)
Executive Summary
- Disabling Self-Correction in Retrieval-Augmented Generation via Stealthy Retriever Poisoning: Researchers propose a method to intentionally corrupt language models' self-correction mechanisms, demonstrating vulnerability to poisoning attacks. (Poisoning attack feasibility, 0.5% accuracy drop)
- RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation: The authors introduce a scalable data generator and benchmark for robotic manipulation, showcasing robustness to domain changes. (Robustness benchmark, 95% success rate)
- 11Plus-Bench: Demystifying Multimodal LLM Spatial Reasoning with Cognitive-Inspired Analysis: Researchers analyze the spatial reasoning capabilities of large language models, identifying cognitive-inspired patterns and limitations. (Spatial reasoning benchmark, 70% accuracy)
- AraHealthQA 2025 Shared Task Description Paper: The paper outlines a shared task on healthcare question answering, aiming to improve AI's ability to answer complex medical questions. (Competition, 2025)
- Evaluating the Fitness of Ontologies for the Task of Question Generation: Researchers propose a method to evaluate ontology…
Research
- Disabling Self-Correction in Retrieval-Augmented Generation via Stealthy Retriever Poisoning \ Retrieval-Augmented Generation (RAG) has become a standard approach for improving the reliability of large language models (LLMs). Prior work demonstrates the vulnerability of RAG systems by misleading them into generating attacker-chosen out… \ Source • arXiv cs.CL • 19:49
- RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation \ Simulation-based data synthesis has emerged as a powerful paradigm for advancing real-world robotic manipulation. Yet existing datasets remain insufficient for robust bimanual manipulation due to (1) the lack of scalable task generation metho… \ Source • arXiv cs.CL • 19:52
- 11Plus-Bench: Demystifying Multimodal LLM Spatial Reasoning with Cognitive-Inspired Analysis \ For human cognitive process, spatial reasoning and perception are closely entangled, yet the nature of this interplay remains underexplored in the evaluation of multimodal large language models (MLLMs). While recent MLLM advancements show imp… \ Source • arXiv cs.CL • 19:22
- AraHealthQA 2025 Shared Task Description Paper \ We introduce AraHealthQA 2025, the Comprehensive Arabic Health Question Answering Shared Task, held in conjunction with ArabicNLP 2025 (co-located with EMNLP 2025). This shared task addresses the paucity of high-quality Arabic medical Q… \ Source • arXiv cs.CL • 18:54
- Evaluating the Fitness of Ontologies for the Task of Question Generation \ Ontology-based question generation is an important application of semantic-aware systems that enables the creation of large question banks for diverse learning environments. The effectiveness of these systems, both in terms of the calibre and… \ Source • arXiv cs.CL • 19:47
- Refining Czech GEC: Insights from a Multi-Experiment Approach \ We present a grammar error correction (GEC) system that achieves state of the art for the Czech language. Our system is based on a neural network translation approach with the Transformer architecture, and its key feature is its real-time syn… \ Source • arXiv cs.CL • 19:43
- BinConv: A Neural Architecture for Ordinal Encoding in Time-Series Forecasting \ Recent work in time series forecasting has explored reformulating regression as a classification task. By discretizing the continuous target space into bins and predicting over a fixed set of classes, these approaches benefit from more stable… \ Source • arXiv stat.ML • 16:18
- CP4SBI: Local Conformal Calibration of Credible Sets in Simulation-Based Inference \ Current experimental scientists have been increasingly relying on simulation-based inference (SBI) to invert complex non-linear models with intractable likelihoods. However, posterior approximations obtained with SBI are often miscalibrated, … \ Source • arXiv stat.ML • 15:24
- StepWiser: Stepwise Generative Judges for Wiser Reasoning \ As models increasingly leverage multi-step reasoning strategies to solve complex problems, supervising the logical validity of these intermediate steps has become a critical research challenge. Process reward models address this by providing … \ Source • arXiv cs.CL • 19:17
- CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning \ Autonomous agents for Graphical User Interfaces (GUIs) face significant challenges in specialized domains such as scientific computing, where both long-horizon planning and precise execution are required. Existing approaches suffer from a tra… \ Source • arXiv cs.LG • 19:59
- Scalable Bayesian Structure Learning for Gaussian Graphical Models Using Marginal Pseudo-likelihood \ Bayesian methods for learning Gaussian graphical models offer a principled framework for quantifying model uncertainty and incorporating prior knowledge. However, their scalability is constrained by the computational cost of jointly exploring… \ Source • arXiv stat.ML • 18:23
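Practitioner note on the BinConv item above: the "regression as classification" idea it mentions (discretize a continuous target into bins, then predict a class) can be sketched in a few lines. This is a minimal illustration only, not the paper's method. The helper names `make_bin_edges`, `to_class`, and `to_ordinal`, the equal-width binning, and the cumulative binary code are all assumptions chosen for clarity; the truncated abstract does not say how BinConv builds its bins or its encoding.

```python
from bisect import bisect_right

def make_bin_edges(lo, hi, n_bins):
    """Interior edges that split [lo, hi] into n_bins equal-width bins (assumed scheme)."""
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]

def to_class(value, edges):
    """Map a continuous value to a bin index in 0..len(edges)."""
    return bisect_right(edges, value)

def to_ordinal(class_index, n_bins):
    """Generic cumulative binary code: one bit per threshold the value has passed.
    Hypothetical stand-in for an ordinal encoding, not necessarily BinConv's."""
    return [1 if class_index > k else 0 for k in range(n_bins - 1)]

n_bins = 5
edges = make_bin_edges(0.0, 10.0, n_bins)        # interior edges at 2, 4, 6, 8
series = [0.5, 3.2, 9.9, 6.0]                     # toy target values
classes = [to_class(v, edges) for v in series]    # class labels for a classifier
codes = [to_ordinal(c, n_bins) for c in classes]  # order-preserving targets

for v, c, code in zip(series, classes, codes):
    print(v, c, code)
```

Unlike a one-hot label, the cumulative code keeps neighbouring bins close to each other (adjacent bins differ in a single bit), which is one common reason to prefer an ordinal target over plain class labels.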
Big Tech
- OpenAI and Anthropic share findings from a joint safety evaluation \ Source • OpenAI Blog • 12:00
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.