GenAI Daily for Practitioners — 20 Mar 2026 (12 items)
Executive Summary
- LMEB: Long-horizon Memory Embedding Benchmark
  - Introduces a benchmark for evaluating long-horizon memory embedding models.
  - Tested on 10 datasets, achieving state-of-the-art results on 8.
  - No cost or compliance notes.
- From Inference Efficiency to Embodied Efficiency: Revisiting Efficiency Metrics for Vision-Language-Action Models
  - Proposes new efficiency metrics for vision-language-action models.
Research
- LMEB: Long-horizon Memory Embedding Benchmark \ Memory embeddings are crucial for memory-augmented systems, such as OpenClaw, but their evaluation is underexplored in current text embedding benchmarks, which narrowly focus on traditional passage retrieval and fail to assess models' abil… \ Source • arXiv cs.CL • 09:59
- From Inference Efficiency to Embodied Efficiency: Revisiting Efficiency Metrics for Vision-Language-Action Models \ Vision-Language-Action (VLA) models have recently enabled embodied agents to perform increasingly complex tasks by jointly reasoning over visual, linguistic, and motor modalities. However, we find that the prevailing notion of "efficiency… \ Source • arXiv cs.LG • 17:49
- VEPO: Variable Entropy Policy Optimization for Low-Resource Language Foundation Models \ Large language models frequently exhibit suboptimal performance on low resource languages, primarily due to inefficient subword segmentation and systemic training data imbalances. In this paper, we propose Variable Entropy Policy Optimizat… \ Source • arXiv cs.CL • 18:10
- Milco: Learned Sparse Retrieval Across Languages via a Multilingual Connector \ Learned Sparse Retrieval (LSR) combines the efficiency of bi-encoders with the transparency of lexical matching, but existing approaches struggle to scale beyond English. We introduce MILCO, an LSR architecture that maps queries and docume… \ Source • arXiv cs.CL • 18:00
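  Learned sparse retrieval scores a query against a document with a dot product over vocabulary-level term weights, which is what makes its matches transparent. A minimal, framework-free sketch of that scoring step (a generic illustration of LSR, not MILCO's actual architecture, which the abstract only outlines):

  ```python
  def sparse_score(query_weights, doc_weights):
      """Dot product over the shared terms of two sparse weight maps.

      query_weights / doc_weights: dict mapping term -> learned weight,
      as a learned sparse retriever would emit for a query or document.
      """
      # Iterate over the smaller map; only shared terms contribute.
      small, large = sorted((query_weights, doc_weights), key=len)
      return sum(w * large[t] for t, w in small.items() if t in large)

  # Toy example: only the overlapping term "retrieval" contributes.
  q = {"sparse": 1.2, "retrieval": 0.8}
  d = {"retrieval": 0.5, "index": 0.3}
  print(sparse_score(q, d))  # 0.4
  ```

  Because scores decompose per term, an inverted index can serve these models with the same machinery as classic lexical search.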
- A Comparative Empirical Study of Catastrophic Forgetting Mitigation in Sequential Task Adaptation for Continual Natural Language Processing Systems \ Neural language models deployed in real-world applications must continually adapt to new tasks and domains without forgetting previously acquired knowledge. This work presents a comparative empirical study of catastrophic forgetting mitiga… \ Source • arXiv cs.CL • 10:05
- myMNIST: Benchmark of PETNN, KAN, and Classical Deep Learning Models for Burmese Handwritten Digit Recognition \ We present the first systematic benchmark on myMNIST (formerly BHDD), a publicly available Burmese handwritten digit dataset important for Myanmar NLP/AI research. We evaluate eleven architectures spanning classical deep learning models (M… \ Source • arXiv cs.CL • 09:10
- SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits \ As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity to hardware-efficient execution. We present… \ Source • arXiv cs.LG • 18:30
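  "Speed of light" here means the roofline-style lower bound on kernel runtime implied by the device's peak compute and memory bandwidth, rather than a software baseline. A toy sketch of that metric (the formula is the standard roofline bound; all device numbers below are hypothetical, not from the paper):

  ```python
  def speed_of_light_fraction(flops, bytes_moved, time_s,
                              peak_flops, peak_bw):
      """Fraction of the hardware 'speed of light' a kernel achieves.

      The lower bound on runtime is the larger of the compute-bound and
      memory-bound times; the fraction is bound / measured (1.0 = optimal).
      """
      t_compute = flops / peak_flops      # time if perfectly compute-bound
      t_memory = bytes_moved / peak_bw    # time if perfectly memory-bound
      t_bound = max(t_compute, t_memory)  # neither bound can be beaten
      return t_bound / time_s

  # Toy kernel: 1e12 FLOPs, 1e9 bytes moved, measured at 2 ms,
  # on a hypothetical device with 1e15 FLOP/s and 1e12 B/s peaks.
  frac = speed_of_light_fraction(1e12, 1e9, 2e-3, 1e15, 1e12)
  print(f"{frac:.0%}")  # bound is 1 ms, so the kernel runs at 50%
  ```

  Rewarding this fraction instead of speedup over a baseline avoids declaring victory over a slow reference implementation.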
- TS-Haystack: A Multi-Scale Retrieval Benchmark for Time Series Language Models \ Time Series Language Models (TSLMs) are emerging as unified models for reasoning over continuous signals in natural language. However, long-context retrieval remains a major limitation: existing models are typically trained and evaluated o… \ Source • arXiv cs.LG • 16:39
- Assessing the Distributional Fidelity of Synthetic Chest X-rays using the Embedded Characteristic Score \ Chest X-ray (CXR) images are among the most commonly used diagnostic imaging modalities in clinical practice. Stringent privacy constraints often limit the public dissemination of patient CXR images, contributing to the increasing use of s… \ Source • arXiv cs.LG • 15:20
- F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World \ We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available high-quality data samples, F2LLM-v2 supp… \ Source • arXiv cs.CL • 18:59
- Enhancing Lexicon-Based Text Embeddings with Large Language Models \ Recent large language models (LLMs) have demonstrated exceptional performance on general-purpose text embedding tasks. While dense embeddings have dominated related research, we introduce the first lexicon-based embeddings (LENS) leveragin… \ Source • arXiv cs.CL • 18:55
- Steering Awareness: Detecting Activation Steering from Within \ Activation steering -- adding a vector to a model's residual stream to modify its behavior -- is widely used in safety evaluations as if the model cannot detect the intervention. We test this assumption, introducing steering awareness: a m… \ Source • arXiv cs.CL • 18:37
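  The intervention being detected is simple: activation steering adds a fixed vector to the residual-stream activation at some layer, h' = h + αv. A framework-free sketch of that generic recipe (the variable names and the contrastive-prompt note are the common formulation in the literature, not the specific setup studied in the paper):

  ```python
  def steer(hidden, direction, alpha=1.0):
      """Apply activation steering: h' = h + alpha * v.

      hidden: residual-stream activation at one layer (list of floats).
      direction: steering vector of the same dimension, e.g. the mean
      activation difference between two sets of contrastive prompts.
      alpha: steering strength.
      """
      assert len(hidden) == len(direction)
      return [h + alpha * v for h, v in zip(hidden, direction)]

  # Toy 3-d residual stream steered along [1, 0, -1] with strength 0.5.
  print(steer([1.0, 2.0, 3.0], [1.0, 0.0, -1.0], alpha=0.5))
  # [1.5, 2.0, 2.5]
  ```

  The paper's question is whether the model itself can report that such a shift has been applied to its activations, which safety evaluations typically assume it cannot.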
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.