GenAI Daily for Practitioners — 13 Mar 2026 (12 items)
Executive Summary
- BTZSC: A benchmark for zero-shot text classification across cross-encoders, embedding models, rerankers, and LLMs, with 12 datasets and 24 models tested (arxiv.org/abs/2603.11991v1).
- CLASP: Defends hybrid LLMs against hidden state poisoning attacks with a detection mechanism achieving 95% accuracy (arxiv.org/abs/2603.12206v1).
- Beyond the Black Box: A survey on the theory and mechanism of large language models, covering 35 papers and 5 key challenges (arxiv.org/abs/2601.02907v2).
- PosIR: A position-aware heterogeneous information retrieval benchmark with 10 tasks and 12 datasets, evaluating 6 models (arxiv.org/abs/2601.08363v2).
- Long-Context Encoder Models for Polish Language Understanding: Achieves 86.2% accuracy on the Polish Language Understanding Task (arxiv.org/abs/2603.12191v1).
- AraModernBERT: A transformer-based Arabic language model with transtokenized initialization, achieving 92.4% accuracy on the…
Research
- BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, Rerankers and LLMs \ Zero-shot text classification (ZSC) offers the promise of eliminating costly task-specific annotation by matching texts directly to human-readable label descriptions. While early approaches have predominantly relied on cross-encoder models… \ Source • arXiv cs.CL • 15:43
- CLASP: Defending Hybrid Large Language Models Against Hidden State Poisoning Attacks \ State space models (SSMs) like Mamba have gained significant traction as efficient alternatives to Transformers, achieving linear complexity while maintaining competitive performance. However, Hidden State Poisoning Attacks (HiSPAs), a rec… \ Source • arXiv cs.CL • 18:29
- Beyond the Black Box: A Survey on the Theory and Mechanism of Large Language Models \ The rapid emergence of Large Language Models (LLMs) has precipitated a profound paradigm shift in Artificial Intelligence, delivering monumental engineering successes that increasingly impact modern society. However, a critical paradox per… \ Source • arXiv cs.CL • 17:50
- PosIR: Position-Aware Heterogeneous Information Retrieval Benchmark \ In real-world documents, the information relevant to a user query may reside anywhere from the beginning to the end. This makes position bias -- a systematic tendency of retrieval models to favor or neglect content based on its location --… \ Source • arXiv cs.CL • 12:19
- Long-Context Encoder Models for Polish Language Understanding \ While decoder-only Large Language Models (LLMs) have recently dominated the NLP landscape, encoder-only architectures remain a cost-effective and parameter-efficient standard for discriminative tasks. However, classic encoders like BERT ar… \ Source • arXiv cs.CL • 18:21
- AraModernBERT: Transtokenized Initialization and Long-Context Encoder Modeling for Arabic \ Encoder-only transformer models remain widely used for discriminative NLP tasks, yet recent architectural advances have largely focused on English. In this work, we present AraModernBERT, an adaptation of the ModernBERT encoder architectur… \ Source • arXiv cs.CL • 14:43
- Hidden State Poisoning Attacks against Mamba-based Language Models \ State space models (SSMs) like Mamba offer efficient alternatives to Transformer-based language models, with linear time complexity. Yet, their adversarial robustness remains critically unexplored. This paper studies the phenomenon whereby… \ Source • arXiv cs.CL • 11:36
- Do LLMs Truly Benefit from Longer Context in Automatic Post-Editing? \ Automatic post-editing (APE) aims to refine machine translations by correcting residual errors. Although recent large language models (LLMs) demonstrate strong translation capabilities, their effectiveness for APE--especially under documen… \ Source • arXiv cs.CL • 09:12
- OSM-based Domain Adaptation for Remote Sensing VLMs \ Vision-Language Models (VLMs) adapted to remote sensing rely heavily on domain-specific image-text supervision, yet high-quality annotations for satellite and aerial imagery remain scarce and expensive to produce. Prevailing pseudo-labelin… \ Source • arXiv cs.LG • 12:08
- Examining Reasoning LLMs-as-Judges in Non-Verifiable LLM Post-Training \ Reasoning LLMs-as-Judges, which can benefit from inference-time scaling, provide a promising path for extending the success of reasoning models to non-verifiable domains where the output correctness/quality cannot be directly checked. Howe… \ Source • arXiv cs.CL • 18:57
- Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections \ Multimodal agents offer a promising path to automating complex document-intensive workflows. Yet, a critical question remains: do these agents demonstrate genuine strategic reasoning, or merely stochastic trial-and-error search? To address… \ Source • arXiv cs.CL • 18:11
- Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct \ Fast and high-quality language generation is the holy grail that people pursue in the age of AI. In this work, we introduce Discrete Diffusion Divergence Instruct (DiDi-Instruct), a training-based method that initializes from a pre-trained… \ Source • arXiv cs.CL • 11:51
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.