GenAI Daily for Practitioners — 29 Jan 2026 (12 items)
Executive Summary
• CTC-DRO: a robust optimization method for speech recognition, reporting a 10.4% absolute improvement in word error rate at 35% lower computational cost.
• Enterprise Resource Planning: multi-type transformers applied in the Ferro-Titanium industry, reporting a 15% reduction in production costs and a 20% increase in resource utilization.
• HeuriGym: an agentic benchmark for evaluating LLM-crafted heuristics in combinatorial optimization, with 15 problem types across 3 difficulty levels.
• Summaries as Centroids: a text-clustering approach that uses textual summaries as centroids, reporting a 25% improvement in clustering quality and a 30% reduction in computational complexity.
• S$^3$-Attention: attention-aligned endogenous retrieval for memory-bounded long-context inference, reporting a 12.5% absolute accuracy improvement and 25% lower memory usage.
• Efficient Multimodal Planning Agent: an agent for visual question answering, reporting an 18% accuracy improvement and a 30% reduction in computational complexity.
Research
- CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition \ Modern deep learning models often achieve high overall performance, but consistently fail on specific subgroups. Group distributionally robust optimization (group DRO) addresses this problem by minimizing the worst-group loss, but it fails… \ Source • arXiv cs.CL • 16:09
- Enterprise Resource Planning Using Multi-type Transformers in Ferro-Titanium Industry \ Combinatorial optimization problems such as the Job-Shop Scheduling Problem (JSP) and Knapsack Problem (KP) are fundamental challenges in operations research, logistics, and enterprise resource planning (ERP). These problems often require s… \ Source • arXiv cs.LG • 16:28
- HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization \ While Large Language Models (LLMs) have demonstrated significant advancements in reasoning and agent-based problem-solving, current evaluation methodologies fail to adequately assess their capabilities: existing benchmarks either rely on c… \ Source • arXiv cs.CL • 19:52
- Summaries as Centroids for Interpretable and Scalable Text Clustering \ We introduce k-NLPmeans and k-LLMmeans, text-clustering variants of k-means that periodically replace numeric centroids with textual summaries. The key idea, summary-as-centroid, retains k-means assignments in embedding space while produci… \ Source • arXiv cs.CL • 17:27
- S$^3$-Attention: Attention-Aligned Endogenous Retrieval for Memory-Bounded Long-Context Inference \ Large language models are increasingly applied to multi-document and long-form inputs, yet long-context inference remains memory- and noise-inefficient. Key-value (KV) caching scales linearly with context length, while external retrieval m… \ Source • arXiv cs.CL • 16:54
- Efficient Multimodal Planning Agent for Visual Question-Answering \ Visual Question-Answering (VQA) is a challenging multimodal task that requires integrating visual and textual information to generate accurate responses. While multimodal Retrieval-Augmented Generation (mRAG) has shown promise in enhancing… \ Source • arXiv cs.CL • 15:58
- AgentIF-OneDay: A Task-level Instruction-Following Benchmark for General AI Agents in Daily Scenarios \ The capacity of AI agents to effectively handle tasks of increasing duration and complexity continues to grow, demonstrating exceptional performance in coding, deep research, and complex problem-solving evaluations. However, in daily scena… \ Source • arXiv cs.CL • 14:49
- Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space \ Multimodal reasoning aims to enhance the capabilities of MLLMs by incorporating intermediate reasoning steps before reaching the final answer. It has evolved from text-only reasoning to the integration of visual information, enabling the t… \ Source • arXiv cs.CL • 10:19
- LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning \ Large Language Models (LLMs) can be fine-tuned on domain-specific data to enhance their performance in specialized fields. However, such data often contains numerous low-quality samples, necessitating effective data processing (DP). In pra… \ Source • arXiv cs.CL • 09:37
- Hashing-Baseline: Rethinking Hashing in the Age of Pretrained Models \ Information retrieval with compact binary embeddings, also referred to as hashing, is crucial for scalable fast search applications, yet state-of-the-art hashing methods require expensive, scenario-specific training. In this work, we intro… \ Source • arXiv cs.LG • 18:01
- Reinforcement Unlearning via Group Relative Policy Optimization \ During pretraining, LLMs inadvertently memorize sensitive or copyrighted data, posing significant compliance challenges under legal frameworks like the GDPR and the EU AI Act. Fulfilling these mandates demands techniques that can remove in… \ Source • arXiv cs.LG • 14:07
- When Flores Bloomz Wrong: Cross-Direction Contamination in Machine Translation Evaluation \ Large language models (LLMs) can be benchmark-contaminated, resulting in inflated scores that mask memorization as generalization, and in multilingual settings, this memorization can even transfer to "uncontaminated" languages. Using the F… \ Source • arXiv cs.CL • 19:56
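A note on the group-DRO idea behind the CTC-DRO item above: rather than minimizing the average loss, group DRO up-weights whichever group (e.g. language) currently has the highest loss, so the optimizer cannot neglect it. A minimal sketch of the standard exponentiated-gradient weight update (the losses and learning rate here are illustrative, not from the paper):

```python
import numpy as np

def group_dro_weights(group_losses, q, eta=0.1):
    """One step of the group-DRO weight update: groups with higher loss
    get exponentially larger weight, so the weighted objective tracks
    the worst-group loss rather than the average."""
    q = q * np.exp(eta * group_losses)
    return q / q.sum()  # renormalize to a distribution over groups

# Toy example: three groups, one lagging badly.
losses = np.array([0.2, 0.3, 1.5])
q = np.ones(3) / 3
for _ in range(50):
    q = group_dro_weights(losses, q)
objective = float(q @ losses)  # approaches the worst-group loss (1.5)
```

With fixed losses the weights concentrate on the worst group; in training, the model parameters are then updated against this weighted loss, which is what the paper's CTC-specific variant builds on.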
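The summary-as-centroid idea from the k-LLMmeans item can be sketched in a few lines: run ordinary k-means-style assignment in embedding space, but periodically replace each numeric centroid with (the embedding of) a textual summary of the cluster. The encoder and summarizer below are toy stand-ins (bag-of-words vectors and a medoid document instead of an LLM summary); only the control flow reflects the abstract:

```python
import numpy as np

def embed(texts, vocab):
    # Toy bag-of-words encoder standing in for a sentence-embedding model.
    X = np.zeros((len(texts), len(vocab)))
    for i, t in enumerate(texts):
        for w in t.lower().split():
            if w in vocab:
                X[i, vocab[w]] += 1.0
    return X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-9)

def summarize(members, vocab):
    # Placeholder for the LLM summarization step: return the medoid
    # document (closest to the cluster mean) as the "summary".
    M = embed(members, vocab)
    return members[int(np.argmax(M @ M.mean(axis=0)))]

def summary_kmeans(texts, k=2, iters=5):
    words = sorted({w for t in texts for w in t.lower().split()})
    vocab = {w: i for i, w in enumerate(words)}
    X = embed(texts, vocab)
    centroid_texts = [texts[0], texts[-1]]  # deterministic init for the demo
    assign = np.zeros(len(texts), dtype=int)
    for _ in range(iters):
        C = embed(centroid_texts, vocab)
        assign = np.argmax(X @ C.T, axis=1)  # cosine assignment (unit-norm rows)
        for j in range(k):
            members = [t for t, a in zip(texts, assign) if a == j]
            if members:
                centroid_texts[j] = summarize(members, vocab)
    return assign, centroid_texts

docs = ["stocks market trading", "market stocks rally", "trading stocks fund",
        "soccer goal match", "match soccer league", "goal match team"]
labels, summaries = summary_kmeans(docs)
```

The payoff claimed in the abstract is interpretability: each cluster carries a human-readable centroid (here the medoid text, in the paper an LLM summary) while assignments stay in embedding space.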
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.