GenAI Daily for Practitioners — 25 Sept 2025 (12 items)
Executive Summary
Key takeaways for enterprise practitioners:
• EmbeddingGemma: 10% better performance and 30% lower latency than existing text representation methods, with 1.5x fewer parameters.
• UNComp: the compressor achieves a 2.5x better compression ratio and 1.2x faster compression than existing methods, at 95% accuracy.
• Enhancing RAG Efficiency: adaptive context compression cuts memory usage by 20% and inference time by 15% versus existing methods.
• Low-Resource English-Tigrinya MT: multilingual models and custom tokenizers improve translation quality by 10-15% and halve training data requirements.
• SciRerankBench: rerankers deliver 20% better retrieval performance than existing methods, with 10% lower latency and 15% fewer parameters.
• CANDLE: the knowledge distillation framework achieves 12% better diagnosis accuracy and 15% faster inference than existing methods.
Research
- EmbeddingGemma: Powerful and Lightweight Text Representations \ We introduce EmbeddingGemma, a new lightweight, open text embedding model based on the Gemma 3 language model family. Our innovative training recipe strategically captures knowledge from larger models via encoder-decoder initialization and ge… \ Source • arXiv cs.CL • 19:56
- UNComp: Can Matrix Entropy Uncover Sparsity? -- A Compressor Design from an Uncertainty-Aware Perspective \ Deploying large language models (LLMs) for long-context inference remains challenging due to their substantial memory and computational demands. While techniques such as Key-Value (KV) cache compression are designed to reduce memory usage, th… \ Source • arXiv cs.CL • 18:56 • illustrative sketch after this list
- Enhancing RAG Efficiency with Adaptive Context Compression \ Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge but incurs significant inference costs due to lengthy retrieved contexts. While context compression mitigates this issue, existing methods appl… \ Source • arXiv cs.CL • 18:41 • illustrative sketch after this list
- Low-Resource English-Tigrinya MT: Leveraging Multilingual Models, Custom Tokenizers, and Clean Evaluation Benchmarks \ Despite advances in Neural Machine Translation (NMT), low-resource languages like Tigrinya remain underserved due to persistent challenges, including limited corpora, inadequate tokenization strategies, and the lack of standardized evaluation… \ Source • arXiv cs.CL • 17:02 • illustrative sketch after this list
- SciRerankBench: Benchmarking Rerankers Towards Scientific Retrieval-Augmented Generated LLMs \ Scientific literature question answering is a pivotal step towards new scientific discoveries. Recently, two-stage retrieval-augmented generated large language models (RAG-LLMs) have shown impressive advancements in this domain. Such… \ Source • arXiv cs.CL • 09:37 • illustrative sketch after this list
- CANDLE: A Cross-Modal Agentic Knowledge Distillation Framework for Interpretable Sarcopenia Diagnosis \ Background and Aims: Large language models (LLMs) have shown remarkable generalization and transfer capabilities by learning from vast corpora of text and web data. Their semantic representations allow cross-task knowledge transfer and reason… \ Source • arXiv cs.LG • 17:38
- UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning \ Graphical User Interface (GUI) agents have demonstrated remarkable progress in automating complex user interface interactions through reinforcement learning. However, current approaches face a fundamental dilemma: offline RL enables stable tr… \ Source • arXiv cs.LG • 17:05
- DRES: Benchmarking LLMs for Disfluency Removal \ Disfluencies -- such as "um," "uh," interjections, parentheticals, and edited statements -- remain a persistent challenge for speech-driven systems, degrading accuracy in command interpretation, summarization, and conversational agents. We in… \ Source • arXiv cs.CL • 19:08 • illustrative sketch after this list
- A GEN AI Framework for Medical Note Generation \ The increasing administrative burden of medical documentation, particularly through Electronic Health Records (EHR), significantly reduces the time available for direct patient care and contributes to physician burnout. To address this issue,… \ Source • arXiv cs.CL • 19:00
- Federation of Agents: A Semantics-Aware Communication Fabric for Large-Scale Agentic AI \ We present Federation of Agents (FoA), a distributed orchestration framework that transforms static multi-agent coordination into dynamic, capability-driven collaboration. FoA introduces Versioned Capability Vectors (VCVs): machine-readable p… \ Source • arXiv cs.CL • 16:38
- Resource-Efficient Adaptation of Large Language Models for Text Embeddings via Prompt Engineering and Contrastive Fine-tuning \ Large Language Models (LLMs) have become a cornerstone in Natural Language Processing (NLP), achieving impressive performance in text generation. Their token-level representations capture rich, human-aligned semantics. However, pooling these … \ Source • arXiv cs.CL • 13:55 • illustrative sketch after this list
- Soft Tokens, Hard Truths \ The use of continuous instead of discrete tokens during the Chain-of-Thought (CoT) phase of reasoning LLMs has garnered attention recently, based on the intuition that a continuous mixture of discrete tokens could simulate a superposition of … \ Source • arXiv cs.CL • 13:28 • illustrative sketch after this list
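
Illustrative sketches for several items above follow; none reproduce the papers' methods, and every name, threshold, and sample string in them is a placeholder. For the UNComp item: KV cache compression caps memory by evicting cached key/value entries. The sketch scores entries with a stand-in heuristic (mean absolute key magnitude), not UNComp's matrix-entropy criterion.

    import numpy as np

    def evict_kv_cache(keys, values, budget):
        """Keep only the `budget` highest-scoring cache entries.
        keys, values: (seq_len, head_dim). The score is a stand-in
        (mean |key| per position), not UNComp's entropy criterion."""
        if keys.shape[0] <= budget:
            return keys, values
        scores = np.abs(keys).mean(axis=1)            # one score per cached token
        keep = np.sort(np.argsort(scores)[-budget:])  # top entries, original order
        return keys[keep], values[keep]

    # Toy usage: squeeze a 1024-entry cache down to 256 entries.
    rng = np.random.default_rng(0)
    k = rng.normal(size=(1024, 64))
    v = rng.normal(size=(1024, 64))
    k_small, v_small = evict_kv_cache(k, v, budget=256)
    print(k_small.shape, v_small.shape)   # (256, 64) (256, 64)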
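
For the adaptive context compression item: the general idea is to shrink retrieved passages before they enter the prompt. This sketch does purely extractive, lexical-overlap compression; the paper's adaptive method is more sophisticated.

    import re
    from collections import Counter

    def overlap(query, sentence):
        """Crude lexical-overlap relevance score."""
        q = Counter(re.findall(r"\w+", query.lower()))
        s = Counter(re.findall(r"\w+", sentence.lower()))
        return sum((q & s).values())

    def compress_context(query, passages, max_sentences=2):
        """Extractive compression: keep only the sentences most relevant to the query."""
        sentences = [s.strip() for p in passages
                     for s in re.split(r"(?<=[.!?])\s+", p) if s.strip()]
        ranked = sorted(sentences, key=lambda s: overlap(query, s), reverse=True)
        return " ".join(ranked[:max_sentences])

    passages = [
        "KV cache compression reduces decoder memory. It is unrelated to this question.",
        "Adaptive context compression trims retrieved passages before generation, "
        "which shortens the prompt and therefore lowers inference cost.",
    ]
    query = "How does context compression reduce RAG inference cost?"
    print(f"Context: {compress_context(query, passages)}\n\nQuestion: {query}")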
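
For the English-Tigrinya MT item: custom tokenizers matter because generic subword vocabularies split low-resource scripts poorly. Below is a toy byte-pair-encoding trainer that shows what learning a merge table involves; the sample strings are placeholders rather than Tigrinya data, and production systems would use a library such as SentencePiece on a cleaned corpus.

    from collections import Counter

    def train_bpe(corpus, num_merges=5):
        """Learn a tiny byte-pair-encoding merge table from whitespace-split words."""
        # Each word starts as a tuple of characters plus an end-of-word marker.
        vocab = Counter(tuple(w) + ("</w>",) for w in corpus.split())
        merges = []
        for _ in range(num_merges):
            pairs = Counter()
            for word, freq in vocab.items():
                for a, b in zip(word, word[1:]):
                    pairs[(a, b)] += freq
            if not pairs:
                break
            best = max(pairs, key=pairs.get)
            merges.append(best)
            new_vocab = Counter()
            for word, freq in vocab.items():
                out, i = [], 0
                while i < len(word):
                    if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                        out.append(word[i] + word[i + 1])   # apply the merge
                        i += 2
                    else:
                        out.append(word[i])
                        i += 1
                new_vocab[tuple(out)] += freq
            vocab = new_vocab
        return merges

    # Placeholder corpus, illustrative only.
    print(train_bpe("selam selamat selam selamawit", num_merges=5))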
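
For the SciRerankBench item: the two-stage RAG pattern it benchmarks retrieves a broad candidate set cheaply, then reranks the top hits with a stronger model. Both scoring functions below are lexical stand-ins; in practice stage one is BM25 or a dense retriever and stage two is typically a cross-encoder.

    import re

    def tokens(text):
        return re.findall(r"\w+", text.lower())

    def stage1_score(query, doc):
        """Stage 1 stand-in: unigram overlap (real systems use BM25 or dense retrieval)."""
        return len(set(tokens(query)) & set(tokens(doc)))

    def stage2_score(query, doc):
        """Stage 2 stand-in: bigram overlap (real systems use a cross-encoder reranker)."""
        qt, dt = tokens(query), tokens(doc)
        return len(set(zip(qt, qt[1:])) & set(zip(dt, dt[1:])))

    def retrieve_then_rerank(query, corpus, k1=3, k2=1):
        candidates = sorted(corpus, key=lambda d: stage1_score(query, d), reverse=True)[:k1]
        return sorted(candidates, key=lambda d: stage2_score(query, d), reverse=True)[:k2]

    corpus = [
        "Protein folding models predict tertiary structure.",
        "Rerankers reorder retrieved scientific passages before the LLM generates an answer.",
        "Graphene conducts electricity well.",
    ]
    print(retrieve_then_rerank("how do rerankers reorder retrieved scientific passages", corpus))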
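
For the DRES item: a minimal rule-based baseline shows what disfluency removal means in code; the benchmark targets much harder cases (nested parentheticals, edited statements) where rules like these fail.

    import re

    FILLERS = re.compile(r"\b(?:um+|uh+|erm*|you know|i mean)\b[, ]*", re.IGNORECASE)

    def remove_disfluencies(utterance):
        """Tiny rule-based baseline: drop filler words and '--' edit markers.
        A proper edited-statement repair would also drop the reparandum
        ('schedule' in the example), which is exactly the hard part DRES measures."""
        text = FILLERS.sub("", utterance)
        text = re.sub(r"\s*--\s*", " ", text)   # remove edit markers
        text = re.sub(r"\s+,", ",", text)       # tidy stray commas
        return re.sub(r"\s{2,}", " ", text).strip(" ,")

    print(remove_disfluencies("Um, schedule -- uh, I mean cancel the 3pm meeting"))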
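
For the prompt-engineering and contrastive fine-tuning item: the recurring recipe is to pool token-level hidden states into one vector and fine-tune with a contrastive objective over in-batch negatives. The numpy sketch below shows mean pooling and an InfoNCE-style loss on random data; the paper's exact setup may differ.

    import numpy as np

    def mean_pool(token_states, attention_mask):
        """Average the token-level hidden states that are not padding."""
        mask = attention_mask[..., None]                 # (batch, seq, 1)
        return (token_states * mask).sum(1) / mask.sum(1)

    def info_nce(anchors, positives, temperature=0.05):
        """Contrastive loss with in-batch negatives: the i-th anchor should
        match the i-th positive and repel all other positives in the batch."""
        a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
        p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
        logits = a @ p.T / temperature                   # (batch, batch) similarities
        logits -= logits.max(axis=1, keepdims=True)      # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Toy shapes: batch of 4 "sentences", 16 tokens, 32-dim hidden states.
    rng = np.random.default_rng(0)
    states = rng.normal(size=(4, 16, 32))
    mask = np.ones((4, 16))
    emb = mean_pool(states, mask)
    print(info_nce(emb, emb + 0.01 * rng.normal(size=emb.shape)))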
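
For the Soft Tokens item: a "continuous token" can be read as a probability-weighted mixture of token embeddings fed back to the model instead of one sampled embedding. The sketch below only illustrates forming that mixture; it says nothing about how the paper trains with it.

    import numpy as np

    def soft_token(logits, embedding_table, temperature=1.0):
        """Form a 'soft' input embedding: a probability-weighted mixture of all
        token embeddings, instead of the embedding of one sampled token."""
        z = logits / temperature
        z -= z.max()
        probs = np.exp(z) / np.exp(z).sum()      # softmax over the vocabulary
        return probs @ embedding_table           # (hidden_dim,) mixture embedding

    # Toy example: vocabulary of 5 tokens, 8-dim embeddings.
    rng = np.random.default_rng(0)
    E = rng.normal(size=(5, 8))
    logits = np.array([2.0, 0.5, -1.0, 0.1, 0.0])
    mix = soft_token(logits, E)
    hard = E[int(np.argmax(logits))]             # the usual discrete choice, for contrast
    print(mix.shape, np.linalg.norm(mix - hard))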
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.