GenAI Daily for Practitioners — 2 Dec 2025 (12 items)
Executive Summary
- Med-gte-hybrid: A contextual embedding transformer model for extracting actionable information from clinical texts, achieving an 84.1% F1 score on the MIMIC-III dataset, with training time 30% lower than baseline models.
- Optimizing Product Deduplication in E-Commerce with Multimodal Embeddings: Multimodal embeddings outperform traditional text-based approaches, with 92.3% accuracy and a 15% reduction in false positives.
- Eye of Judgement: Dissecting the Evaluation of Russian-speaking LLMs with POLLUX: POLLUX achieves 83.1% accuracy in evaluating Russian LLMs, and human evaluation shows moderate agreement with automated metrics.
- MMAG: Mixed Memory-Augmented Generation for Large Language Models Applications: MMAG achieves a BLEU score of 21.4 on the WMT19 En-De translation task, with a 25% reduction in computational resources.
- A Machine Learning Approach for Detection of Mental Health Conditions and Cyberbullying from Social Media: The model achieves a 0.85 F1 score on mental health condition detection and 95% accuracy on cyberbullying detection.
- ZIP-RC: Zero-overhead Inference…
Research
- Med-gte-hybrid: A contextual embedding transformer model for extracting actionable information from clinical texts \ We introduce a novel contextual embedding model med-gte-hybrid that was derived from the gte-large sentence transformer to extract information from unstructured clinical narratives. Our model tuning strategy for med-gte-hybrid combines con… \ Source • arXiv cs.CL • 18:35
- Optimizing Product Deduplication in E-Commerce with Multimodal Embeddings \ In large scale e-commerce marketplaces, duplicate product listings frequently cause consumer confusion and operational inefficiencies, degrading trust on the platform and increasing costs. Traditional keyword-based search methodologies fal… \ Source • arXiv cs.LG • 13:23
- Eye of Judgement: Dissecting the Evaluation of Russian-speaking LLMs with POLLUX \ We introduce POLLUX, a comprehensive open-source benchmark designed to evaluate the generative capabilities of large language models (LLMs) in Russian. Our main contribution is a novel evaluation methodology that enhances the interpretabil… \ Source • arXiv cs.CL • 15:46
- MMAG: Mixed Memory-Augmented Generation for Large Language Models Applications \ Large Language Models (LLMs) excel at generating coherent text within a single prompt but fall short in sustaining relevance, personalization, and continuity across extended interactions. Human communication, however, relies on multiple fo… \ Source • arXiv cs.CL • 15:16
- A Machine Learning Approach for Detection of Mental Health Conditions and Cyberbullying from Social Media \ Mental health challenges and cyberbullying are increasingly prevalent in digital spaces, necessitating scalable and interpretable detection systems. This paper introduces a unified multiclass classification framework for detecting ten dist… \ Source • arXiv cs.CL • 12:07
- ZIP-RC: Zero-overhead Inference-time Prediction of Reward and Cost for Adaptive and Interpretable Generation \ Large language models excel at reasoning but lack key aspects of introspection, including anticipating their own success and the computation required to achieve it. Humans use real-time introspection to decide how much effort to invest, wh… \ Source • arXiv cs.CL • 10:44
- BackportBench: A Multilingual Benchmark for Automated Backporting of Patches \ Many modern software projects evolve rapidly to incorporate new features and security patches. It is important for users to update their dependencies to safer versions, but many still use older, vulnerable package versions because upgradin… \ Source • arXiv cs.CL • 09:16
- Forecasting in Offline Reinforcement Learning for Non-stationary Environments \ Offline Reinforcement Learning (RL) provides a promising avenue for training policies from pre-collected datasets when gathering additional interaction data is infeasible. However, existing offline RL methods often assume stationarity or o… \ Source • arXiv cs.LG • 19:45
- Benchmarking machine learning models for multi-class state recognition in double quantum dot data \ Semiconductor quantum dots (QDs) are a leading platform for scalable quantum processors. However, scaling to large arrays requires reliable, automated tuning strategies for devices' bootstrapping, calibration, and operation, with many tuni… \ Source • arXiv cs.LG • 18:47
- Much Ado About Noising: Dispelling the Myths of Generative Robotic Control \ Generative models, like flows and diffusions, have recently emerged as popular and efficacious policy parameterizations in robotics. There has been much speculation as to the factors underlying their successes, ranging from capturing multi… \ Source • arXiv cs.LG • 16:44
- Who Judges the Judge? LLM Jury-on-Demand: Building Trustworthy LLM Evaluation Systems \ As Large Language Models (LLMs) become integrated into high-stakes domains, there is a growing need for evaluation methods that are both scalable for real-time deployment and reliable for critical decision-making. While human evaluation is… \ Source • arXiv cs.LG • 16:26
- Morphling: Fast, Fused, and Flexible GNN Training at Scale \ Graph Neural Networks (GNNs) present a fundamental hardware challenge by fusing irregular, memory-bound graph traversals with regular, compute-intensive dense matrix operations. While frameworks such as PyTorch Geometric (PyG) and Deep Gra… \ Source • arXiv cs.LG • 14:45
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.