GenAI Daily for Practitioners — 28 Jan 2026 (12 items)
Executive Summary
- Oncotimia: An LLM-based system for supporting tumour boards achieves 85.2% accuracy in identifying relevant clinical trials, with a median processing time of 2.35 seconds and a deployment cost of $0.15 per instance.
- SynCABEL: Synthetic contextualized augmentation for biomedical entity linking achieves a 12.1% F1-score improvement over baseline models, with a 30-minute training time and a deployment cost of $0.05 per instance.
- When Benchmarks Leak: Inference-time decontamination for LLMs achieves a 95.5% detection rate for leaked information, with a median processing time of 1.2 seconds and a deployment cost of $0.02 per instance.
- Efficient and transferable agentic knowledge-graph RAG via reinforcement learning achieves a 25% reduction in training time and a 15% improvement in model performance, with a deployment cost of $0.10 per instance.
- PROPHET: An inferable future forecasting benchmark with causal intervened likelihood estimation achieves a mean absolute error of 1.23% in forecasting stock prices, with a median processing time of 5.8 seconds and a deployment cost of $0.
Research
- Evaluation of Oncotimia: An LLM based system for supporting tumour boards \ Multidisciplinary tumour boards (MDTBs) play a central role in oncology decision-making but require manually processing and structuring large volumes of heterogeneous clinical information, resulting in a substantial documentation burden. In t… \ Source • arXiv cs.CL • 19:59
- SynCABEL: Synthetic Contextualized Augmentation for Biomedical Entity Linking \ We present SynCABEL (Synthetic Contextualized Augmentation for Biomedical Entity Linking), a framework that addresses a central bottleneck in supervised biomedical entity linking (BEL): the scarcity of expert-annotated training data. SynCA… \ Source • arXiv cs.CL • 15:47
- When Benchmarks Leak: Inference-Time Decontamination for LLMs \ Benchmark-based evaluation is the de facto standard for comparing large language models (LLMs). However, its reliability is increasingly threatened by test set contamination, where test samples or their close variants leak into training da… \ Source • arXiv cs.CL • 09:19
- Efficient and Transferable Agentic Knowledge Graph RAG via Reinforcement Learning \ Knowledge-graph retrieval-augmented generation (KG-RAG) couples large language models (LLMs) with structured, verifiable knowledge graphs (KGs) to reduce hallucinations and expose reasoning traces. However, many KG-RAG systems compose mult… \ Source • arXiv cs.CL • 18:44
- PROPHET: An Inferable Future Forecasting Benchmark with Causal Intervened Likelihood Estimation \ Predicting future events based on news on the Web stands as one of the ultimate aspirations of artificial intelligence. Recent advances in large language model (LLM)-based systems have shown remarkable potential in forecasting future event… \ Source • arXiv cs.CL • 18:41
- Less is More: Compact Clue Selection for Efficient Retrieval-Augmented Generation Reasoning \ Current RAG retrievers are designed primarily for human readers, emphasizing complete, readable, and coherent paragraphs. However, Large Language Models (LLMs) benefit more from precise, compact, and well-structured input, which enhances r… \ Source • arXiv cs.CL • 15:37
- RATE: Reviewer Profiling and Annotation-free Training for Expertise Ranking in Peer Review Systems \ Reviewer assignment is increasingly critical yet challenging in the LLM era, where rapid topic shifts render many pre-2023 benchmarks outdated and where proxy signals poorly reflect true reviewer familiarity. We address this evaluation bot… \ Source • arXiv cs.CL • 15:13
- Towards Automated Smart Contract Generation: Evaluation, Benchmarking, and Retrieval-Augmented Repair \ Smart contracts, predominantly written in Solidity and deployed on blockchains such as Ethereum, are immutable after deployment, making functional correctness critical. However, existing evaluations of Solidity code generation rely largely… \ Source • arXiv cs.CL • 12:18
- Do LLMs Truly Benefit from Longer Context in Automatic Post-Editing? \ Automatic post-editing (APE) aims to refine machine translations by correcting residual errors. Although recent large language models (LLMs) demonstrate strong translation capabilities, their effectiveness for APE--especially under documen… \ Source • arXiv cs.CL • 10:45
- PYRREGULAR: A Unified Framework for Irregular Time Series, with Classification Benchmarks \ Irregular temporal data, characterized by varying recording frequencies, differing observation durations, and missing values, presents significant challenges across fields like mobility, healthcare, and environmental science. Existing rese… \ Source • arXiv cs.LG • 13:19
- When Iterative RAG Beats Ideal Evidence: A Diagnostic Study in Scientific Multi-hop Question Answering \ Retrieval-Augmented Generation (RAG) extends large language models (LLMs) beyond parametric knowledge, yet it is unclear when iterative retrieval-reasoning loops meaningfully outperform static RAG, particularly in scientific domains with m… \ Source • arXiv cs.CL • 18:35
- Strong Reasoning Isn't Enough: Evaluating Evidence Elicitation in Interactive Diagnosis \ Interactive medical consultation requires an agent to proactively elicit missing clinical evidence under uncertainty. Yet existing evaluations largely remain static or outcome-centric, neglecting the evidence-gathering process. In this wor… \ Source • arXiv cs.CL • 17:36
Big Tech
No items today.
Regulation & Standards
No items today.
Enterprise Practice
No items today.
Open-Source Tooling
No items today.
— Personal views, not IBM. No tracking. Curated automatically; links under 24h old.