Microsoft Quietly Rips Out Its Own Copilot Buttons
1. OpenAI Pushes Illinois Liability Shield as First Deepfake Conviction Lands
An Ohio man used more than 100 AI tools to generate fake nude images of women and minors. After his arrest, he kept producing them.
2. Microsoft Quietly Rips Out Its Own Copilot Buttons
Microsoft spent the last two years threading a Copilot button into nearly every Windows 11 app it shipped. Notepad got one. So did Snipping Tool. Paint and Photos followed.
3. Two Papers Undercut the "RL Generalizes, SFT Memorizes" Training Consensus
Cross-domain generalization in supervised finetuning follows a U-shaped curve.
In Brief
- Hunyuan Team Releases Embodied AI Foundation Models for Physical Agents
  HY-Embodied-0.5 is a family of models built for robots and physical agents, targeting spatial perception, temporal reasoning, and interaction planning. The suite ships in two sizes: an efficient variant for on-device deployment and a larger one for complex multi-step tasks.
- ClawBench Tests AI Agents on 153 Everyday Online Tasks — Most Fail
  ClawBench evaluates AI agents on routine tasks people actually do: booking appointments, completing purchases, and submitting job applications across 144 live websites in 15 categories. The benchmark exposes a wide gap between demo-grade agent performance and reliable real-world task completion.
- LPM 1.0 Generates Real-Time Video Character Performances from a Single Reference
  The model tackles what its authors call the "performance trilemma" — jointly achieving expressiveness, real-time inference, and long-horizon identity stability in video. LPM 1.0 focuses on conversational scenarios where characters must sustain coherent facial, vocal, and gestural behavior over extended sequences.
- Survey Maps How LLM Agent Capabilities Are Moving Outside the Model
  A new review paper argues that modern LLM agents gain capabilities less from weight changes and more from external scaffolding: memory stores, reusable skills, interaction protocols, and runtime harnesses. The framework draws on cognitive-artifact theory to classify what gets externalized and why.
- MegaStyle Builds 170K-Prompt Style Dataset Using Generative Model Consistency
  The pipeline exploits the fact that current text-to-image models produce visually consistent outputs from the same style description. MegaStyle pairs 170K style prompts with 400K content prompts to create a large, balanced dataset where intra-style consistency and inter-style diversity are both enforced.
- KnowU-Bench Measures Whether Mobile Agents Can Ask Before They Act
  Existing mobile-agent benchmarks test preference recovery from static histories or intent prediction from fixed contexts. KnowU-Bench instead evaluates whether an agent can identify missing preferences through dialogue and decide when to intervene, request consent, or stay silent in a live GUI.
- SkillClaw Lets Agent Skills Improve Across Users Without Retraining
  OpenClaw-style LLM agents rely on reusable skills that stay static after deployment, forcing users to rediscover the same workarounds independently. SkillClaw introduces a mechanism that aggregates cross-user success and failure signals to evolve shared skills collectively over time.
- NUMINA Fixes Object-Count Errors in Text-to-Video Without Additional Training
  Text-to-video diffusion models routinely generate the wrong number of objects. NUMINA identifies prompt-layout mismatches by selecting discriminative attention heads, derives a countable latent layout, then modulates cross-attention to correct the count — all at inference time with no fine-tuning.
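The inference-time correction NUMINA describes can be illustrated in toy form. The sketch below is not the paper's code: the function names, the single-head NumPy setup, and the fixed `boost` heuristic are all invented here. It shows the general idea of modulating a cross-attention map so that exactly `target_count` latent regions attend strongly to the counted-object token, with everything else suppressed.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def modulate_cross_attention(scores, object_token_idx, target_count, boost=2.0):
    """Toy count correction (illustrative only, not NUMINA's method).

    scores: (num_queries, num_tokens) raw attention logits for one head,
    where each query is a spatial region of the latent layout.
    Amplifies the object token's logit in the `target_count` regions that
    already respond to it most strongly, and suppresses it everywhere else.
    """
    obj_col = scores[:, object_token_idx]
    # indices of the target_count regions with the strongest object response
    keep = np.argsort(obj_col)[-target_count:]
    modulated = scores.copy()
    delta = np.full(scores.shape[0], -boost)  # suppress by default
    delta[keep] = boost                       # boost the kept regions
    modulated[:, object_token_idx] += delta
    return softmax(modulated, axis=-1)        # renormalized attention map
```

In a real diffusion pipeline this kind of hook would run inside the selected cross-attention heads at each denoising step; here it just demonstrates the logit-editing step on a standalone array.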