D.A.D.: Top AI Agents Complete Only One-Third of Real Website Tasks — 4/10
The Daily AI Digest
Your daily briefing on AI
April 10, 2026 · 14 items · ~7 min read
From: Hacker News, Meta AI, OpenAI, arXiv
D.A.D. Joke of the Day
My company replaced HR with AI. Now when I ask for a raise, I get a thoughtful 800-word response about why I already feel valued.
What's New
AI developments from the last 24 hours
EFF Leaves X, Citing 97% Drop in Reach Under Musk Ownership
The Electronic Frontier Foundation, a leading digital rights organization, announced it's leaving X after nearly 20 years on the platform. The group cited a dramatic collapse in reach under Elon Musk's ownership: posts now receive less than 3% of the views they once did. In 2018, EFF's tweets reached 50-100 million impressions monthly; by 2024, that had dropped to roughly 2 million monthly. The organization also cited concerns about content moderation, security, and user control. EFF joins a growing list of organizations and public figures departing the platform.
Why it matters: EFF's exit—with hard numbers showing a 97%+ reach decline—adds credible, quantified evidence to the ongoing debate about X's utility for organizations and raises questions about the platform's value for professional communications and advocacy.
Vercel Plugin for Claude Code Allegedly Collects Data Without Clear Consent
A developer investigating Vercel's plugin for Claude Code found it allegedly collects more data than users might expect. According to the analysis, the plugin sends device IDs, OS info, and full bash command strings by default—without explicit consent—while prompt text collection requires opt-in. The plugin reportedly uses context injection to make Claude ask consent questions, with no visual indicator distinguishing these from native prompts. Opting out requires setting an environment variable documented only in a README buried in the plugin's cache directory.
Why it matters: This raises questions about transparency in AI tool integrations—users may not realize third-party plugins can inject instructions into their AI assistants and collect command data silently.
Claude Code Reportedly Mislabels Its Own Messages as User Instructions
A developer reports that Claude Code sometimes sends messages to itself, then incorrectly labels those messages as coming from the user—leading the AI to act on self-generated instructions while insisting the user gave them. The bug reportedly appears near context window limits and has been observed across different interfaces. Community reaction was notably concerned: one commenter called it 'terrifying' because 'this class of bug lets it agree with itself.' Others argued LLMs should be treated as untrusted input sources, comparing oversight needs to managing junior developers.
Why it matters: If confirmed, this harness-level bug—distinct from typical hallucination—could cause AI coding assistants to take actions users never requested, raising questions about how much autonomy to grant these tools in real workflows.
UK iPhone Users Now Need ID Verification to Disable Content Filters
Apple's iOS 18.4 update in the UK now enables web content filtering and AI-powered 'Communication Safety' features by default unless users verify their age through credit cards, driver's licenses, government ID, or Apple accounts over 18 years old. The Open Rights Group argues this is Apple's voluntary decision—not a legal requirement—and demands the company drop the verification. The digital rights organization notes the verification options exclude roughly one-third of UK adults who lack credit cards and one-fifth without driver's licenses. Only the UK, South Korea, and Singapore face similar Apple requirements.
Why it matters: A major platform voluntarily implementing ID-gated internet access—even for adults—signals how tech companies may increasingly serve as de facto regulators of online behavior, raising questions about who controls digital defaults when governments haven't mandated them.
Little Snitch Brings Mac-Style Network Monitoring to Linux
Little Snitch, a network monitoring and firewall tool long popular with Mac users for its ability to see which apps are phoning home, is now available for Linux. The Linux version provides per-application connection monitoring, traffic blocking with customizable rules, and data usage tracking through a web-based interface. It uses eBPF, a Linux kernel technology that allows deep network inspection without slowing systems down.
Why it matters: For teams running Linux workstations or servers, this offers Mac-style visibility into what software is connecting where—useful for security audits, catching unauthorized data exfiltration, or simply understanding what your AI tools are sending back to their makers.
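Little Snitch's actual mechanism is eBPF, but the core idea it enables, attributing network connections to the application that owns them, can be sketched with nothing but Linux's /proc filesystem. The snippet below is an illustrative stand-in, not Little Snitch's implementation: it parses kernel connection tables and maps socket inodes back to process IDs.

```python
import glob
import os


def parse_proc_net_tcp(text):
    """Parse /proc/net/tcp content into (local, remote, inode) tuples.

    Columns are: sl, local_address, rem_address, st, queues, timers,
    retrnsmt, uid, timeout, inode, ... (addresses are hex IP:port).
    """
    conns = []
    for line in text.splitlines()[1:]:  # skip the header row
        parts = line.split()
        if len(parts) < 10:
            continue
        conns.append((parts[1], parts[2], parts[9]))
    return conns


def socket_inode_to_pid():
    """Map socket inode -> pid by scanning /proc/<pid>/fd symlinks.

    Sockets show up as symlink targets of the form 'socket:[INODE]'.
    Requires permission to read each process's fd directory.
    """
    mapping = {}
    for fd in glob.glob("/proc/[0-9]*/fd/*"):
        try:
            target = os.readlink(fd)
        except OSError:
            continue  # process exited or access denied
        if target.startswith("socket:["):
            inode = target[8:-1]
            mapping[inode] = fd.split("/")[2]  # the pid component
    return mapping
```

Joining the two tables tells you which process owns which connection, which is the per-application visibility described above; eBPF does the same attribution in-kernel, with far lower overhead and without polling.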
What's Innovative
Clever new use cases for AI
CSS Studio Aims to Let Non-Coders Edit Website Styles Visually
CSS Studio is a new browser-based design tool that lets you visually edit styles and animations on your live website, then automatically pushes those changes to AI coding agents via MCP (Model Context Protocol) to update your actual codebase. The tool claims to work with any codebase by connecting visual edits to AI agents like Claude or Cursor. Early commenters questioned whether it works with CSS-in-JS frameworks like Chakra, noted the landing page itself feels AI-generated, and flagged unusual pricing tiers (€64.23 and €256.92).
Why it matters: This represents an emerging pattern of tools that sit between visual design and AI code generation—potentially useful for teams who want designers to make changes without waiting on developers, though the lack of demos and framework compatibility questions suggest it's early days.
What's Controversial
Stories sparking genuine backlash, policy fights, or heated disagreement in the AI community
Quiet day in what's controversial.
What's in the Lab
New announcements from major AI labs
Meta Details How It Untangled Years of Custom Video Call Code
Meta published a technical deep-dive on how it migrated 50+ real-time communication features—video calls, screen sharing, live streaming—from a heavily customized internal version of WebRTC back to the main open-source project. The company had forked WebRTC years ago but found maintaining that fork increasingly costly as the upstream project evolved. Their solution: a "dual-stack" architecture that lets engineers A/B test old and new implementations side-by-side before switching over. Meta claims the migration improved performance, reduced app size, and strengthened security, though no specific metrics were shared.
Why it matters: This is infrastructure work, but it signals how even the largest tech companies are reconsidering the hidden costs of customizing open-source tools—a lesson for any organization weighing "build vs. maintain" decisions on foundational software.
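The dual-stack pattern Meta describes, running old and new implementations side-by-side before cutting over, can be sketched in a few lines. Everything below (class name, rollout knob, the toy negotiator functions) is a hypothetical illustration of the general technique, not Meta's code:

```python
import random


class DualStack:
    """Route calls to a legacy or a new implementation, optionally
    running both side-by-side and diffing results (shadow mode)."""

    def __init__(self, legacy, new, rollout=0.1, shadow=True):
        self.legacy = legacy      # current production code path
        self.new = new            # upstream-aligned replacement
        self.rollout = rollout    # fraction of traffic served by `new`
        self.shadow = shadow      # also run the other path and compare
        self.mismatches = 0

    def call(self, *args):
        use_new = random.random() < self.rollout
        primary = self.new if use_new else self.legacy
        result = primary(*args)
        if self.shadow:
            other = (self.legacy if use_new else self.new)(*args)
            if other != result:
                self.mismatches += 1  # record divergence for investigation
        return result


# Hypothetical usage: two codec pickers that should behave identically.
legacy_pick = lambda codecs: sorted(codecs)[0]
new_pick = lambda codecs: min(codecs)

stack = DualStack(legacy_pick, new_pick, rollout=0.5)
for _ in range(100):
    stack.call(["vp8", "h264", "av1"])
print(stack.mismatches)  # prints 0: the implementations agree on every call
```

The value of the pattern is that mismatches surface in production telemetry before the legacy path is deleted, turning a risky big-bang migration into an incremental, measurable one.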
Japanese Ad Giant CyberAgent Adopts ChatGPT Enterprise Company-Wide
CyberAgent, a major Japanese digital advertising and media conglomerate, has adopted ChatGPT Enterprise and Codex across its advertising, media, and gaming divisions. The company says the implementation allows it to scale AI use securely while improving quality and speeding up decision-making. No specific metrics or results were provided in the announcement.
Why it matters: This is a case study announcement from OpenAI—useful as a signal that large Asian media companies are standardizing on enterprise AI platforms, but light on details about actual impact or ROI.
OpenAI Targets Indian Market With IPL Cricket Ticket Giveaway
OpenAI is running a promotional contest offering IPL cricket match tickets as prizes, with entries submitted via Instagram. The 'Full Fan Mode Contest' targets fans of India's massively popular cricket league, requiring participants to follow specific entry steps and meet eligibility requirements. The move signals OpenAI's push into the Indian market, where cricket commands enormous cultural attention and the IPL draws hundreds of millions of viewers annually.
Why it matters: This is OpenAI marketing to India's massive consumer base—a strategic priority as AI labs compete for users in the world's most populous country.
What's in Academe
New papers on AI and its effects from researchers
Robots Learn to Handle Soft Objects Using Only Simulated Training Data
Researchers developed SIM1, a system that trains robots to manipulate soft, deformable objects—think fabrics, cables, or food items—using entirely synthetic data generated in physics-accurate simulations. The approach creates digital twins that precisely match real-world physics, then uses AI to generate training scenarios. In tests, robots trained purely on simulated data matched the performance of those trained on real-world data, achieving 90% success rates when deployed on actual tasks with no additional training.
Why it matters: Training robots on physical tasks typically requires expensive, time-consuming real-world data collection—this suggests companies could dramatically reduce that cost by substituting high-fidelity simulation, potentially accelerating deployment of robots for warehouse logistics, manufacturing, and other handling of non-rigid materials.
AI Vision Models Can See Images Clearly but Fail to Reason About Them
Researchers identified a flaw in advanced AI vision models: they can accurately describe what's in an image but then fail at reasoning tasks they'd solve correctly with text alone. The culprit appears to be how these "mixture-of-experts" models route information internally—visual inputs don't properly activate the reasoning components. The team's fix improved performance by up to 3.17% on complex visual reasoning benchmarks. This affects newer model architectures used by some frontier labs, not the standard ChatGPT-style systems most users encounter.
Why it matters: As businesses deploy AI for document analysis, visual inspection, and image-based decision-making, this research suggests current multimodal systems may have a hidden ceiling on visual reasoning—seeing clearly but thinking poorly—that fixes are only beginning to address.
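The routing mechanism at issue can be illustrated with a toy mixture-of-experts layer: a gate scores each expert for a given token, only the top-k experts run, and their outputs are blended. The dimensions, gate, and experts below are a made-up sketch of the general architecture, not the paper's model; the paper's claim is that visual tokens fail to activate the right experts in such a gate.

```python
import numpy as np

rng = np.random.default_rng(0)


def moe_layer(x, gate_w, experts, top_k=2):
    """Minimal mixture-of-experts routing: the gate scores every expert,
    the top-k experts process the token, outputs are weighted-summed."""
    logits = x @ gate_w                   # one routing score per expert
    top = np.argsort(logits)[-top_k:]     # indices of the chosen experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))


dim, n_experts = 8, 4
gate_w = rng.normal(size=(dim, n_experts))
# Each expert is a simple linear map; real models use small MLPs.
experts = [(lambda W: (lambda v: v @ W))(rng.normal(size=(dim, dim)))
           for _ in range(n_experts)]

token = rng.normal(size=dim)  # stand-in for a text or image-patch embedding
out = moe_layer(token, gate_w, experts)
print(out.shape)  # prints (8,)
```

If image-derived embeddings land in a region of input space the gate was never calibrated for, they get routed to the wrong experts, which matches the paper's diagnosis of seeing clearly but reasoning poorly.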
Brain-Reading AI Claims to Work on New People Without Individual Calibration
Researchers developed a brain-decoding AI that can interpret what someone is seeing from fMRI scans—without needing to be trained on that specific person's brain data. The system uses in-context learning (similar to how ChatGPT adapts to examples in a prompt) to infer individual neural patterns from just a few reference images. Previous approaches required extensive per-person calibration. This method claims to work across different subjects and different scanners with no fine-tuning, anatomical alignment, or overlapping training data required.
Why it matters: If validated, this could dramatically lower the cost and complexity of brain-computer interfaces, moving neural decoding from lab curiosity toward practical medical and accessibility applications.
RewardFlow Promises Better AI Images Without Costly Retraining
Researchers have developed RewardFlow, a technique for steering AI image generators at the time they create images rather than through retraining. The framework combines multiple quality signals—semantic accuracy, visual fidelity, object consistency, and human preference—to guide generation. The team claims state-of-the-art results on image editing and compositional benchmarks, though specific numbers weren't provided in the abstract. The approach works with existing diffusion models (the architecture behind Midjourney, DALL-E, and Stable Diffusion) without requiring expensive model modifications.
Why it matters: If validated, this could let image generation tools better follow complex prompts—like "a red cup on a blue table"—where current models often struggle with attribute binding and spatial relationships.
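Inference-time reward steering, in its simplest form, can be approximated by best-of-N selection under a combined score; RewardFlow's actual method guides the diffusion process itself, but the aggregation idea is the same. The candidate generator, reward functions, and weights below are placeholders, not the paper's components:

```python
import random

random.seed(0)


def generate_candidate():
    """Stand-in 'image': a random vector. In practice this would be a
    sample (or intermediate state) from a diffusion model."""
    return [random.random() for _ in range(4)]


# Hypothetical per-axis reward functions, each returning a score in [0, 1],
# mirroring the four signals named in the paper's abstract.
reward_fns = {
    "semantic":    lambda img: img[0],
    "fidelity":    lambda img: img[1],
    "consistency": lambda img: img[2],
    "preference":  lambda img: img[3],
}
weights = {"semantic": 0.4, "fidelity": 0.3, "consistency": 0.2, "preference": 0.1}


def combined_reward(img):
    """Weighted sum of all quality signals for one candidate."""
    return sum(weights[k] * fn(img) for k, fn in reward_fns.items())


# Steering without retraining, reduced to its crudest form:
# sample N candidates and keep the one the combined reward prefers.
candidates = [generate_candidate() for _ in range(16)]
best = max(candidates, key=combined_reward)
```

The appeal of this family of techniques is exactly what the item notes: the quality criteria live outside the model, so they can be re-weighted or swapped per task without touching the model's parameters.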
Top AI Agents Complete Only One-Third of Real Website Tasks
A new benchmark called ClawBench tests AI agents on 153 real-world online tasks—booking appointments, applying for jobs, making purchases—across 144 live websites. The results are humbling: Claude Sonnet 4, the top performer among seven frontier models tested, completed only 33.3% of tasks successfully. Unlike controlled sandbox tests, ClawBench uses actual websites with login requirements, CAPTCHAs, and unpredictable interfaces. The gap between AI demo videos and practical reliability remains substantial.
Why it matters: If you're evaluating AI agents to automate routine web tasks for your team, this benchmark suggests the technology isn't ready for unsupervised deployment—expect significant human oversight for the foreseeable future.
What's Happening on Capitol Hill
Upcoming AI-related committee hearings
What's On The Pod
Some new podcast episodes
The Cognitive Revolution — Calm AI for Crazy Days: Inside Granola's Design Philosophy, with co-founder Sam Stephenson
How I AI — I built a custom Slack inbox. It was easier than you’d think. | Yash Tekriwal (Clay)