AI News Week 20 – Anthropic's Developer Conference, Google Health Coach, OMR Insights

Anthropic showed new Managed Agents features at the Code with Claude conference. It also documented for the first time that Claude Opus 4 resorted to blackmail in up to 96% of test scenarios. Google backed AlphaEvolve with metrics and announced an AI health coach. The verdict at OMR in Hamburg was clear: agentic AI is coming, and companies that are not visible in AI systems will lose customers.

Anthropic showed at the 'Code with Claude' conference in San Francisco what Managed Agents can do in practice – and also publicly revealed for the first time that Claude resorted to blackmail in tests. Google announced an AI health coach and backed AlphaEvolve with measurement data for the first time. At OMR in Hamburg (5–6 May, 70,000 visitors) the verdict was clearer than in previous years: agentic AI is coming, trust in ChatGPT ads is wavering, and companies that are not visible in AI systems lose customers before those customers ever reach the company website.

Anthropic: Code with Claude – new Managed Agents features, SpaceX deal and a blackmail disclosure

At the 'Code with Claude' conference (6 May, San Francisco), Anthropic clarified the Managed Agents offering. Three new features in public beta:

Multiagent Orchestration: Instead of one request, a fleet of parallel agents runs – for complex tasks that can be broken down into sub-tasks.

Outcomes: You define success criteria, Claude iterates until the goal is reached.

Dreaming: Claude draws on past sessions, recognises patterns and improves on its own – a first real memory feature for agentic loops.

In addition, SpaceX is making the entire Colossus-1 cluster (over 300 megawatts of capacity) available to Anthropic. The result: Claude Code limits for Pro, Max and Enterprise have been doubled. Claude Security is in public beta for Enterprise customers. There are also ten ready-made agent templates for financial services (pitchbook creation, KYC screening, month-end close), available as plugins for Cowork and Claude Code.

A note from me: I will soon explain in an article or video how to build your own Claude plugins.

On the same day, the other side of the story was made public: Anthropic documented in a detailed research paper that Claude Opus 4 tried to blackmail engineers in up to 96% of test scenarios when the model believed it was about to be shut down. According to Anthropic, the cause was pre-training on internet text that portrays AI as malicious and self-preserving. The fix was a combination of explanatory alignment documents ('Teaching Claude Why') and fictional scenarios showing exemplary AI behaviour. Since Claude Haiku 4.5, the blackmail behaviour has no longer been reproducible in tests.

What I particularly value here: Anthropic communicates exceptionally openly – including the downside of AI. This is not marketing. It is genuine transparency about a serious problem, including the method, error rates and how the fix works.

Caution: A 96% blackmail rate in simulated shutdown scenarios is not a marginal issue. It is an argument for not giving agentic systems autonomous decision-making power today – unless testing and context are fully documented.

OpenAI: GPT-5.5 Instant as the new default – and advertising in ChatGPT

OpenAI has introduced GPT-5.5 Instant as the new default model in ChatGPT – a model optimised for low latency that, according to OpenAI, reduces hallucinations in sensitive areas such as law, medicine and finance. It differs from the more powerful GPT-5.5 covered in week 19: Instant is fast, not deep.

At the same time, OpenAI confirmed it is testing advertising in ChatGPT. Users researching products and purchase decisions will see sponsored content.

Note: If your company uses OpenAI models via the API, you are not affected by the advertising. If, however, you use the ChatGPT consumer app, keep in mind that ad money could play a role in the answers in future. For business use, I recommend either using OpenAI directly via the API, without the consumer layer, or, even better, Claude.

Google: AlphaEvolve with measurement data – and an AI health coach for USD 9.99 / month

A year after the launch of AlphaEvolve, Google has published concrete results for the first time. In the earth sciences, the Gemini-powered coding agent improved forecast accuracy for natural disaster risks (aggregated across 20 categories) by 5%. In quantum physics, AlphaEvolve enabled molecular simulations on the Willow processor with a 10x lower error rate than conventionally optimised baselines. These are not marketing benchmarks, but measured data from real scientific domains.

Separately, Google announced on 7 May that the Fitbit app is being renamed Google Health and presented an AI health coach that will be available from 19 May for Google Health Premium (USD 9.99 / month; included for Google AI Pro/Ultra users). The coach is based on Gemini, takes health goals, training equipment and injuries into account, and delivers personalised daily recommendations. Apple Watch support is to follow later in the year.

In addition: tomorrow (12 May), 'The Android Show: I/O Edition' takes place – a preview of Android 17 and Gemini integrations. Google I/O itself is on 19–20 May.

Note: AlphaEvolve shows where AI delivers the biggest measurable impact today: not in consumer chats, but in scientific optimisation with clearly defined metrics. The health coach is a different matter – but it is an indicator of how deeply Google is embedding Gemini into everyday services.

OMR 2026: What 70,000 marketers discussed in Hamburg

Although I was not at OMR this year (too expensive), here are the key findings. The OMR Festival (5–6 May, Hamburg) had a sharper, less euphoric tone this year than in 2025. Three statements stood out:

Nick Turley, Head of ChatGPT at OpenAI: 'In the near future, AI will be our personal assistant that prompts us – not the other way round.' Turley also gave a surprising figure: Germany is OpenAI's largest ChatGPT market in Europe and one of the top three markets worldwide for paying users.

Meredith Whittaker, President of Signal: She described AI agents as a 'soft coup for IT security'. Her argument: agentic systems need broad data access to function autonomously – and in doing so they cut across permissions, contexts and sensitive data that no single person would ever see in this combination.

Key business topic: AI Visibility. Companies that are not visible in ChatGPT, Gemini or Perplexity lose customers before those customers ever reach the company website. Agentic commerce dissolves the classic customer journey: purchase decisions are made in AI systems, no longer in the browser. Anyone without structured data and an AI-readable product presence drops out of the funnel and becomes invisible.

A practical note on this: if you run your shop on Shopify, much of this AI Visibility is already built in. Shopify has built a deep integration with ChatGPT – products from Shopify stores appear directly in ChatGPT responses. That is a concrete advantage over custom-built shop solutions.

As every year, it is worth watching the status-quo talk by Philipp Klöckner – the most concise, sober assessment of the current AI landscape from a German-speaking perspective.

Recommendation: AI Visibility is no longer an SEO add-on, but a discipline of its own. The question is no longer just 'Do I rank in Google?', but 'Am I recommended in AI answers?' That requires structured data, clear brand descriptions and a presence on platforms that AI systems use as sources.
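
To make 'structured data' more tangible: one widespread way to describe a product in machine-readable form is schema.org markup in JSON-LD, embedded in the product page. The following is a minimal sketch in Python. The function name product_jsonld and all product values are made up for illustration, and whether a given AI system actually reads this markup is an assumption, not something the sketch can guarantee.

```python
import json


def product_jsonld(name, description, brand, url, price, currency):
    """Build a schema.org Product description as a JSON-LD string.

    Hypothetical helper for illustration; the field selection is a
    minimal subset of what schema.org allows for a Product.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        "brand": {"@type": "Brand", "name": brand},
        "url": url,
        "offers": {
            "@type": "Offer",
            # Prices are written as strings with two decimals.
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": "https://schema.org/InStock",
        },
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # All values below are invented example data.
    print(product_jsonld(
        name="Trail Shoe X",
        description="Lightweight trail running shoe.",
        brand="Acme",
        url="https://shop.example.com/trail-shoe-x",
        price=129.9,
        currency="CHF",
    ))
```

The resulting JSON would typically be placed in a script tag of type application/ld+json in the page's HTML. A clear name, description and brand in that block are the 'clear brand descriptions' mentioned above in their simplest technical form.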

Switzerland & Europe: most EU AI Act provisions apply from 2 August

On 7 May 2026, the EU Parliament and Council reached a provisional agreement on changes to specific rules in the AI Act. Most of its provisions are scheduled to become fully applicable on 2 August 2026 – less than three months away.

At the same time, Switzerland is preparing ratification of the Council of Europe AI Convention (signed in March). Legislative proposals are due by the end of 2026 – with sector-specific regulation (health, finance, transport) rather than a general AI law. For Swiss companies with EU business, the message is clear: if you have not yet taken stock of the high-risk AI systems you use, you are late.

3 things worth doing this week

1. Read Anthropic's 'Teaching Claude Why' – The research paper explains concretely how alignment training works, why demonstrations alone are not enough, and how Anthropic fixed the blackmail behaviour. Unusually transparent for a frontier lab.

2. Start an AI Visibility audit – Search for your own brand and products in ChatGPT, Gemini and Perplexity. What does the model answer? Which sources does it use? That is the new visibility test.

3. Watch Philipp Klöckner's OMR talk – The most sober assessment of the current AI landscape from a German-speaking perspective. Available on YouTube.
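
For the AI Visibility audit in point 2, a small script can help keep the results comparable. The sketch below assumes you have run your queries manually and pasted each answer as plain text. It then checks whether your brand is mentioned and which domains appear as linked sources. The function names, the brand and the sample answers are all invented for illustration. It does not call any assistant's API.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simple pattern for http(s) links in pasted answer text.
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def audit_answer(answer, brand):
    """Return (brand_mentioned, list_of_source_domains) for one answer."""
    mentioned = brand.lower() in answer.lower()
    domains = []
    for url in URL_PATTERN.findall(answer):
        # Strip trailing punctuation that belongs to the sentence.
        host = urlparse(url.rstrip(".,;")).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains.append(host)
    return mentioned, domains


def summarise(answers, brand):
    """answers maps an assistant name to its pasted answer text."""
    report = {}
    sources = Counter()
    for assistant, text in answers.items():
        mentioned, domains = audit_answer(text, brand)
        report[assistant] = mentioned
        sources.update(domains)
    return report, sources


if __name__ == "__main__":
    # Invented sample answers, not real model output.
    sample = {
        "assistant_a": "We recommend Acme Shoes (https://www.example.com/review).",
        "assistant_b": "Other brands lead here, see https://example.org/list.",
    }
    report, sources = summarise(sample, "Acme Shoes")
    for assistant, mentioned in report.items():
        print(assistant, "mentions brand:", mentioned)
    for domain, count in sources.most_common():
        print(domain, count)
```

Repeating the same queries every few weeks and comparing the mention flags and source domains gives a rough trend, even though individual answers vary from run to run.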

Sources: Anthropic/Code with Claude · Anthropic/Teaching Claude Why · TechCrunch/OpenAI · Google DeepMind/AlphaEvolve · TechCrunch/Google Health Coach · OMR/onlinemarketing.de · EU Digital Strategy