
Callista AI Weekly (March 01 - March 08, 2026)
This was a week of high-stakes moves. OpenAI dropped two models in three days. Mastercard and Santander proved that AI agents can handle real payments. And the Pentagon's standoff with Anthropic forced the industry to confront a question it has been avoiding: who gets to set the limits on military AI?
New AI Use Cases
Santander and Mastercard Complete Europe's First AI Agent Payment
Banco Santander and Mastercard completed Europe's first live end-to-end payment executed by an AI agent on March 2. The transaction used Mastercard Agent Pay within Santander's regulated banking infrastructure. An AI agent initiated, authenticated, and settled a payment on behalf of a customer - all within predefined limits and permissions.
The pilot ran on Santander's live payments infrastructure with support from PayOS, Microsoft Azure OpenAI Service, and Copilot Studio. It wasn't a demo. The bank processed the transaction through real rails to validate the operational and control framework under production conditions. Santander will now move into extended testing and scaling across additional use cases.
This follows earlier milestones in the same program: Commonwealth Bank of Australia completed its first Agent Pay transaction in January, and Westpac in New Zealand followed in February. The direction is clear - banks are building the pipes for agents to spend money on our behalf.
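The control model at the heart of these pilots - an agent that can only transact within predefined limits and permissions - can be sketched in a few lines. This is an illustrative toy, not Mastercard's Agent Pay API; the `Mandate` fields and method names are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass
class Mandate:
    """Permissions a customer grants to a payment agent (hypothetical model)."""
    max_amount: float        # per-transaction ceiling
    allowed_merchants: set   # merchants the agent may pay
    daily_limit: float       # cumulative cap per day

class PaymentAgent:
    def __init__(self, mandate: Mandate):
        self.mandate = mandate
        self.spent_today = 0.0

    def pay(self, merchant: str, amount: float) -> str:
        # Every transaction is checked against the mandate before it
        # ever reaches the payment rails.
        if merchant not in self.mandate.allowed_merchants:
            return "declined: merchant not authorized"
        if amount > self.mandate.max_amount:
            return "declined: exceeds per-transaction limit"
        if self.spent_today + amount > self.mandate.daily_limit:
            return "declined: exceeds daily limit"
        self.spent_today += amount
        return f"settled: {amount:.2f} to {merchant}"

agent = PaymentAgent(Mandate(100.0, {"grocer", "pharmacy"}, 250.0))
print(agent.pay("grocer", 40.0))   # settled: 40.00 to grocer
print(agent.pay("grocer", 500.0))  # declined: exceeds per-transaction limit
```

The point of the production pilot is that these checks ran inside the bank's regulated control framework, not inside the agent itself.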
Netflix Acquires InterPositive
Netflix acquired InterPositive, the filmmaking software company founded by Ben Affleck, on March 5. InterPositive builds AI tools that let directors relight shots, adjust visual effects, and modify color grading in post-production - all without reshooting. The system works by training an AI model on a production's existing dailies, then making that model available during post. The entire 16-person team joins Netflix, with Affleck taking on a senior adviser role. Terms were not disclosed.
Luma Launches Creative AI Agents
Luma introduced Luma Agents on March 5, a platform that coordinates multiple AI systems to generate creative work across text, images, video, and audio. The agents are powered by Luma's Uni-1 model, the first in its Unified Intelligence family - a single multimodal reasoning architecture trained end to end.
What sets Luma Agents apart from the usual prompt-and-iterate workflow is its approach to variation. Instead of going back and forth on each image or concept, the system generates large sets of variations and lets users steer through conversation. It can also call external models including Luma's Ray 3.14, Google's Veo 3, ByteDance's Seedream, and ElevenLabs' voice models.
Luma Agents are already embedded in global agency operations. Launch partners include Publicis Groupe Middle East and Turkey, Serviceplan Group, Adidas, Mazda, and Saudi AI company Humain. The platform is publicly available via API, with broader access rolling out gradually.
Teramind Tackles Shadow AI in the Enterprise
Teramind launched its AI Governance platform on March 3, billing it as the first enterprise-grade behavioral oversight system for AI tools and autonomous agents. The product captures prompts, responses, and autonomous agent actions across ChatGPT, Microsoft Copilot, Google Gemini, Claude Code, and unapproved shadow AI tools.
The timing isn't accidental. Teramind's internal research found that more than 80% of workers now use unapproved AI tools on the job. A third have shared proprietary data with unsanctioned platforms. Nearly half actively hide their AI use from IT. AI-associated breaches now cost organizations more than $650,000 per incident.
The platform produces continuous audit trails across SOX, HIPAA, CMMC, FedRAMP, SOC 2, ISO 27001, and the EU AI Act. It requires no new infrastructure and works through behavioral detection by execution pattern rather than signature - meaning it can spot AI tools it has never seen before.
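The difference between signature-based and behavioral detection can be illustrated with a simplified sketch - the endpoints, thresholds, and event fields below are assumptions for the example, not Teramind's actual detection logic.

```python
import re

# Signature approach: only catches tools already on a known list.
KNOWN_AI_TOOLS = {"chatgpt.com", "gemini.google.com"}

def signature_detect(host: str) -> bool:
    return host in KNOWN_AI_TOOLS

# Behavioral approach: flags the execution pattern of LLM traffic --
# e.g. repeated POSTs of large, prompt-sized payloads to chat- or
# completions-style endpoints -- so previously unseen tools are caught too.
PATTERN = re.compile(r"/(chat|completions|generate|v\d+/messages)\b")

def behavioral_detect(events: list) -> bool:
    suspicious = [e for e in events
                  if e["method"] == "POST"
                  and e["payload_bytes"] > 2000   # prompt-sized body
                  and PATTERN.search(e["path"])]
    return len(suspicious) >= 3                   # sustained use, not a one-off

events = [{"method": "POST", "path": "/v1/chat", "payload_bytes": 4096}] * 3
print(signature_detect("newtool.ai"))  # False -- unknown host slips past
print(behavioral_detect(events))       # True  -- the pattern still flags it
```

A new shadow-AI tool has no signature, but its traffic still looks like LLM traffic - which is the property behavioral detection exploits.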
GitHub Copilot Goes Deeper on Agentic Coding
GitHub made several announcements on March 5 that push its Copilot coding agent further into autonomous territory:
The coding agent now integrates with Jira, entering public preview. You can assign a Jira issue to Copilot, and it generates a draft pull request in your GitHub repository.
Copilot code review now runs on a fully agentic architecture. The agent reviews its own changes before opening a pull request, iterating on feedback before tagging a human.
Custom agents let teams codify their engineering approach in configuration files under .github/agents/. A performance optimization agent, for example, can benchmark before and after a change, then open a pull request only if the results improve.
Built-in security scanning catches dependency vulnerabilities, committed API keys, and code scanning alerts inside the agent workflow - before the pull request is opened.
GPT-5.4, OpenAI's latest model, became available in GitHub Copilot within hours of its release, rolling out across Visual Studio Code, Visual Studio, JetBrains, Xcode, Eclipse, and the GitHub CLI.
Major Vendor Updates
OpenAI Ships GPT-5.3 Instant and GPT-5.4 in the Same Week
OpenAI released two models in three days. GPT-5.3 Instant launched on March 3 as the new default model in ChatGPT. GPT-5.4, the company's most capable frontier model, followed on March 5.
GPT-5.3 Instant is the conversational workhorse. It reduces hallucination rates by 26.8% when searching the web and 19.7% when relying on internal knowledge, compared to prior models. OpenAI also tuned the model's personality - fewer unnecessary refusals, less moralizing preamble, more direct answers. The update addresses widespread user complaints about GPT-5.2's tone, which many found preachy and evasive. GPT-5.2 Instant will remain available in the model picker for three months before its June 3 retirement.
GPT-5.4 is the bigger story. It arrives in three variants: standard, Thinking (the reasoning model), and Pro (optimized for high performance). Key numbers:
Context window (API): 1 million tokens
Error reduction vs. GPT-5.2: 33% fewer individual claim errors
Token efficiency: solves the same problems with significantly fewer tokens
GDPval (knowledge work): 83%, a record
Computer use: native (first OpenAI general model)
GPT-5.4 is the first general-use OpenAI model with native computer-use capabilities - it can autonomously work across applications on a machine on behalf of the user. It also set records on computer use benchmarks OSWorld-Verified and WebArena Verified.
GPT-5.4 Thinking is available to Plus, Teams, and Pro users. GPT-5.4 Pro is available through the API and for ChatGPT Enterprise and Edu subscribers.
Microsoft - Phi-4-reasoning-vision-15B
Microsoft open-sourced Phi-4-reasoning-vision-15B on March 4, a compact multimodal model that punches well above its weight class. The 15-billion-parameter model processes both images and text, handling complex math and science problems, chart interpretation, document analysis, and GUI navigation.
The standout innovation is selective reasoning. About 20% of training samples included explicit chain-of-thought reasoning traces, while 80% were tagged for direct response. The result: the model knows when deep reasoning helps and when it just adds noise. Training took just four days on 240 Nvidia B200 GPUs using 200 billion multimodal tokens - a fraction of what competitors consume. Microsoft released the weights on HuggingFace and Azure AI Foundry.
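The hybrid data mix described above is easy to picture in code. This is a conceptual sketch, not Microsoft's pipeline; the field names and the tagging scheme are assumptions for illustration.

```python
import random

def tag_samples(samples, reasoning_fraction=0.2, seed=0):
    """Tag roughly 20% of samples with explicit reasoning traces,
    the rest for direct answers (hypothetical schema)."""
    rng = random.Random(seed)
    tagged = []
    for s in samples:
        if rng.random() < reasoning_fraction:
            tagged.append({"prompt": s["prompt"],
                           "target": s["trace"] + s["answer"],  # reason, then answer
                           "mode": "reasoning"})
        else:
            tagged.append({"prompt": s["prompt"],
                           "target": s["answer"],               # answer directly
                           "mode": "direct"})
    return tagged

data = [{"prompt": f"q{i}", "trace": "<think>...</think>", "answer": "a"}
        for i in range(1000)]
mix = tag_samples(data)
share = sum(t["mode"] == "reasoning" for t in mix) / len(mix)
print(f"reasoning share: {share:.0%}")  # roughly 20%
```

Trained on such a mix, the model learns to associate each mode with the kinds of prompts where it helps - which is the "selective reasoning" behavior Microsoft reports.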
Huawei - Atlas 950 SuperPod
Huawei unveiled the Atlas 950 SuperPod at MWC Barcelona (March 2-5), its direct challenge to Nvidia's AI data center dominance. Key specs:
64 NPUs per cabinet, scaling up to 8,192 NPUs
UnifiedBus interconnect with ultra-high bandwidth and ultra-low latency
Unified memory addressing - the entire cluster behaves as one machine
Supports large-scale AI training and high-concurrency inference
The UnifiedBus architecture means thousands of compute nodes operate as a single logical computer. Huawei won eight GLOMO Awards at the event.
DeepSeek V4 Imminent but Still Waiting
DeepSeek kept the AI world waiting. The Chinese lab's V4 model - a trillion-parameter multimodal system with native image and video generation, a 1-million-token context window, and optimization for Huawei Ascend and Cambricon chips - was widely expected during the first week of March. TechNode reported on March 2 that sources indicated the release was planned for "this week," timed to coincide with China's annual Two Sessions parliamentary meetings starting March 4.
As of March 6, V4 had not launched. The model has slipped past its original mid-February target, through the Lunar New Year window, and past multiple late-February deadlines. When it does arrive, DeepSeek plans to release it under an Apache 2.0 license, allowing commercial use, modification, and redistribution without royalties. Given its reported specifications - roughly 32 billion active parameters in a mixture-of-experts architecture - V4 could meaningfully shift the open-weight landscape.
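The reported figures imply a very sparse model. A quick back-of-envelope check, taking the rumored specs at face value:

```python
# Assumed, unconfirmed V4 specs: ~1 trillion total parameters in a
# mixture-of-experts architecture, ~32 billion active per token.
total_params = 1_000e9
active_params = 32e9

# Only the routed experts run for each token, so per-token compute
# scales with active parameters, not total capacity.
active_share = active_params / total_params
print(f"active per token: {active_share:.1%}")  # 3.2%
```

That roughly 3% activation ratio is what would let a trillion-parameter model serve at the cost of a much smaller dense one.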
Mastercard Open-Sources Verifiable Intent for Agentic Commerce
Mastercard announced Verifiable Intent on March 5, an open-source framework for securing AI agent transactions. The system links a consumer's identity, their specific instructions, and the outcome of a transaction into a single, tamper-resistant cryptographic record. If a dispute arises, all parties can consult this audit trail.
The framework uses Selective Disclosure, sharing only the minimum information each party needs to verify authorization or resolve a dispute. Mastercard is publishing the specification on GitHub and has secured commitments from Google, Fiserv, IBM, Checkout.com, Basis Theory, and Getnet. Google's VP and general manager of payments called it a natural accelerator for scaling agentic commerce and confirmed compatibility with Google's Agent Payments Protocol.
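The general idea of a tamper-evident record binding identity, instruction, and outcome can be sketched with simple hash chaining. This is an illustration of the concept only - the actual Verifiable Intent specification is the one Mastercard is publishing on GitHub.

```python
import hashlib, json

def _h(record: dict) -> str:
    """Canonical SHA-256 digest of a record."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

def build_audit_trail(identity: str, instruction: str, outcome: str) -> list:
    trail, prev = [], "0" * 64                 # genesis link
    for step, payload in [("identity", identity),
                          ("instruction", instruction),
                          ("outcome", outcome)]:
        record = {"step": step, "payload": payload, "prev": prev}
        prev = _h(record)
        trail.append({**record, "hash": prev})
    return trail

def verify(trail: list) -> bool:
    prev = "0" * 64
    for rec in trail:
        body = {k: rec[k] for k in ("step", "payload", "prev")}
        if rec["prev"] != prev or _h(body) != rec["hash"]:
            return False                       # any edit breaks the chain
        prev = rec["hash"]
    return True

trail = build_audit_trail("customer-123", "buy one ticket under $50", "paid $42")
print(verify(trail))                      # True
trail[1]["payload"] = "buy ten tickets"   # tamper with the instruction
print(verify(trail))                      # False
```

Because each link commits to the previous one, no party can quietly rewrite what the customer asked for after the fact - which is exactly the dispute scenario the framework targets.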
In the coming months, Verifiable Intent will be integrated into Mastercard Agent Pay's intent APIs. Combined with the Santander pilot, Mastercard is now operating at both the infrastructure and transaction layers of agentic payments - a position no other payments company currently holds.
AI Governance
The Anthropic-Pentagon Standoff
The biggest governance story of the week played out between Anthropic and the U.S. Department of Defense. The conflict centers on how Anthropic's Claude models can be used in military applications.
The Pentagon wanted unfettered access to Claude across all lawful purposes. Anthropic wanted assurances that its technology would not be tapped for fully autonomous weapons or domestic mass surveillance. The talks broke down when Defense Secretary Pete Hegseth designated Anthropic a "supply-chain risk to national security" - the first time the U.S. government has ever applied this designation to an American company. The designation means no contractor, supplier, or partner doing business with the military can deal with Anthropic.
OpenAI moved quickly to fill the gap. It accepted the Pentagon's "all lawful purposes" framework but layered on architectural controls: cloud-only deployment, a proprietary safety stack the Pentagon agreed not to override, and cleared engineers embedded in forward operations. The deal highlights a fundamental tension - OpenAI and Anthropic, founded by some of the same people, are now on opposite sides of the most consequential AI procurement decision in U.S. history.
By March 5, Anthropic CEO Dario Amodei had returned to the negotiating table in what multiple outlets described as a last-ditch effort to reach an agreement. The outcome remains uncertain, but the precedent is set: governments are willing to use coercive economic tools against AI companies that resist open-ended military access.
U.S. Federal AI Deadlines Converge
Several deadlines from President Trump's December 2025 Executive Order on AI governance converge in March:
March 11: The FTC must publish a policy statement explaining how its prohibition on unfair and deceptive acts or practices applies to AI models. The statement will also address whether state laws requiring changes to "truthful" AI outputs are preempted by federal law.
March 16: The FCC Chairman and Special Advisor for AI are to explore a federal reporting or disclosure standard for AI models that would preempt state requirements.
March 16: The Commerce Department must identify "onerous" state AI laws and refer them to the AI Litigation Task Force for potential legal challenges.
The executive order established an AI litigation task force empowered to challenge state AI laws on grounds of unconstitutional regulation of interstate commerce and federal preemption. This sets up a direct collision between the federal government's pro-industry approach and state-level efforts to regulate AI. California's Transparency in Frontier Artificial Intelligence Act and Texas's Responsible Artificial Intelligence Governance Act, both effective since January 1, could face federal challenges.
Agentic AI Creates a Governance Gap
Teramind's launch this week underscored a growing problem. As AI agents move from demos to production, governance hasn't kept pace. Nearly 70% of enterprises already run AI agents in production, according to industry surveys, and another 23% plan deployments this year. But the tools agents use, the data they access, and the actions they take remain largely invisible to compliance and security teams.
Gartner estimates that 40% of enterprise applications will have AI agents embedded by year-end, up from 5% in September 2025. The gap between adoption speed and governance maturity is widening.
Breakthrough Research
Research headlines this week were overshadowed by product launches, but two directions are worth noting.
PsychAdapter, a system published in early March, demonstrated that large language models can be fine-tuned to reflect specific personality traits and mental health conditions with up to 98.7% accuracy in matching intended personality levels. The researchers tested across GPT-2, LLaMA-3, and Gemma models. Applications range from mental health research to personalized AI assistants - but the dual-use implications are significant. A system that can convincingly mimic psychological profiles could also be used for manipulation and social engineering at scale.
Separately, researchers published work on a neural network architecture that bridges sensory experience and symbolic thought - connecting direct sensory input with abstract conceptual reasoning in a single system. The work represents progress toward AI systems that can move between perceiving the world and reasoning about it abstractly, similar to how humans process information. The practical implications are still distant, but this is the kind of foundational work that tends to show up in products five to ten years later.
The larger research trend remains consistent: the field is shifting from raw capability gains to reliability, efficiency, and practical deployment. OpenAI's GPT-5.3 Instant reducing hallucinations by nearly 27% is, in some ways, more significant than any benchmark record. Users don't need smarter models - they need models they can trust.
Selective Reasoning in Multimodal Models
The most significant research development this week came from Microsoft's work on Phi-4-reasoning-vision-15B, which doubles as both a product release and a research contribution. The core finding - that chain-of-thought reasoning actively degrades performance on many visual tasks like captioning and OCR - challenges a widely held assumption in the field. The hybrid training approach, mixing 20% reasoning traces with 80% direct responses, offers a practical template for building models that reason selectively rather than reflexively.
Instant LLM Customization
Sakana AI released Doc-to-LoRA and Text-to-LoRA in late February, with the research gaining significant traction this week. These hypernetworks generate LoRA adapters in a single forward pass, enabling instant LLM customization without running a new fine-tuning job. The numbers are striking:
Near-perfect accuracy on instances 5x longer than a base model's context window
Memory cut from 12GB to 50MB
Sub-second latency for both methods
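The core trick - a hypernetwork that emits a LoRA adapter in one forward pass instead of running a fine-tuning job - can be sketched at toy scale. This is a conceptual illustration, not Sakana's architecture; the dimensions and the single-linear-layer hypernetwork are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, emb_dim = 64, 4, 16          # model width, LoRA rank, task-embedding size

# The hypernetwork: one linear map per LoRA factor.
W_a = rng.normal(0, 0.02, (emb_dim, d * r))
W_b = rng.normal(0, 0.02, (emb_dim, r * d))

def generate_lora(task_embedding):
    """One forward pass -> a full LoRA adapter for a d x d weight."""
    A = (task_embedding @ W_a).reshape(d, r)
    B = (task_embedding @ W_b).reshape(r, d)
    return A, B

W = rng.normal(0, 0.02, (d, d))    # frozen base weight
task = rng.normal(size=emb_dim)    # embedding of the task description
A, B = generate_lora(task)
W_adapted = W + A @ B              # standard low-rank LoRA update
print(W_adapted.shape, np.linalg.matrix_rank(A @ B))  # (64, 64) 4
```

Since only the tiny factors A and B are generated per task, the adapter costs megabytes rather than gigabytes - the source of the 12GB-to-50MB memory figure.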
Hybrid Architectures Gain Ground
AI2's Olmo Hybrid contributes to a growing body of evidence that hybrid architectures - mixing transformer attention with linear recurrent layers - can dramatically improve training efficiency. Its reported 2x data-efficiency gain over a pure transformer baseline suggests the dominant architecture may be ripe for fundamental rethinking.
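A minimal sketch shows why linear recurrent layers are cheap: the state update is constant-time per token, with no attention over the full history. This is a generic exponential-moving-average recurrence for illustration, not AI2's layer.

```python
import numpy as np

def linear_recurrent_layer(x, decay=0.9):
    """h_t = decay * h_{t-1} + (1 - decay) * x_t, applied per channel.
    O(1) state per step, unlike attention's O(T) lookback."""
    h = np.zeros(x.shape[1])
    out = []
    for x_t in x:                  # sequential form; parallel scans exist too
        h = decay * h + (1 - decay) * x_t
        out.append(h)
    return np.stack(out)

T, d = 8, 4
x = np.ones((T, d))
y = linear_recurrent_layer(x)
# The state ramps toward the steady input value instead of re-reading
# the whole sequence at every step.
print(y.shape, y[0, 0], y[-1, 0])
```

Hybrids interleave a few attention layers (for precise retrieval) with many such recurrent layers (for cheap sequence mixing), which is where the efficiency gains come from.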
MCP Becomes an Industry Standard
The Anthropic Model Context Protocol (MCP) continued its march toward becoming an industry standard. The protocol, now housed under the Linux Foundation's Agentic AI Foundation alongside OpenAI's AGENTS.md and Block's goose, has surpassed 10,000 published MCP servers. The foundation is co-founded by Anthropic, Block, and OpenAI, with support from Google, Microsoft, AWS, Cloudflare, and Bloomberg. MCP's standardization matters because it defines how AI agents connect to tools, data, and applications - the plumbing that makes agentic AI work in practice.
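Concretely, MCP rides on JSON-RPC 2.0, with methods such as tools/list and tools/call for tool use. The sketch below just constructs a tools/call request; the tool name and arguments are hypothetical, and a real client would send this over stdio or HTTP to an MCP server.

```python
import json

def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 'tools/call' request in the shape MCP uses."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# Hypothetical tool exposed by some MCP server:
msg = mcp_tool_call(1, "search_tickets", {"query": "flights BCN", "limit": 3})
print(msg)
```

Because every server speaks this same envelope, an agent written against MCP can use any of the 10,000+ published servers without bespoke integration code.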
Physics-Informed Machine Learning
Separately, a University of Hawaii research team published a paper in AIP Advances advancing "physics-informed machine learning" - an approach that constrains AI to obey physical laws while processing complex datasets. The technique improves prediction accuracy in fluid dynamics and climate modeling, with implications for engineering, meteorology, and renewable energy planning. The work represents an important bridge between data-driven AI and scientific rigor.
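The general recipe - fit the data while penalizing violations of a governing equation - is easy to show on a toy problem. This is the generic physics-informed loss idea, not the Hawaii team's code; the "law" here is simple exponential decay, du/dt = -k*u.

```python
import numpy as np

k = 1.5
t = np.linspace(0.0, 2.0, 50)
u_true = np.exp(-k * t)                                 # exact solution
u_data = u_true + 0.01 * np.random.default_rng(0).normal(size=t.size)

def loss(u_pred, t, u_obs, lam=1.0):
    data_term = np.mean((u_pred - u_obs) ** 2)          # fit the observations
    du_dt = np.gradient(u_pred, t)                      # numerical derivative
    physics_term = np.mean((du_dt + k * u_pred) ** 2)   # residual of du/dt = -k u
    return data_term + lam * physics_term

# A physically consistent prediction scores far better than a straight-line
# fit, even though both track the noisy data reasonably well near t = 0.
print(loss(u_true, t, u_data) < loss(1 - 0.45 * t, t, u_data))  # True
```

The physics term acts as a regularizer grounded in known law rather than statistics, which is why the approach improves accuracy in regimes like fluid dynamics where data alone underdetermines the answer.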
Conclusion
Three things defined this week.
First, agentic AI crossed a critical infrastructure threshold. Santander and Mastercard didn't just demonstrate that an AI agent can make a payment. They proved it can happen within a regulated banking framework, on live rails, with real money. Mastercard's open-sourcing of Verifiable Intent then gave the broader ecosystem a standard to build on. The payments industry, which moves slowly for good reasons, is now moving fast on agents.
Second, OpenAI released GPT-5.4 with native computer-use capabilities, making it the first general-purpose model from the company that can autonomously operate across applications. Combined with GPT-5.3 Instant's focus on reducing hallucinations and improving conversational quality, OpenAI is clearly running a two-track strategy: make the everyday model more trustworthy while pushing the frontier model into autonomous territory.
Third, the Anthropic-Pentagon standoff exposed a fault line that won't close easily. The U.S. government demonstrated it will use economic coercion - including the unprecedented "supply-chain risk" designation - against AI companies that set limits on military use. Anthropic's return to negotiations by week's end suggests the pressure is working. For the rest of the industry, the message is hard to miss: building safety guardrails is one thing, but maintaining them against a determined government customer is something else entirely.
Ready to explore how Agentic AI can transform your organization? Visit us at https://www.callista.ch/agentic-ai to discover how we can guide your journey into this exciting new era of AI-powered productivity.
Sources
Mastercard Newsroom - "Santander and Mastercard complete Europe's first live end-to-end payment executed by an AI agent" - March 2, 2026
TechCrunch - "Luma launches creative AI agents powered by its new 'Unified Intelligence' models" - March 5, 2026
SiliconANGLE - "Teramind launches agentic AI visibility and policy platform for AI tools" - March 3, 2026
GitHub Blog - "GitHub Copilot coding agent for Jira is now in public preview" - March 5, 2026
GitHub Blog - "GPT-5.4 is generally available in GitHub Copilot" - March 5, 2026
GitHub Blog - "Copilot code review now runs on an agentic architecture" - March 5, 2026
OpenAI - "GPT-5.3 Instant: Smoother, more useful everyday conversations" - March 3, 2026
OpenAI - "Introducing GPT-5.4" - March 5, 2026
TechCrunch - "OpenAI launches GPT-5.4 with Pro and Thinking versions" - March 5, 2026
Fortune - "OpenAI launches GPT-5.4, its most powerful model for enterprise work" - March 5, 2026
TechNode - "DeepSeek plans V4 multimodal model release this week, sources say" - March 2, 2026
Mastercard - "How Verifiable Intent builds trust in agentic AI commerce" - March 5, 2026
PYMNTS - "Mastercard Unveils Open Standard to Verify AI Agent Transactions" - March 5, 2026
Fortune - "The Anthropic-OpenAI feud and their Pentagon dispute expose a deeper problem with AI safety" - March 5, 2026
MIT Technology Review - "OpenAI's 'compromise' with the Pentagon is what Anthropic feared" - March 2, 2026
Axios - "OpenAI-Pentagon deal faces same safety concerns that plagued Anthropic talks" - March 1, 2026
CNBC - "Anthropic and the Pentagon are back at the negotiating table, FT reports" - March 5, 2026
Bloomberg - "Anthropic Reopens Talks with Pentagon After AI Safety Feud" - March 5, 2026
Baker Botts - "March 2026: Federal Deadlines That Will Reshape the AI Regulatory Landscape" - March 2026
9to5Mac - "OpenAI releases GPT-5.3 Instant update to make ChatGPT less 'cringe'" - March 3, 2026
