Whitepaper

What 49 Researchers Are Trying to Tell Us About AI and Software Development

A Practitioner's Guide to the SAFE-AI Manifesto

December 2025
25 min read
5,500 words

Abstract

In early 2025, 49 researchers from institutions across four continents published a manifesto warning that our current approach to AI-assisted software development is dangerous. Their argument: the combination of AI speed and agile urgency has created a "perfect storm" that threatens to eliminate deliberate thinking from the development process, with potentially catastrophic consequences.

This paper synthesises their academic framework for practitioners. It examines what the manifesto actually says, why it matters, and what organisations should do about it. Through case studies, practical frameworks, and honest assessment of industry failures, it makes the case that deliberation, not speed, is the scarce resource in the AI age.

Introduction: The Warning We're Ignoring

On July 19, 2024, CrowdStrike pushed a routine content update to its Falcon sensor software. Within hours, approximately 8.5 million Windows devices crashed. Airlines grounded flights. Hospitals cancelled surgeries. Banks went offline. Emergency services lost dispatch systems.

The cause? A faulty update that passed through automated validation but contained a logic error that human review would have caught. A single bad file, pushed at machine speed, scaled to machine catastrophe. Estimated damages: $5.4 billion.

CrowdStrike wasn't using AI for code generation. It was conventional software, conventional testing, conventional deployment. The failure came from something more fundamental: speed without deliberation.

Months after the CrowdStrike disaster, 49 researchers from universities and institutions worldwide published a manifesto titled "Software Development and Modeling in the Age of Artificial Intelligence." They cite CrowdStrike explicitly as a warning of what happens when "aggressive release cycles" meet systems designed for speed over safety.

Their core argument: the same dynamics that enabled CrowdStrike's failure are being dramatically amplified by AI. We're building faster than ever, understanding less than ever, and the consequences are scaling exponentially.

This paper is my attempt to translate their academic framework into practical guidance. The researchers have done their part (documenting the risks, proposing alternatives, calling for deliberation). Now it's on practitioners to listen.


Part I: The Perfect Storm

The Collision of AI and Agile

The manifesto identifies a "perfect storm" threatening software development: the collision of increasingly powerful AI with increasingly aggressive agile practices.

Agile development has become the dominant methodology, prioritising speed, iteration, and rapid delivery over planning and documentation. This made sense as a corrective to waterfall's rigidity. But it's created what the researchers call "Agile Burnout". Aggressive release cycles strain developers and, at the societal level, lead to harm through failed systems.

AI accelerates this dynamic. Tasks that took days now take minutes. Code generation, debugging, refactoring, even deployment, all can be accomplished with a few natural language prompts. The manifesto notes:

"Tasks that used to take years of experience to master and days to complete, such as requirements specification, writing boilerplate code, debugging, refactoring, and even live deployment, can be done (at least in principle) within minutes with a few natural language prompts."

This sounds like progress. In many ways, it is. But speed without understanding creates a particular kind of danger.

Vibe Coding and Its Discontents

In February 2025, Andrej Karpathy coined the term "vibe coding", a natural language-driven development paradigm where developers describe desired functionality rather than writing code, often bypassing traditional review and debugging processes.

The term was playful. The practice is spreading rapidly. And the research on its outcomes is concerning.

The Databricks Red Team demonstrated how vibe coding led to a critical remote code execution vulnerability. The AI used Python's pickle module unsafely, a well-known attack vector that any experienced developer would catch. The vibe coder didn't catch it because they didn't understand what they'd built.
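Neither the manifesto nor the public write-ups reproduce the vulnerable code itself, but the failure mode is easy to illustrate. Below is a minimal, hypothetical sketch of why deserialising untrusted input with pickle is dangerous: unpickling can execute arbitrary code, so anything that crosses a trust boundary should use a data-only format such as JSON instead.

```python
import json
import os
import pickle

class Exploit:
    # pickle rebuilds objects by calling whatever __reduce__ returns:
    # here, os.system with an attacker-chosen command.
    def __reduce__(self):
        return (os.system, ("echo pwned: arbitrary command executed",))

# What an attacker could send to a service that unpickles its input.
malicious_payload = pickle.dumps(Exploit())

# The vulnerable pattern: deserialising data you did not create yourself.
pickle.loads(malicious_payload)      # runs the attacker's shell command

# A safer pattern for untrusted input: formats that carry data, not code.
print(json.loads('{"user_id": 42, "action": "refresh"}'))
```

An experienced reviewer flags `pickle.loads` on external data on sight. A reviewer who can't read the code never gets the chance.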

Separate studies have shown that AI-generated code samples frequently contain known vulnerabilities, including SQL injection, one of the oldest and most exploited attack vectors in existence.
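SQL injection follows the same script. The snippet below is a generic illustration, not code from any of the cited studies: a query assembled by string interpolation returns every row to an attacker, while the parameterised version does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "admin"), ("bob", "user")])

user_input = "x' OR '1'='1"          # attacker-controlled value

# The vulnerable pattern: building SQL with string interpolation. The attacker's
# quotes become part of the query, the WHERE clause is always true, and every
# row comes back.
query = f"SELECT * FROM users WHERE name = '{user_input}'"
print(conn.execute(query).fetchall())   # leaks both rows

# The decades-old fix: parameterised queries keep data out of the SQL text.
print(conn.execute("SELECT * FROM users WHERE name = ?", (user_input,)).fetchall())  # []
```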

"While vibe coding democratizes programming and accelerates prototyping, recent studies have raised concerns about its reliability and security... These findings indicate the need for practices that support abstraction, communication, durable documentation and developer education (hallmarks of conceptual modeling based approaches)."

In other words: speed has a cost. We're not paying it upfront. We'll pay it later, with interest.

The System 1 Problem

The manifesto frames the AI-human dynamic through the System 1 and System 2 framework popularised by Nobel laureate Daniel Kahneman.

System 1 is fast, intuitive, pattern-matching. It's how you catch a ball or recognise a face. It works brilliantly when situations match learned patterns. It runs automatically and continuously; it "cannot be turned off."

System 2 is slow, deliberate, analytical. It's invoked in unfamiliar, complex, or unusual situations. It requires effort and attention. It's how we reason through novel problems.

AI, the manifesto argues, operates like System 1 at superhuman scale. It's extraordinarily good at pattern matching, generating code that looks right based on patterns in its training data. It's fast and confident. And it has no idea whether what it's producing is actually appropriate for your specific context.

"Generative AI, in particular, largely acts as a non-deterministic pattern-matching engine that is trained on past data to present new output based on existing patterns. However, it is generally not capable of considering the broader environment of the real world and what might actually be needed."

Vibe coding (and much of current AI-assisted development) is what happens when System 1 runs at scale without System 2 supervision. The manifesto warns:

"Accepting and proceeding with whatever code is suggested by AI amounts to letting System 1 take control. As a result, seemingly minor errors could lead to large financial losses and compliance violations. What is worse, if allowed to scale uncontrolled, AI-generated software has the capacity to cause great harm."

The capacity to cause great harm. That's 49 researchers choosing their words carefully.


Part II: The SAFE-AI Framework

Four Principles for Responsible Development

The manifesto proposes four principles the researchers call SAFE-AI:

S – Strategic Deliberation over Speed and Scale

There are multiple ways to solve any given problem. AI may suggest several solutions, but not all are equally effective, beneficial, or safe. Strategic deliberation involves practicing slower System 2 thinking before leveraging AI's speed and scale.

The manifesto quotes Sun Tzu: "Strategy without tactics is the slowest route to victory. Tactics without strategy is the noise before defeat."

Speed is tactics. Deliberation is strategy. Without strategy, speed is just noise before failure.

A – Attention to Critical Requirements and Critical Impact

With AI handling many time-consuming tasks, human effort is freed to focus on what matters most. The manifesto distinguishes between general requirements (which AI can help gather) and "critical requirements", the things that absolutely must be right.

"By 'what really matters' we mean those critical requirements that the system absolutely must get right. These are typically the requirements that deal with health, safety, legal, and ethical implications of the system, along with essential business rules that must be correctly executed."

Humans should focus on the critical. AI can handle the rest.

F – Freedom to Develop Software Components

The manifesto rejects rigid prescriptions about which tasks belong to humans versus AI. Different developers have different strengths, preferences, and working styles.

"Humans should have the freedom to choose the development contribution. This underscores the need for AI systems to be ethical toward human developers, remaining aligned with their goals and mindful of associated risks."

Freedom also enables innovation. Without it, we risk stifling creativity and preserving the status quo.

E – Empowerment to Challenge and Improve Solutions

The best designs rarely emerge from a single iteration. Everyone involved (humans, AI, and broader stakeholders) should be empowered to challenge emerging solutions.

"For humans, empowerment means having the authority and capabilities to question assumptions, review alternatives, reinforce AI learning and influence on decisions. For AI, empowerment refers to having the capability to propose diverse solutions, identify potential risks, and provide evidence-based recommendations rather than just executing instructions."

This creates a feedback loop: AI suggests, humans evaluate, both improve.

The MADE Process

The manifesto proposes a process for implementing the SAFE-AI principles, which the researchers call MADE with AI:

M – Model to (Re)Activate System 2

Before diving into implementation, pause and model the problem. This doesn't mean extensive UML diagrams or formal specifications. It means externalising your thinking on paper, a whiteboard, a napkin, or a computer.

"The mere act of modeling forces one to pause and think. Drawing diagrams... requires translating human mental models and ideas into another medium."

The manifesto cites Eisenhower: "Plans are worthless, but planning is everything." The model may not be implemented as designed. The act of modeling ensures deliberation happens.

A – Agree on Critical Requirements and Critical Impact

Once System 2 is engaged, focus on what's critical. What must this system get right? What are the potential impacts (direct, indirect, delayed)? Who's affected beyond the obvious stakeholders?

"Humans no longer need to completely and accurately capture all the requirements. Rather, they need to determine what is critical to get right as the system is being built."

This is a fundamental shift from traditional requirements gathering. You're not documenting everything – you're identifying what matters.

D – Develop the Solution

Now AI and humans collaborate on building. The manifesto emphasises flexibility: AI can generate code from prompts, humans can write code directly, models can be converted into components automatically. Different approaches suit different situations and preferences.

The key is that development is guided by the modeling and agreement that preceded it, not proceeding blindly.

E – Evaluate Developed and Deployed Solutions

Finally, continuous evaluation – against the critical requirements, against the anticipated impacts, against the real-world behaviour of the deployed system.

"The efficiencies gained by working with AI permit continuous and comprehensive evaluation of the emerging solutions with broader stakeholders."

The cycle repeats: Model, Agree, Develop, Evaluate. Each iteration refines the solution.


Part III: The Sous-Chef Myth

A Comforting Metaphor

There's a popular metaphor circulating in AI development circles: AI is like a sous-chef, and you're the executive chef.

The sous-chef handles prep work (chopping vegetables, reducing sauces, executing techniques). But the executive chef designs the menu, tastes for quality, and takes responsibility for what leaves the kitchen.

The manifesto references this metaphor:

"An emerging consensus treats AI as a coding hyper-assistant, a sort of sous-chef, following the lead of a human executive chef."

The metaphor is comforting. It suggests humans remain in control. AI does the grunt work; we do the thinking. The hierarchy is clear.

The metaphor is also dangerous – because it describes what should happen, not what is happening.

What's Actually Happening in the Kitchen

I've worked in professional kitchens. So has my wife: she in fine dining, I in high-volume catering during our undergraduate years. We know what kitchens look like when they're functioning. We know the hierarchy, the accountability, the pace, the pressure.

The sous-chef metaphor isn't just wrong. It's an insult to actual kitchens.

Here's what I see in most organisations using AI for development:

The "executive chef" prompts the AI: "Build me a customer portal."

The AI generates code. The executive chef glances at it. Maybe runs it to see if it works. Ships it.

That's not an executive chef directing a sous-chef. That's a restaurant owner who wandered into the kitchen, asked "what's for dinner?" and served whatever came out of the oven.

The sous-chef didn't just prep ingredients. The sous-chef decided what to cook, chose the ingredients, determined the techniques, and plated the dish. The "executive chef" just approved it without understanding it.

The Tasting Problem

Here's the crux: an executive chef tastes everything.

I remember watching the head chef at the catering company I worked for. Every sauce, every protein, every component – he'd taste it before it went out. Not occasionally. Every time. He'd adjust seasoning on the fly, reject dishes that weren't right, catch problems before they reached customers.

Executive chefs can taste because they've spent years developing their palates. They understand flavour at a deep level. They know what good tastes like.

Most people using AI for development can't taste the code.

They can check if it runs. They can see if it produces expected output for expected input. But they can't tell if the architecture is sound, if the security is adequate, if the edge cases are handled, if the approach is elegant or a disaster waiting to happen.

You can't be an executive chef if you can't taste the food. You're just a customer who wandered into the kitchen.

The Accountability Vacuum

In a real kitchen, the hierarchy is clear. The executive chef answers for everything that leaves the kitchen. If a dish makes someone sick, the executive chef owns it. Not the sous-chef.

I saw this accountability play out in real time. When something went wrong – a dish sent back, a timing failure, an ingredient shortage – it rolled uphill, not downhill. The person at the top owned it, fixed it, and made sure it didn't happen again.

In AI-assisted development, we've inverted this.

When AI-generated code fails (when it has vulnerabilities, when it breaks in production, when it causes harm), who's responsible?

  • The developer says: "I didn't write that code. The AI did."
  • The organisation says: "We trusted our developer to review it."
  • The AI vendor says: "We're not responsible for how our tool is used."

Everyone points at everyone else. We're creating what I call an accountability vacuum: systems that no one understands, deployed by people who can't explain them, operated by organisations that disclaim responsibility.

The sous-chef metaphor obscures this vacuum. It makes it sound like someone's in charge. But the metaphor only works if the executive chef is actually doing their job, and most aren't.


Part IV: The Skills Inversion

What the Hard Part Actually Is Now

For fifty years, the bottleneck in software development was implementation. You had ideas; turning them into working code required years of training and specialised skills. The gap between "I want this" and "I have this" was bridged by people who could write code.

That gap just collapsed.

AI can now generate functional code from natural language descriptions. Not perfect code, but working code, often good enough code, in seconds rather than days.

This should be liberating. In many ways, it is. But it's also exposing an uncomfortable truth: for many people, implementation was the easy part. They just didn't know it because they'd never had to do what comes before.

"The paradigm that 'software development is to develop software' no longer holds because there is no software to develop (AI is doing that for us). However, what AI cannot do well right now is deep reasoning, domain understanding, and designing safe, sound solutions for everyone."

If AI handles implementation, what's left? Understanding. Reasoning. Design. Safety. Soundness.

These are hard problems. Harder than coding ever was. And they're now the core of software development.

The New Hard Problems

Let me be specific about what's actually difficult now:

  • Understanding domains deeply enough to know what matters. AI can generate code for a healthcare system. It cannot understand healthcare (the regulations, the workflows, the life-and-death stakes, the edge cases where someone dies if you get it wrong).
  • Identifying requirements that stakeholders can't articulate. The most important requirements are often invisible because they're so obvious to domain experts that they go unspoken. Finding them requires asking questions, challenging assumptions, and having enough domain knowledge to know what questions to ask.
  • Anticipating failure modes before they occur. AI generates code that handles the happy path beautifully. The sad paths (edge cases, adversarial inputs, unexpected interactions) often aren't in the training data. Anticipating them requires imagination and paranoia.
  • Making trade-off decisions with incomplete information. Every system involves trade-offs: speed vs. security, flexibility vs. simplicity, features vs. maintainability. AI can generate options. It can't weigh trade-offs in your specific context.
  • Considering impacts beyond immediate functionality. The manifesto emphasises "critical impact": effects on people, organisations, and the environment that might be indirect or delayed. This is inherently System 2 work.

The Hiring Implications

Most technical hiring is still optimised for implementation skills. Coding tests. Algorithm challenges. Framework proficiency.

These tests measure the skills being commoditised while ignoring the skills becoming critical.

"Modeling should remain central in computing curricula as a means of cultivating reflective, ethical, and systematic thinking... Students should learn not only to create models but also to evaluate, critique, and refine AI-generated artifacts."

Evaluate, critique, and refine. Not just generate and ship.

The developers who will thrive are the ones who can do what AI can't: think deeply about problems, reason about consequences, and make sound judgments under uncertainty.

Organisations should hire accordingly. Test for problem understanding, not just solution generation. Test for failure imagination. Test for judgment under uncertainty. Test for the ability to evaluate AI-generated outputs, not just produce them.


Part V: The CrowdStrike Lesson

Anatomy of a System 1 Failure

Let's return to CrowdStrike, because it illustrates every principle the manifesto warns about.

CrowdStrike's Falcon sensor runs at the kernel level, the deepest layer of the operating system. When Falcon fails, the entire system fails. No graceful degradation. Just a blue screen.

On July 19, 2024, CrowdStrike pushed a "Rapid Response Content" update. These updates are designed to be fast – responding to emerging threats in real-time. Speed is the feature.

The update contained a logic error. The error caused Falcon to crash. Because Falcon runs at kernel level, crashing Falcon crashed Windows. Because the update pushed automatically to millions of endpoints, millions of machines crashed simultaneously.

The faulty update was live for approximately 78 minutes before CrowdStrike reverted it. But affected machines couldn't boot. They couldn't receive the fix. Each required manual intervention – a technician physically accessing the machine and deleting the bad file.

8.5 million machines. Manual intervention. One at a time.

What Deliberation Would Have Prevented

CrowdStrike's update pipeline was pure System 1. Automated testing. Automated validation. Automated deployment. Fast, efficient, scalable.

No human looked at the specific update. No one asked "what if this is wrong?" No one considered "what's the blast radius if this fails?"

What would System 2 oversight have looked like?

  • Pre-deployment human review. Someone looks at each update before deployment, not to catch every bug, but to ask: is this update appropriate? Is there anything unusual about it? This adds minutes. It creates an accountability point.
  • Staged rollout. Push to 0.1% of endpoints. Wait 15 minutes. Check for anomalies. Push to 1%. Wait. Only after multiple successful stages, push globally. This adds hours to full deployment, but it means a catastrophic failure surfaces on a handful of machines instead of millions (see the sketch after this list).
  • Blast radius limits. Kill switches that halt deployment if crash rates exceed thresholds. Rollback capabilities that don't require manual intervention. This requires upfront engineering investment. It pays off when things go wrong.
  • Pre-mortem thinking. Before any deployment, ask: "Imagine this fails catastrophically. What happened?" Work backward from imagined failure to identify risks.
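For concreteness, here is a minimal sketch of the second and third items: a staged rollout with an automated kill switch. The stage fractions, soak time, and threshold are illustrative, and push_update, rollback_update, and get_crash_rate stand in for whatever deployment and telemetry hooks an organisation already has.

```python
import time

STAGES = [0.001, 0.01, 0.10, 1.0]   # 0.1% -> 1% -> 10% -> everyone
MAX_CRASH_RATE = 0.001              # kill switch: halt if more than 0.1% of a stage crashes
SOAK_SECONDS = 15 * 60              # watch telemetry between stages

def staged_rollout(push_update, rollback_update, get_crash_rate):
    """Push an update in stages, halting and rolling back if crash rates spike."""
    for fraction in STAGES:
        push_update(fraction)                    # deploy to this slice of endpoints only
        time.sleep(SOAK_SECONDS)                 # give telemetry time to surface problems
        if get_crash_rate(fraction) > MAX_CRASH_RATE:
            rollback_update()                    # automated rollback, no per-machine manual fix
            raise RuntimeError(
                f"Rollout halted at {fraction:.1%}: crash rate exceeded threshold"
            )
    # Reaching this line means every smaller stage stayed healthy before the global push.
```

The specific numbers don't matter. What matters is that an update has to prove itself harmless on a small population before it is allowed to reach everyone, and that rollback doesn't depend on a technician touching each machine.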

None of these are revolutionary. They're basic System 2 practices. But they conflict with the imperative to move fast, so they got skipped.

The AI Amplification

Here's what makes CrowdStrike a preview rather than an anomaly: it wasn't even using AI for code generation.

The failure was a human-written logic error that escaped automated testing. It was pushed by a conventional deployment pipeline optimised for speed.

And it still caused $5.4 billion in damage.

What happens when we add AI to this equation?

AI generates code faster. It generates more code. It pushes updates more frequently. It operates with less human oversight. Every element that made CrowdStrike's failure possible – speed, scale, automation, global reach – gets amplified.

"These risks are now dramatically amplified due to the speed and scale of AI."

CrowdStrike is the preview. AI at scale without deliberation is the feature film. And unless we change course, the feature film is going to be worse.


Part VI: What This Means for Practitioners

For Individual Developers

If your entire value proposition is "I can write code," you're competing with AI. And AI is getting better every month while working for free.

If your value proposition is "I can understand complex problems, make sound decisions about solutions, and ensure what gets built is actually right," you're complementing AI. You're the human in the loop that makes AI-assisted development actually work.

  • Invest in domain understanding. The deeper you understand the problems you're solving, the more valuable you become. AI can generate code for any domain. It can't understand any domain.
  • Develop critical evaluation skills. Learn to read and assess AI-generated code. Not just "does it work" but "is this the right approach?" What are the trade-offs? What could go wrong?
  • Practice failure imagination. Before you ship anything, ask: "What could go wrong? How could this be misused? What are the edge cases?" This skill compounds over time.
  • Take responsibility. If you ship it, you own it. Don't hide behind "the AI generated it." You approved it. It's yours.

For Technical Leaders

Your hiring criteria are probably optimised for a world that no longer exists. Your job descriptions emphasise implementation skills (languages, frameworks, years of experience with specific tools) while underweighting thinking skills.

  • Rewrite your job descriptions. Emphasise problem decomposition, requirements reasoning, systems thinking, failure imagination. Yes, you still need people who can work with code. But that's table stakes, not differentiator.
  • Change how you interview. Test for problem understanding, not just solution generation. Give candidates ambiguous problems and see how they clarify them. Show them AI-generated code and see how they evaluate it.
  • Restructure your teams. When one developer with AI can implement what previously took five, you don't need five implementers. You might need two, plus a domain expert, a systems thinker, and someone focused on safety and ethics.

For Organisations

The manifesto calls for deliberation to be institutionalised, not left to individual discretion:

"Organizations should introduce conceptual model reviews alongside traditional code reviews to ensure deliberate reflection on system purpose, safety, and critical requirements."
  • Create deliberation checkpoints. Before any significant development, require a brief modeling exercise. What problem are we solving? What are the critical requirements? What could go wrong? This doesn't need to be heavyweight – an hour of structured thinking can prevent months of rework.
  • Implement staged deployments. No code goes to 100% of users without passing through smaller percentages first. Build kill switches. Design for containment.
  • Establish accountability structures. Someone must own every system. Not "the team" but a specific person who answers when things go wrong. Accountability without names is no accountability.
  • Invest in evaluation capability. You need people who can assess AI-generated outputs. If no one on your team can evaluate whether code is secure, maintainable, and appropriate, you're not ready to ship AI-generated code.

Conclusion: The Choice Before Us

The manifesto closes with a call to action:

"To build a better world with AI, we must fundamentally rethink the partnership between human modelers and AI... In the age of AI, modeling is not an agility obstacle, it is a safeguard."

This isn't anti-AI. These researchers aren't Luddites. They're saying: use AI, but think first. Let AI handle implementation after humans have done the hard work of understanding the problem, defining requirements, considering impacts.

The choice before us is straightforward:

Option A

Continue as we are. Move fast. Ship constantly. Let AI generate and humans approve without understanding. Celebrate speed. Punish deliberation. Wait for the disasters to compound.

Option B

Integrate deliberation into our processes. Slow down enough to think. Focus human attention on what actually matters – critical requirements, potential impacts, failure modes. Use AI as a force multiplier for understanding, not just implementation.

CrowdStrike cost an estimated $5.4 billion and grounded flights worldwide. That was conventional software, conventional testing, conventional deployment.

AI amplifies everything, including the failures.

The 49 researchers who wrote this manifesto have done their part. They've documented the risks. They've proposed frameworks. They've issued the warning.

Now it's on practitioners to decide: will we listen?


Appendix: The SAFE-AI Principles at a Glance

Principle | Summary | Key Question
Strategic Deliberation | Think before you leverage AI's speed | Have we understood the problem before generating solutions?
Attention to Critical Requirements | Focus human effort on what must be right | What are the safety, security, and ethical requirements?
Freedom to Develop | Let humans choose their contribution | Are we enabling creativity or prescribing rigidity?
Empowerment to Challenge | Enable everyone to improve the solution | Can stakeholders question and refine what's being built?

Appendix: The MADE Process at a Glance

Step | Action | Purpose
Model | Pause and externalise thinking | Activate System 2 cognition
Agree | Identify critical requirements and impacts | Focus on what matters
Develop | Build with AI, guided by models | Implement with understanding
Evaluate | Continuously assess against requirements | Ensure alignment with intent

The SAFE-AI Manifesto was authored by Lukyanenko, Samuel, Tegarden, Larsen, Jabbari, and 44 additional researchers from institutions including University of Virginia, University of Cincinnati, Virginia Tech, University of Colorado, Université Laval, and dozens of others across four continents.

Jason La Greca is the founder of Teachnology, an AI transformation consultancy helping organisations build capability rather than dependency. He believes AI should empower humans, not replace human judgment.

Document Information

Version: 1.0

Date: December 2025

Word Count: ~5,500

Reading Time: 25 minutes

Suggested Citation: La Greca, J. (2025). What 49 Researchers Are Trying to Tell Us About AI and Software Development: A Practitioner's Guide to the SAFE-AI Manifesto. Teachnology.


Jason La Greca

Founder of Teachnology. Helping organisations build capability, not dependency. Former educator turned AI transformation specialist. Believes the future belongs to those who can think, not just those who can prompt.
