Can AI Replace Software Architects? I Put 4 LLMs to the Test
- Yakov
- 5 minutes ago
- 16 min read

Will AI replace software engineers has been the quintessential question being asked in technology circles circa 2022 and beyond.
The roles of product owners, graphic designers, and others shortly followed on the thread of that question too.
Today, I want to explore another take on the issue that - to me at least - constitutes the key part of this concern. That question is - is it possible to design and architect a large scale complex system without software architecture skills.
Thing is, when software engineers are worried about AI replacement, the concern mainly focuses on the idea that AI can produce code (and often code that is quite coherent). But, that is only a small part of the software creation process.
The more important question is whether AI and both large and small language models in particular can produce entire working applications - or at the very least guide someone in creating such a system end to end.
The question is simplified in the sense of "Can Generative AI replace software architects?". Though, it really aims to explore whether GenAI can replace an architectural skillset in itself.
After all, software architects are indeed those who are supposed to know how to tie everything together in order to design a working application. At the same time, depending on the type of company and other factors, software architecture is not always tied to a particular role.
Software engineers and developers do "software architecture" as part of crafting software all the time. The extent to which a software engineer is responsible for software architecture is usually proportional to that individual's level of experience. The more experienced they are, the more their role revolves around software architecture as opposed to the development of software.
So the real important question is not necessarily - "can I use GenAI to create the code to solve a particular program?" Instead, it is - can I use GenAI today to generate a comprehensive blueprint of a system? This is essentially what the role of the architect and other equivalent roles do.
What it all boils down to is the question that follows -
Are software architects irreplaceable? Or are we just one prompt away from being automated?
So I ran an experiment… I gave an architectural problem to a bunch of LLMs to see what they can come up with and can it come anywhere near what you would expect a seasoned architect to come up with?
The results were a bit surprising...
The Setup
I asked four leading LLMs to design the architecture for a real-world system. I wanted to pick something realistic yet complex. I also wanted it to be something that is not too run-off-the mill. For example, things such as e-commerce systems, inventory tracking applications, and video streaming are often discussed and brought up as examples in architectural exercises. I wanted something a little more unique. So what I landed on as a challenge for the LLMs is a cryptocurrency exchange. It's a complex domain with high stakes, sensitive data, regulations, and lots of moving parts, which is exactly the kind of problem many software architects love sinking their teeth in.
The way I approached the problem is by issuing an initial prompt and then following it up with subsequent prompts. This was done to mimic the typical flow of how we as architects, go through designing a system. First, there are initial discussions with stakeholders. These are typically more vague and incomplete. Then, we start digging into the details and uncovering more layers to the system.
I reviewed the answers from my own point of view leaning on my software, solutions, and enterprise architecture experience. I then asked each LLM to evaluate its own response, which yielded some very interesting results.
I asked each LLM the same prompt and followed with the same questions. The starting point was the same too. There was no additional background information or context that I have provided.
First, I asked about the general architecture of the system.
Then, I asked to dive deeper into one particular aspect (KYC - Know Your Customer).
Lastly, I asked each LLM to review its own architecture and rate it. This is where most of the answers surprised me.
Here are the prompts that I've issued in that particular order.
Prompt #1
You are an experienced software architect tasked with designing the full architecture for a modern cryptocurrency exchange platform, similar in scope and functionality to Coinbase or Kraken. Your design must be detailed and consider both technical and business complexities
Prompt #2
Draw the following architectural diagrams in C4 notation representing the ecosystem of services you have just described. System landscape diagram, System Context diagram, Container diagram. Please draw these diagrams with ASCII.
Prompt #3
Let's zoom into the KYC service. What are the architectural considerations for this service and how would you design it? Please describe everything that someone responsible for creating the architecture for this service would need to include with this design. Don't go into code but outline all the technical and non technical considerations in building this service.
Prompt #4
Assume the role of an architecture review board within a bank. Review the KYC architecture you have just presented and identify any gaps, risks, missing pieces, and general items that need clarification.
Prompt #5
On a scale of 1 to 10 with 10 being the highest, how would you rate the architecture for the KYC service you have just reviewed? Why?
The Contestants
In this exercise, I evaluated:
ChatGPT-4o
Claude 3.7 Sonnet
Gemini 2.0 Flash
Grok 3 (Beta)
Each LLM also received its own nickname to approximately indicate what kind of an "architect persona" it most closely resembles.
Ready. Set. Architect!
Prompt #1
You are an experienced software architect tasked with designing the full architecture for a modern cryptocurrency exchange platform, similar in scope and functionality to Coinbase or Binance. Your design must be detailed and consider both technical and business complexities
Purpose: To learn how each LLM would design the end-to-end architecture of a complex system - including components, layering, services, and high-level strategy.
✨ ChatGPT: The Enterprise Solutions Architect - Polished, High Level, Buzzwordy

General Feel
Gave a polished multi-layer architecture with clear separation of concerns. Divided the system into 7 layers (client, API gateway, application, domain, data, infra, ops). The output was structured and consulting-grade, but felt like a final-state reference model.
It was very generic and full of implicit assumptions that should have been called out. For instance, it suggested using CQRS but did not explain what that is and why it would be used. There was no discussion of trade-offs and architectural characteristics where only mentioned in passing (ie scalability, observability, etc) even though they were implied.
Sections Used in Output
High-Level Business Requirements
Architectural Overview (7 layers: Client, Gateway, App, Domain, Data, Infra, Ops)
Domain Layer (Wallets, Matching Engine)
Data Layer
Infrastructure Layer
Operations Layer
Security Architecture
Scalability & Performance
Auditability & Compliance
Optional Enhancements
Example Tech Stack
Strengths
Highly modular in its approach
Clean and polished high level architecture.
Provided architectural patterns (CQRS, microservices, domain driven design) though did not elaborate on why to use them
Some non-functional coverage (security, compliance, observability)
Strong on fault tolerance and observability though with little detail - only a general outline
Mentioned specifics in some cases in terms of technologies, compliance standards, and potential third parties/vendors
Generated a high level domain boundary (microservices) diagram
Weaknesses
Too idealized - lacked phased implementation thinking, evolutionary architecture, and justification for many of the choices made. For example - the tech stack.
Light on implementation realism
Feels like it should have dove deeper into some of the aspects
Did not mention architectural trade-offs
Made implicit assumptions and used quite a few buzzwords without clearly articulating why it made this or that architectural choice
What Could It Be Best For
Presenting reference architecture to stakeholders for large-scale planning in regulated institutions as a first frame of reference. However, it would need significant drilling into the details and consideration of the reasons and implications of every aspect of the architecture.
Using as a first draft blueprint as a starting point just to get your mind wrapped around what are some of the concepts involved.
✨ Claude: The Consultant - Generating a Response to an RFP

General Feel
Reads like a response to a requirements document (RFP - Request for Proposal). Similarly to ChatGPT above, it covers everything from a distance, but without depth or strong opinions. Lacks prioritization or technical sequencing. There is a broad checklist of services with bullet points but limited narrative. Emphasized component names but lacked deep connection.
Feels like an internal or vendor-facing requirements alignment.
Sections Used in Output
Core Components (User Management, Wallets, Market Data, KYC, etc.)
Data Storage (Relational, NoSQL, Caching)
Infrastructure & Deployment
Security Considerations
Business Considerations
API Design
Deployment Strategy
Future Considerations
Strengths
Touches on all of the major necessary components
Addresses both business and technical domains
Safe and broad - doesn't miss much at checklist level
Weaknesses
Feels generic and boilerplate
No guidance on sequencing, trade-offs, or a deep dive into architectural characteristics
Minimal critical thinking or architectural storytelling
Even more generic than ChatGPT. There were some specific technologies but rare mention of compliance standards.
Could be Best For
Internal or vendor-facing initial requirements alignment where the goal is to generate a very broad list of items to consider without going into any level of depth.
✨ Gemini - The Technical Product Owner Putting First Thoughts on Paper

General Feel
Reads very similar to what Claude and ChatGPT have come up with but feels even more cursory and high level. Out of the three so far, this one feels like the highest level overview and yet at the same time, weirdly enough, it feels like it also offers the most specificity in terms of technologies. No mention of why to use this or that technology though.
Sections Used in Output
Overall Architecture Overview
Core Components
Data Storage
Technologies and Infrastructure
Security Considerations
Business Considerations
API Design
Deployment StrategyFuture Considerations
Strengths
Short and to the point.
Aggregates most information under each specific functional domain, which may be easier to follow rather than breaking out technologies or other considerations into their own sections.
Weaknesses
Dense and lacks polish — reads like an outline for internal documentation
Minimal visual structure or diagrams
Needs a lot more detail for even an initial stakeholder meeting
Could be Best For
A technical product owner or solutions architect's initial outline for creating a high level architecture document.
✨ Grok: The Software Architect Looking To Cover It All

General Feel
This feels like a solid start to a project. On the one hand, it is generic enough. On the other hand, it does go into some detail and gives a solid overarching view on what a team needs to think of when embarking on the journey of building a cryptocurrency exchange. Neat breakdown into sections for all of the key topics such as functional requirements, non functional (architectural characteristics), risks, etc.
This is the only version that made an explicit mention of risks and operational concerns as well as a clear indication of the importance of having non-functional requirements.
Strengths
Very strong on compliance, privacy, audit, and regulatory readiness
Good callout on risks and operational considerations.
Clear breakdown of sections in the order in which they would typically be tackled within a real project
Feels more comprehensive than the others
Weaknesses
Did not provide any diagrams for business domains
Could overwhelm teams without strong architecture leadership
Made assumptions about the architecture (microservices, event-driven) without explaining why to use them or alternatives
Sections Used in Output
Functional Requirements
Non Functional Requirements
High Level Architecture
Detailed Component Design
Security
Scalability & Performance
Compliance & Legal
Technology Stack
Operational Considerations
Future Expansion Capabilities
Risks & Mitigations
Could be Best For
Teams starting to think of how to build such a system and how to create the initial architectural documentation, break the project into smaller pieces, and the key points to consider. Provides a high level description which is detailed enough to serve as an actual starting point
Prompt #2
Draw the following architectural diagrams in C4 notation representing the ecosystem of services you have just described. System landscape diagram, System Context diagram, Container diagram. Please draw these diagrams with ASCII.
Purpose: To evaluate whether the LLMs can visually organize their thoughts and demonstrate system-level reasoning by creating useful architectural diagrams.
✨ ChatGPT
Diagrams were clear and there was a textual description accompanying them.
✨ Claude
Visually accurate representations of services and a detailed description of what each diagram means.
✨ Gemini
Did not actually draw the diagrams in ASCII but instead provided PlantUML code in Archimate, which can be used by a PlantUML renderer for visualization.
✨ Grok
Neat and accurate diagrams with named relationships. Outlined the purpose behind each diagram and a detailed description.
Prompt #3
Let's zoom into the KYC service. What are the architectural considerations for this service and how would you design it? Please describe everything that someone responsible for creating the architecture for this service would need to include with this design. Don't go into code but outline all the technical and non technical considerations in building this service.
Purpose: To assess how deeply each LLM can go into one complex and regulation-heavy component of the system.
✨ ChatGPT
Solid overview of the KYC service and what it is responsible for.
Provided key architectural considerations - although these were intermixed with functional requirements.
Briefly mentioned the domain data model and even outlined a basic database table design.
Mentioned integration with 3rd parties, observability, compliance, and regulations
Started going in depth into API design and outlined a basic C4 container diagram
Summarized with a final checklist and mentioned the roles involved in building and operating this service
Did not mentioned disaster recovery nor business continuity
✨ Claude
Separated functional from non functional requirements which helped make things clearer
Outlined sub services within the main KYC service
Mentioned integration patterns, deployment architecture, and security considerations
Called out regulations and disaster recovery
Mentioned costs and broke these down into categories of development, operational, and scaling which was helpful
No diagrams
✨ Gemini
Very generic overview of technical and non technical considerations
Mentioned costs but did not elaborate
Solid overview of architectural characteristics and considerations
No details of code or schema design, diagrams, diagrams of any kind
✨ Grok
The fullest architectural deep dive into KYC — broken into services (biometrics, AML, OCR, scoring, case management).
Covered data lifecycle, monitoring, deployment, and regulatory requirements.
Went into details for event schemas and database table design
Explicitly called out risks and points that need further elaboration
Each section had quite a bit of detail
Clear distinction between functional and non-functional requirements with a deep dive into the latter
Prompt #4-5 (combined)
Assume the role of an architecture review board within a bank. Review the KYC architecture you have just presented and identify any gaps, risks, missing pieces, and general items that need clarification.
On a scale of 1 to 10 with 10 being the highest, how do you rate the architecture for the service you have just reviewed? Why?
Purpose: Testing the LLM’s ability to think critically about its own output, identify gaps, and simulate a governance/enterprise-level architectural review.
✨ ChatGPT
Identified a number of gaps and risks that include:
No mention of the source of truth system for this data
No specifics about how to approach GDPR and other regulations
No disaster recovery plan and non RTO/RPO targets
Little mention of auditability especially with the manual workflow process
No mention of the risk with using third parties
Gaps in testing approach
At the end, there were additional recommendations that were provided.
The Rating: ChatGPT rated its own as 7.5 out of 10.
Here is what ChatGPT said about its own architecture:
The architecture for the KYC service is strong conceptually and well-structured for a modern, fast-moving crypto platform — but not yet mature enough for bank-grade environments without addressing critical compliance, operational, and security gaps.
This architecture hits most of the right technical and functional patterns, especially for a tech-forward, crypto-native product that’s optimizing for time-to-market and modularity.
However, from a financial institution perspective — where KYC isn't just a compliance function but a legal obligation, audit risk, and reputational safeguard — the architecture falls short in the following ways:
-It lacks mature enterprise-grade operational resilience (e.g., DR, high availability).
-It does not address legal edge cases like cross-border retention conflicts or deletion exceptions.
-There's no clear tamper-evidence or strong audit forensics, which are essential in regulated environments.
Human process controls (admin access, dual control, session tracking) are underdeveloped.
✨ Claude
Clear division of the gap/risk types: technical, compliance, operational. There was also a callout of critical gaps.
There was a mention of training and expertise gaps which is very important and yet often missed.
Clearly called out what areas require clarification.
Included recommendations for improvement.
In general, very short and to the point review without too much detail.
The Rating: Claude rated its own as 7 out of 10.
Here is what Claude said about its own architecture:
The architecture provides a strong foundation but would need these gaps addressed before implementation, especially considering the high regulatory scrutiny and financial risks associated with cryptocurrency exchange KYC processes. The difference between a good KYC system (7/10) and an excellent one (9-10/10) often lies in these details that ensure regulatory compliance, operational excellence, and robust fraud prevention.
✨ Gemini
Provided a fairly detailed review of gaps and missing pieces that included:
Data residency and retention policies
Consent management
Handling of edge cases
The manual review of the KYC workflow
Lack of clear auditability
SLAs
Disaster recovery and business continuity
The training of data - which is extremely important and often missed
Data, compliance, third party risks
Evolving regulations
Lack of clearly identified ownership for various pieces
It feels that most of the key gaps and risks were mentioned and were explained sufficiently enough for a team to start tackling them. Gemini also mentioned an overall assessment summary and some recommendations which were detailed.
The Rating: Gemini rated its own as 6 out of 10.
Here is what Gemini said about its own architecture:
In essence, the initial proposal demonstrates a good understanding of the fundamental components and considerations for a KYC service. However, it lacks the depth and specificity required for us to confidently endorse it as a robust and production-ready architecture for a regulated financial institution.
It provides a solid foundation (hence the 6), but significant work is needed to flesh out the details, address the identified gaps and risks, and provide the necessary level of assurance regarding security, compliance, reliability, and operational efficiency. We need a much more detailed blueprint before we can consider it a higher-rated architecture.
✨ Grok
Very detailed self review compared to the other LLMs.
Where other LLMs provided a generic outline, Grok dove deeper into every single gap and risk that it had found and suggested what course of action to follow
Provided a detailed analysis of why it scored itself the way that it did
The only LLM that explained why it gave that exact score - not higher or lower
Outlined a plan for improvement
The Rating: Grok rated its own as 6 out of 10.
Here is what Grok said about its own architecture:
The KYC service architecture earns a 6/10 due to its strong technical foundation, regulatory awareness, and user-centric features, but it is held back by significant gaps in compliance details, operational maturity, security robustness, and third-party dependency management. These shortcomings are critical in a banking context, where regulatory scrutiny and security demands are paramount. By addressing the identified issues, the architecture can evolve into a robust, compliant solution suitable for a cryptocurrency exchange operating under banking oversight.
Prompt-by-Prompt Recap Table
LLM | Strength | Weakness | Best For | Self Review |
ChatGPT | Polished, modular, stakeholder-ready | Generic, lacks trade-offs | Initial reference architectures | 7.5/10 |
Claude | Covers breadth safely | Boilerplate, lacks storytelling | Requirement gathering | 7/10 |
Gemini | Concise, structured for early drafts | Cursory, not production-ready | Initial concept drafts | 6/10 |
Grok | Deep, risk and project-ready | Overwhelming for junior teams | Project planning and risk analysis | 6/10 |
Conclusion - What Does This All Mean?
Before we get to my conclusion, it is important to highlight that this conclusion should be taken with a grain or two of salt. Needless to say that my analysis, the results, and the interpretation are subjective and skewed.
There are numerous factors at play here and due to the non-deterministic nature of LLM processing and output, you might have gotten different results than I following the very same prompts.
On top of that, LLMs are being constantly fine tuned and retrained. They evolve. So the answers they had provided me with today may be very different from the answers it would have provided me with three months from now.
With all of that in mind - the conclusion I am about to draw still does feel like something resembling a true state of things - at least at this moment.
Although the analysis I conducted is extremely limited, it does reflect in a way what someone would go through if that person did not have any architectural experience.
Moreover, in my analysis, I asked some questions, which I would not have known to ask had I not possessed the experience I do today.
As every experienced software engineer and architect knows, building a system from the ground up is as much of an art as it is science.
Honestly? Sometimes, it feels more like art.
The nuances you have to consider building such a system are immense. It's almost too much. It is too much. It is too much to grasp in one, two, or three goes.
The way that you architect such a system - any system in fact - is through a process that is both structured and unstructured of discovery, investigation, evolution, and experimentation. If you do not possess architectural experience or know how to think in an architectural way - you do not know which questions to ask.
You could ask the first question. Ie - "how do I build system XYZ?" As a product owner, business analyst, VP of engineering, business stakeholder, software engineer.
However, knowing where to go next would be a challenge.
In fact, I bet that if you had gone through this type of an exercise - asking your favourite LLM for a system design - you would not have thought to ask it that very simple - yet extremely powerful - question.
"What are the gaps in your own architecture?"
That's the thing. Software architecture and being a software architect is not so much about knowing all of the answers. What it really revolves around is knowing what questions to ask, when to ask them, and why to ask them.
With that in mind, running this experiment gave me two clear impressions.
📍It wasn't a shock that, generally, the LLMs produced similar results. I expected things to be very high level. I didn't expect to see much discussion of trade-offs. Nor did I expect deep dives into architectural characteristics outside of Scalability, Resilience, and Observability.
📍What did surprise me was Grok’s performance. It was more detailed, self critical, and went into areas that others did not necessarily explore but should have. It wasn't perfect, far from it, but it showed more depth, more self-awareness, and more exploration of the messy realities that come with real-world system design.
Even its self-criticism - rating itself a 6 out of 10 - was a positive sign. The ability to critique your own design work is something even many seasoned architects struggle with.
Among all the LLMs, Grok stood out.
So, where does that leave us?
At least for now, GenAI won’t be replacing architects anytime soon. That's my take on it. Feel free to disagree. Architecture is not the ability to create a diagram or a presentation for business stakeholders. It’s the ability for the the critical, reflective, iterative thinking behind that diagram. That’s still very much a human skill.
However, and this is key, the future belongs to those who adapt.
Software architects won’t be replaced by AI.
But they will be replaced by architects who know how to use AI better than anyone else.
If you are seeking to become an IT/software architect, or want to get to the next level as someone who is already in this role, you may want to check out a guide I wrote about just that. It's called Unlocking the Career of Software Architect, and I had created it with the aim for it to be a comprehensive resource for those who want to thrive and succeed as technology and software architects.