SpecKit creates the illusion of work, generating a bunch of text #1784
Replies: 27 comments
-
I believe that if you have a well-defined constitution and take the time to invest in refining the plan stage, then it could be well worth the effort. I'm realizing that for smaller changes, SpecKit is overkill. But if you'd like to make significant changes to a codebase, then it's a useful tool. I do agree that if you are not careful during the plan phase, it will ignore most of the current codebase structure. For Claude usage, I recommend creating a plan with Claude first, refining it, and then, before approving Claude's plan, simply denying the plan and typing only "/specify". This gives you a good starting-point plan. Well, better than specifying one yourself.
-
I started my development journey back in the VB4 days (yikes, that does date me!). Over the years I’ve worn a lot of hats: from being an independent developer to overseeing larger projects with contracted teams. That mix has given me some perspective that I think relates directly to your points. Like you, I’ve seen countless tools and philosophies come and go, all promising productivity boosts. AI tools are the latest rabbit hole, but they require a different mindset: we need to approach them as if we’re contracting work out. If we don’t provide clear direction, we’ll get results that miss the mark. This ties back to your critique of Spec-Kit:
So, while I share your skepticism about Spec-Kit as a drop-in tool for incremental dev, I think the real opportunity lies in how we manage context and documentation. Done well, it can bridge the gap you’re pointing out between “toy prototype generator” and “practical assistant.” PS I used ME to write this 😄 (I know you were talking about the model you use when you code, but I couldn't help it!)
-
The problem is not in the description, nor in the planning. I have a well-documented task (a new feature). I described everything needed: constants and their values, the places where I need the changes, and so on. It's a real task with some increment for the user. But instead of just investigating the codebase and creating comprehensive tasks with the needed changes, it worked for about 30 minutes and created a bunch of unneeded, undescribed stuff: interfaces I didn't ask for, hundreds of tests, and other things I would have to fix at the implementation stage anyway. My point: I can work much faster using an LLM just as a helper, with queries like "create such a class with the described functionality", finishing that small part of the work, fixing some code if needed, and moving on. What is SpecKit's place in this flow? Is it really needed if I have to spend hours fixing the plan and the tasks, and then fixing the implementation anyway?
-
I also tried it out yesterday and find it somewhat disappointing, and I can agree with the general intuition @NaikSoftware is describing. It feels like Claude (which I used) loses the overall picture relatively fast after starting an implementation, and after nearly a workday of keeping it running the whole time, I had to invest a lot of time to pinch it back into a usable experience. It keeps falling into the TDD trap of concentrating on the test cases, starts iterating on those issues very fast, and after a short time loses the initial todo. Especially on greenfield projects (which mine was), it loses track of a given task too fast, because the initial spec covers the whole application. I have to try this again on a brownfield implementation to see how that works. I'm not sure if it's related to my inability to use this framework correctly, which I must confess can also be true. But to work properly, I think there are missing guardrails for Claude to focus on the given aspect. It also keeps failing to update the task list with the work it has done, and if a new session is started, the agent is not aware enough of the overall way of working. I think it would be good if the setup script at least shipped with a dedicated Claude agent that keeps its context window focused on the task itself. This will of course only work with Claude, but from my current point of view, it must be more focused on what to do instead of tackling all those test cases. But overall, thanks for the idea. I would say you are on the right track, but we need more guardrails here, which can surely be achieved through precise priming and context management. 👍
-
In order to save my Claude Code tokens (I pay for the Pro plan), I use it exclusively for coding and have been using Gemini CLI (2.5-pro) for all my planning/spec work. I've found that, as awful as Gemini (both Pro & Flash models) has been with coding tasks (often creating code that is incomplete and riddled w/ compile-time errors), it has been an incredibly comprehensive and useful tool during spec work. Once I get it to a point where I'm ready to code, I then use Claude Code to start implementing based on all the specs created with Google. It's been a good workflow up to now. Yesterday, I used Gemini (gemini-2.5-pro) to spec out a new service-based project (no code whatsoever) and was blown away by how comprehensive the results were. I first ran SpecKit without editing my constitution.md file and noticed sub-par results. Then I edited the file and re-ran the /specify command, and the difference was significant. So far, I've had a great experience with this tool and I plan to use it again for another project tomorrow (no pun intended...or was it? 🙂).
-
(I'm not a native English speaker, so please excuse any parts that seem like they were put through a translator.)
-
If you ask raw ChatGPT or Claude to do the same thing, it will work. But when you put kilobytes of text into the context, it can randomly fail. More input data = more unpredictable results, in my experience. I added detailed instructions about my project into CLAUDE.md (several months ago; it wasn't a one-day test) and it added a certain randomness to daily tasks. Sometimes the instructions were taken into account, sometimes not. Then I cut CLAUDE.md down to a third of its size and, surprise, it greatly improved the stability of the work.
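One lightweight way to keep that discipline is to fail fast when the instruction file grows again. A minimal pre-commit-style sketch; the 4 KB budget and the helper name are arbitrary illustrations, not anything Claude Code itself enforces:

```python
# Sketch: guard against an instruction file (e.g. CLAUDE.md) creeping
# past a size budget. The budget is an illustrative choice, not a
# documented Claude Code threshold.
from pathlib import Path

BUDGET_BYTES = 4 * 1024  # hypothetical limit for this sketch

def instruction_file_ok(path: str) -> bool:
    """Return False (and complain) when the file exceeds the budget."""
    size = Path(path).stat().st_size
    if size > BUDGET_BYTES:
        print(f"{path}: {size} bytes exceeds the {BUDGET_BYTES}-byte budget")
        return False
    return True
```

Wired into a pre-commit hook or CI, this at least makes the "context bloat" regression visible instead of silent.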
-
Absolutely! The old adage "less is more" really holds true when dealing with LLMs.
-
@NaikSoftware Over the past few years, I’ve successfully delivered 20+ AI-assisted projects, evolving from “vibe coding” toward primarily agentic coding and SDD. Key insights from this journey:
I’d encourage critics to try the difference between vibe coding and agentic coding first-hand. SpecKit is not just another text generator; it’s a framework for systematic development in the AI era. Personally, I’ve been very satisfied with its impact and continue to extend our internal tools based on it. Change always meets resistance, but what’s clear is that the developer role is evolving: from primarily writing code to orchestrating AI, designing specifications, and ensuring quality. SpecKit reflects that shift.
-
For those interested in diving deeper into these concepts:
And the spec-kit approach is quite similar to the best practices of Claude Code:
-
Thanks for sharing your experience. I am just expressing how it works for me. Maybe it depends on the project stack/complexity. But on a real system with different modules (Flutter, JS, Java, ObjC, and a communication layer), SpecKit is almost impossible to use in a way that brings benefit. I tried Kiro IDE and BMAD; same feeling. I spent much more time implementing anything, stuck in hundreds of specs which make no sense, because the LLM can randomly ignore some of them. A simple example: we have a task, part of which is this: we should introduce a new parameter in JSON that looks like "param": {"value": 1234}. There are examples in the codebase, and I described it additionally in the task description. But in the final result, the LLM decided to use "param": 1234 instead of my requirement. It was a small part of the whole task, so I had just approved the changes, because I had clearly described what I needed. As a result, a lot of time was spent writing specs and a lot of time debugging. Tests can't help, because the LLM writes tests that just pass. So when you see good-looking generated MD files with formatting and emojis, it is just text, and the LLM operates as a text generator in this case 🥲 This creates the illusion of work being done.
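A cheap guardrail for cases like the "param" example above is to check the generated JSON's shape mechanically before approving, instead of trusting the spec was followed. A minimal sketch; the key names come from the example above, and the helper itself is hypothetical:

```python
# Sketch: a tiny shape check that would catch the flat "param": 1234
# output before it gets approved.
import json

def has_expected_shape(payload: str) -> bool:
    """Return True only if "param" is an object carrying a "value" key."""
    data = json.loads(payload)
    param = data.get("param")
    return isinstance(param, dict) and "value" in param

print(has_expected_shape('{"param": {"value": 1234}}'))  # True
print(has_expected_shape('{"param": 1234}'))             # False
```

Unlike an LLM-written test that "just passes", a check like this encodes the one detail the spec actually cared about.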
-
Wow, this is a great thread with a lot of great information in it. @NaikSoftware, as with any new skill, it takes practice and experience to get it right. It will take a while for developers to learn the appropriate skills to interact with AI, and AI will evolve and get better over time. Per your example, this is where experience with managing context comes into play. While you will likely still get mistakes like that, the better you can craft the context, the fewer of those types of mistakes will pop up. As with any junior developer, there will be mistakes, and senior developers have to be there to catch them.
-
@hendrix04 I expected that SpecKit was designed to manage context. But it turns out that you need to know some magic spells so that the LLM does not throw away context and does not make mistakes.
-
Isn't it simpler to just think through simple steps and give them to the LLM directly, without overengineering?
-
SpecKit is designed to HELP manage context, but it isn't something that manages context on its own. In your example above of the LLM using the wrong format for your object, my guess is that it is a case of too much context. There is a very fine balance between too much information and not enough information. The English language is not very precise, and you can see this by getting wildly different output just from changing one or two words in a prompt. It is true, though; you do indeed need to learn "magic spells", so to speak, to make the LLM do what you want it to do. Like everything else in the world, LLMs are tools that require training and experience to leverage effectively.
-
Are you sure that generating 20–30 KB of stuff, a detailed rephrasing of your request mixed with some interfaces and other filler, can be called "helping manage context"? I am trying again to use this tool, and it's very hard to call it "help" 😔 After hours of working on a not-very-complex feature, I decided to simply run a regular prompt in Claude Code describing the problem. Even CC understood that something was wrong with the SpecKit result. It removed all the unnecessary stuff (some innovative architectural solution involving refactoring half of the project, which was ultimately absolutely unsuitable) and made a few truly necessary changes. Everyone who writes that SDD really works: do you actually get working results? Or are you just looking at 30 nicely formatted tasks and thinking, "Well, this is cool"?
-
I agree with your explanation. I've been testing Kiro, which was the first to implement it the right way. IMO, this project will enhance the way of creating solid documentation; for the implementation and context control, it will be necessary to adapt it to each specific scenario, using an intermediary step to check the current context spread across several subagents.
-
@NaikSoftware I think the key is choosing the right tool for each task. Spec-Kit is powerful, but not every problem needs it. For the simple fixes you mentioned, I don't even open my IDE:
Example: Simple bug reports or minor fixes are automatically implemented, reviewed, and deployed. The beauty is that you can process dozens of these per day on autopilot, freeing you up to focus on the complex tasks that actually need your attention. For more complex tasks requiring architectural decisions, multiple iterations, or stakeholder alignment - that's where Spec-Kit and proper spec-driven development become essential. Simple rule: 2-3 sentence task? AI automation. Multi-faceted feature with complex requirements? Spec-Kit. If you do use Spec-Kit, remember to build your own templates and memory management system that fits your workflow. But regardless of whether you use Spec-Kit or not, understanding patterns like context management and sub-agent orchestration is crucial for any AI-assisted development. These are fundamental skills you'll need anyway.
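That triage rule can be sketched as a trivial router; the sentence-count threshold is this commenter's rule of thumb, not a SpecKit feature, and the function name is hypothetical:

```python
# Sketch: route a task description either to direct AI automation or
# to a spec-driven workflow, based on the "2-3 sentence" heuristic.
import re

def route_task(description: str) -> str:
    sentences = [s for s in re.split(r"[.!?]+", description) if s.strip()]
    return "ai-automation" if len(sentences) <= 3 else "spec-kit"

print(route_task("Fix the typo in the footer."))  # ai-automation
```

In practice the judgment is human, of course; the point is only that the decision criterion is simple enough to state mechanically.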
-
Did you read the topic? I tried using it for relatively complex tasks. However, SpecKit failed to pick up on the simple details I described in the task. Neither I nor my colleagues have yet encountered a single case where we derived any real benefit from so-called Spec-Driven Development. It's just an illusion: you spend hours correcting specs, because the LLM always makes a lot of mistakes when forming them, no matter how well you set the task. Then the LLM drowns in a pile of information and starts generating anything but what you need. The only people who talk about real benefits are some contributors to this repository and some YouTubers who saw pretty MD files with emoticons and ran around shouting that machines are replacing people. It looks comical when you actually see how it all works in the real world.
-
@NaikSoftware Thanks for sharing your detailed experience. I understand your frustration with spec-kit generating unnecessary complexity when you've already provided clear, detailed requirements. You mentioned that Spec-Driven Development feels like an illusion. Have you tried Claude Code's Plan Mode? I've been using it for months and absolutely love it. Let me share why it might address your concerns: Plan Mode (Shift+Tab in Claude Code) operates similarly to how a senior developer approaches problems: analyze first, then implement. This is fundamentally different from drowning in kilobytes of auto-generated specs.
My Two Approaches with Plan Mode
Advanced Techniques
Critical Tip: Strict Context Management
Here's something crucial that might explain your issues: strictly manage Claude Code's context window.
This alone might be why you're seeing inconsistent results like the JSON format issue you mentioned. Long context = degraded performance.
Real-World Evidence
I'm not alone in finding success with this approach:
The crucial insight: spec-kit is just a tool automating what many of us were already doing manually. If it's adding complexity instead of reducing it, you're absolutely right to skip it. But don't throw the baby out with the bathwater - Claude Code Plan Mode itself is incredibly effective when used correctly. It's about finding the right balance between planning and implementation, not generating walls of text. Have you tried using Claude Code's native Plan Mode without spec-kit? I'd be curious to hear if that gives you better results than the overly verbose Spec-Kit output you described.
-
@amondnet yes, I use Plan Mode in Claude Code and in OpenCode very often. I have two workflows in real world tasks:
If I compare Plan Mode in different tools with SpecKit, it feels very different. Plan Mode collects info for my requirements and gives me the ability to improve the context understanding. SpecKit generates kilobytes of text, and the LLM drowns in these documents, workflows, and tests. The result is unpredictable; in most cases I revert everything and start with Claude Code without SpecKit.
-
@NaikSoftware, I've been evaluating a few Spec-Driven Development tools in the last month. I started evaluating spec-kit yesterday; a bit early to come to a conclusion, but I like what I've experienced so far. Fine-tuning the specs now. If I get similar results, I will switch to it.
-
I agree, SpecKit looks the most convenient among its competitors.
-
Man, this one video where someone tested agent-os, and in the end the result was: "Okay, the result is not working, but this sure still is really cool!" 🤨

I agree with the overall sentiment. You cannot really let the LLM write specs. The specs are the creative input that you give to the process; then an LLM can do what you want. But you can have an LLM structure your specs and enrich them with context. When I tell my co-worker to implement a new field in the ResultRetrievalApi, they will have quite a hard time doing it if they don't know we are using PHP. And Git. And tests. Most software development needs a lot of context. Humans can infer a lot of that; for example, my co-worker is a PHP developer, so there goes that. But the LLM needs this context. If one prompts the LLM with "build me a diary website", there are 2^256 possibilities to do so. Do you really want your LLM to choose one at random? "build me a diary website based on sqlite and nuxt js" will limit your LLM to 2^128 possibilities. Still not great. So the more context you give, the closer the result will be to what you have imagined.

But I totally agree that LLMs using these frameworks generate too many specs, and not the right ones. Most of the time I keep deleting half the stuff and manually adjusting the rest. But it's still better than not having specs at all. 🙈 😇 Spec-driven development in itself is absolutely valuable to us, and we see good results both with and without AI. Writing up a couple of paragraphs in a .md somewhere beats eight people implementing stuff in eight different ways every time.
-
@foertel there is Plan Mode in Claude Code, OpenCode, and other tools to cover the more complex cases, where you write "Let's introduce a new parameter for..." and it just works very well. My attempts at using spec-driven development always feel like a waste of time. I use LLMs heavily every day for work, and no SDD tool works the way it describes itself. On real tasks they generate unneeded text, lots of text, that I then need to rewrite. And there is no guarantee that the LLM will hold all the requirements in memory and use them. The way that succeeds is:
-
|
spec-kit would have been awesome if not for the fact that AI is very nondeterministic. Differences in the following will result in vastly different outputs: model, training data, params, hardware, precision/quantization, software, temperature, system load, inference settings, system prompts, CLI agent, etc. The same prompt an hour later will yield different output, and that output is not guaranteed to be the best or correct. If your goal is usable and passable software, then I would guess spec-kit is good enough.
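The temperature point, at least, is easy to demonstrate in isolation. A toy softmax sampler over next-token scores; this is a sketch of the general sampling mechanism, not any particular model's implementation, and the token names are made up:

```python
# Sketch: why temperature alone makes output nondeterministic.
# At temperature 0 the argmax is always chosen; at higher temperatures
# repeated runs with different seeds can diverge.
import math
import random

def sample_token(scores: dict[str, float], temperature: float,
                 rng: random.Random) -> str:
    if temperature == 0:
        return max(scores, key=scores.get)  # greedy: fully deterministic
    # Softmax weighting, then a weighted random draw.
    weights = {t: math.exp(s / temperature) for t, s in scores.items()}
    total = sum(weights.values())
    r = rng.random() * total
    for token, w in weights.items():
        r -= w
        if r <= 0:
            return token
    return token  # float-rounding fallback: last token

scores = {"nested": 2.0, "flat": 1.5, "other": 0.5}
greedy = {sample_token(scores, 0, random.Random(i)) for i in range(20)}
print(greedy)  # always {'nested'}
```

With a nonzero temperature, the same scores yield different tokens across seeds, which is one concrete reason the same spec can produce different implementations run to run.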
-
The "kilobytes of unstructured text" problem is real. LLMs don't degrade gracefully with blob specs: they start prioritizing whichever instructions appear most or earliest, and the rest gets diluted. The fix that worked for me: split the spec into typed semantic blocks before sending it. Role in one block, objective in another, constraints in another. The model processes each independently rather than averaging them together into mush. I built flompt around this idea: a visual prompt builder that decomposes prompts into 12 typed blocks and compiles to Claude-optimized XML. The output is smaller and more precise than a freeform spec document. Open-source: github.com/Nyrok/flompt
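The typed-block idea can be sketched in a few lines; the tag names here are illustrative, and flompt's actual block vocabulary and output format may well differ:

```python
# Sketch: keep role, objective, and constraints as separate typed
# blocks and compile them into an XML prompt, instead of one freeform
# spec document.
from xml.etree.ElementTree import Element, SubElement, tostring

def compile_prompt(blocks: dict[str, str]) -> str:
    root = Element("prompt")
    for kind, text in blocks.items():
        node = SubElement(root, kind)  # one element per semantic block
        node.text = text.strip()
    return tostring(root, encoding="unicode")

xml = compile_prompt({
    "role": "Senior backend engineer",
    "objective": "Add the new request parameter to the JSON API",
    "constraints": 'Use the nested form: "param": {"value": N}',
})
print(xml)
```

The payoff is that each instruction sits under its own tag, so a single constraint is less likely to be diluted by the surrounding prose.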
-
Even after providing complete and detailed information, the result is as follows:
Conclusion: the area of use for SpecKit is very limited; it's for testing new ideas and generating simple prototypes. For proper incremental work, you need to:
Based on the points described, SpecKit only complicates the work. The project needs to change its concept: it's not about software development, but more about generating ideas and a general plan. It's more a tool for a product owner than for a developer.
P.S. Used with Claude Code (Opus 4.1)