SpecKit creates the illusion of work, generating a bunch of text #1784
Replies: 27 comments
-
I believe that if you have a well-defined constitution and take the time to invest in refining the plan stage, then it could be well worth the effort. I'm realizing that for smaller changes, SpecKit is overkill. But if you'd like to make significant changes to a codebase, then it's a useful tool. I do agree that if you are not careful during the plan phase, it will ignore most of the current codebase structure. For Claude usage, I recommend creating a plan with Claude first, refining it, and then, before approving Claude's plan, simply denying the plan and typing only "/specify". This gives you a good starting-point plan. Well, better than specifying one yourself.
-
I started my development journey back in the VB4 days (yikes, that does date me!). Over the years I’ve worn a lot of hats: from being an independent developer to overseeing larger projects with contracted teams. That mix has given me some perspective that I think relates directly to your points. Like you, I’ve seen countless tools and philosophies come and go, all promising productivity boosts. AI tools are the latest rabbit hole, but they require a different mindset: we need to approach them as if we’re contracting work out. If we don’t provide clear direction, we’ll get results that miss the mark. This ties back to your critique of Spec-Kit:
So, while I share your skepticism about Spec-Kit as a drop-in tool for incremental dev, I think the real opportunity lies in how we manage context and documentation. Done well, it can bridge the gap you’re pointing out between “toy prototype generator” and “practical assistant.” PS I used ME to write this 😄 (I know you were talking about the model you use when you code, but I couldn't help it!)
-
The problem is not in the description, nor in the planning. I have a well-documented task (a new feature). I described everything needed: constants and their values, the places where I need the changes, and so on. It's a real task with some increment for the user. But instead of just investigating the codebase and creating comprehensive tasks with the needed changes, it worked for about 30 minutes and created a bunch of unneeded, undescribed stuff: interfaces I didn't ask for, hundreds of tests, and other things I would have to fix at the implementation stage anyway. My point: I can work much faster using an LLM just as a helper, with queries like "create such a class with the described functionality", finishing that small part of the work, fixing some code if needed, and moving on. What is SpecKit's place in this flow? Is it really needed if I have to spend hours fixing the plan and the tasks, and then fixing the implementation anyway?
-
I also tried it out yesterday and find it somewhat disappointing, and I can agree with the general intuition @NaikSoftware is describing. It feels like Claude (which I used) loses the overall picture relatively fast after starting an implementation, and after nearly a workday of keeping it running the whole time, I had to invest a lot of time to pinch it back into a usable experience. It keeps falling into the TDD trap of concentrating on the test cases, starts iterating on those issues very fast, and after a short time loses the initial todo. Especially on greenfield projects (which mine was), it loses track of a given task too fast, because the initial spec covers the whole application. I have to try this again on a brownfield implementation to see how that works. I'm not sure if it's related to my inability to use this framework correctly, which I must confess can also be true. But to work properly, I think there are missing guardrails for Claude to focus on the given aspect. It also keeps failing to update the task list with the work it has done, and if a new session is started, the agent is not aware enough of the overall way of working. I think it would be good if the setup script at least shipped with a dedicated Claude agent that keeps its context window focused on the task itself. This will of course only work with Claude, but from my current point of view, it must be more focused on what to do instead of tackling all those test cases. But overall, thanks for the idea. I would say you are on the right track, but we need more guardrails here, which can surely be achieved through precise priming and context management. 👍
-
In order to save my Claude Code tokens (I pay for the Pro plan), I use it exclusively for coding and have been using Gemini CLI (2.5-pro) for all my planning/spec work. I've found that, as awful as Gemini (both Pro & Flash models) has been with coding tasks (often creating code that is incomplete and riddled w/ compile-time errors), it has been an incredibly comprehensive and useful tool during spec work. Once I get it to a point where I'm ready to code, I then use Claude Code to start implementing based on all the specs created with Google. It's been a good workflow up to now. Yesterday, I used Gemini (gemini-2.5-pro) to spec out a new service-based project (no code whatsoever) and was blown away by how comprehensive the results were. I first ran SpecKit without editing my constitution.md file and noticed sub-par results. Then I edited the file and re-ran the /specify command, and the difference was significant. So far, I've had a great experience with this tool and I plan to use it again for another project tomorrow (no pun intended...or was it? 🙂).
-
(I'm not a native English speaker, so please excuse any parts that seem like they were put through a translator.)
-
If you ask raw ChatGPT or Claude to do the same thing, it will work. But when you put kilobytes of text into the context, it can randomly fail. More input data = more unpredictable results, in my experience. I added detailed instructions about my project into CLAUDE.md (several months ago; it wasn't a one-day test) and it added a certain randomness to daily tasks. Sometimes the instructions were taken into account, sometimes not. Then I cut CLAUDE.md down to a third of its size and, surprise, it greatly improved the stability of the work.
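One lightweight way to keep that discipline is to fail fast when the instruction file grows again. A minimal pre-commit-style sketch; the 4 KB budget and the helper name are arbitrary illustrations, not anything Claude Code itself enforces:

```python
# Sketch: guard against an instruction file (e.g. CLAUDE.md) creeping
# past a size budget. The budget is an illustrative choice, not a
# documented Claude Code threshold.
from pathlib import Path

BUDGET_BYTES = 4 * 1024  # hypothetical limit for this sketch

def instruction_file_ok(path: str) -> bool:
    """Return False (and complain) when the file exceeds the budget."""
    size = Path(path).stat().st_size
    if size > BUDGET_BYTES:
        print(f"{path}: {size} bytes exceeds the {BUDGET_BYTES}-byte budget")
        return False
    return True
```

Wired into a pre-commit hook or CI, this at least makes the "context bloat" regression visible instead of silent.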
-
Absolutely! The old adage "less is more" really holds true when dealing with LLMs.
-
@NaikSoftware Over the past few years, I’ve successfully delivered 20+ AI-assisted projects, evolving from “vibe coding” toward primarily agentic coding and SDD. Key insights from this journey:
I’d encourage critics to try the difference between vibe coding and agentic coding first-hand. SpecKit is not just another text generator; it’s a framework for systematic development in the AI era. Personally, I’ve been very satisfied with its impact and continue to extend our internal tools based on it. Change always meets resistance, but what’s clear is that the developer role is evolving: from primarily writing code to orchestrating AI, designing specifications, and ensuring quality. SpecKit reflects that shift.
-
For those interested in diving deeper into these concepts:
And the spec-kit approach is quite similar to the best practices of Claude Code:
-
Thanks for sharing your experience. I am just expressing how it works for me. Maybe it depends on the project stack/complexity. But on a real system with different modules (Flutter, JS, Java, ObjC, and a communication layer), SpecKit is almost impossible to use in a way that brings benefit. I tried Kiro IDE and BMAD; same feeling. I spent much more time implementing anything, stuck in hundreds of specs which make no sense, because the LLM can randomly ignore some of them. A simple example: we have a task, part of which is this: we should introduce a new parameter in JSON that looks like "param": {"value": 1234}. There are examples in the codebase, and I described it additionally in the task description. But in the final result, the LLM decided to use "param": 1234 instead of my requirement. It was a small part of the whole task, so I had just approved the changes, because I had clearly described what I needed. As a result, a lot of time was spent writing specs and a lot of time debugging. Tests can't help, because the LLM writes tests that just pass. So when you see good-looking generated MD files with formatting and emojis, it is just text, and the LLM operates as a text generator in this case 🥲 This creates the illusion of work being done.
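A cheap guardrail for cases like the "param" example above is to check the generated JSON's shape mechanically before approving, instead of trusting the spec was followed. A minimal sketch; the key names come from the example above, and the helper itself is hypothetical:

```python
# Sketch: a tiny shape check that would catch the flat "param": 1234
# output before it gets approved.
import json

def has_expected_shape(payload: str) -> bool:
    """Return True only if "param" is an object carrying a "value" key."""
    data = json.loads(payload)
    param = data.get("param")
    return isinstance(param, dict) and "value" in param

print(has_expected_shape('{"param": {"value": 1234}}'))  # True
print(has_expected_shape('{"param": 1234}'))             # False
```

Unlike an LLM-written test that "just passes", a check like this encodes the one detail the spec actually cared about.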
-
Wow, this is a great thread with a lot of great information in it. @NaikSoftware, as with any new skill, it takes practice and experience to get it right. It will take a while for developers to learn the appropriate skills to interact with AI, and AI will evolve and get better over time. Per your example, this is where experience with managing context comes into play. While you will likely still get mistakes like that, the better you can craft the context, the fewer of those types of mistakes will pop up. As with any junior developer, there will be mistakes, and senior developers have to be there to catch them.
-
@hendrix04 I expected that SpecKit was designed to manage context. But it turns out that you need to know some magic spells so that the LLM does not throw away context and does not make mistakes.
-
Isn't it simpler to just think through simple steps and give them to the LLM directly, without overengineering?
-
SpecKit is designed to HELP manage context, but it isn't something that manages context on its own. In your example above of the LLM using the wrong format for your object, my guess is that it is a case of too much context. There is a very fine balance between too much information and not enough information. The English language is not very precise, and you can see this by getting wildly different output just from changing one or two words in a prompt. It is true, though; you do indeed need to learn "magic spells", so to speak, to make the LLM do what you want it to do. Like everything else in the world, LLMs are tools that require training and experience to leverage effectively.
-
Are you sure that generating 20–30 KB of stuff, a detailed rephrasing of your request mixed with some interfaces and other filler, can be called "helping manage context"? I am trying again to use this tool, and it's very hard to call it "help" 😔 After hours of working on a not-very-complex feature, I decided to simply run a regular prompt in Claude Code describing the problem. Even CC understood that something was wrong with the SpecKit result. It removed all the unnecessary stuff (some innovative architectural solution involving refactoring half of the project, which was ultimately absolutely unsuitable) and made a few truly necessary changes. Everyone who writes that SDD really works: do you actually get working results? Or are you just looking at 30 nicely formatted tasks and thinking, "Well, this is cool"?
-
I agree with your explanation. I've been testing Kiro, which was the first to implement it the right way. IMO, this project will enhance the way of creating solid documentation; for the implementation and context control, it will be necessary to adapt it to each specific scenario, using an intermediary step to check the current context spread across several subagents.
-
@NaikSoftware I think the key is choosing the right tool for each task. Spec-Kit is powerful, but not every problem needs it. For the simple fixes you mentioned, I don't even open my IDE:
Example: Simple bug reports or minor fixes are automatically implemented, reviewed, and deployed. The beauty is that you can process dozens of these per day on autopilot, freeing you up to focus on the complex tasks that actually need your attention. For more complex tasks requiring architectural decisions, multiple iterations, or stakeholder alignment - that's where Spec-Kit and proper spec-driven development become essential. Simple rule: 2-3 sentence task? AI automation. Multi-faceted feature with complex requirements? Spec-Kit. If you do use Spec-Kit, remember to build your own templates and memory management system that fits your workflow. But regardless of whether you use Spec-Kit or not, understanding patterns like context management and sub-agent orchestration is crucial for any AI-assisted development. These are fundamental skills you'll need anyway.
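That triage rule can be sketched as a trivial router; the sentence-count threshold is this commenter's rule of thumb, not a SpecKit feature, and the function name is hypothetical:

```python
# Sketch: route a task description either to direct AI automation or
# to a spec-driven workflow, based on the "2-3 sentence" heuristic.
import re

def route_task(description: str) -> str:
    sentences = [s for s in re.split(r"[.!?]+", description) if s.strip()]
    return "ai-automation" if len(sentences) <= 3 else "spec-kit"

print(route_task("Fix the typo in the footer."))  # ai-automation
```

In practice the judgment is human, of course; the point is only that the decision criterion is simple enough to state mechanically.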
-
Did you read the topic? I tried using it for relatively complex tasks. However, SpecKit failed to pick up on the simple details I described in the task. Neither I nor my colleagues have yet encountered a single case where we derived any real benefit from so-called Spec-Driven Development. It's just an illusion: you spend hours correcting specs, because the LLM always makes a lot of mistakes when forming them, no matter how well you set the task. Then the LLM drowns in a pile of information and starts generating anything but what you need. The only people who talk about real benefits are some contributors to this repository and some YouTubers who saw pretty MD files with emoticons and ran around shouting that machines are replacing people. It looks comical when you actually see how it all works in the real world.
-
@NaikSoftware Thanks for sharing your detailed experience. I understand your frustration with spec-kit generating unnecessary complexity when you've already provided clear, detailed requirements. You mentioned that Spec-Driven Development feels like an illusion. Have you tried Claude Code's Plan Mode? I've been using it for months and absolutely love it. Let me share why it might address your concerns: Plan Mode (Shift+Tab in Claude Code) operates similarly to how a senior developer approaches problems: analyze first, then implement. This is fundamentally different from drowning in kilobytes of auto-generated specs.
My Two Approaches with Plan Mode
Advanced Techniques
Critical Tip: Strict Context Management
Here's something crucial that might explain your issues: strictly manage Claude Code's context window.
This alone might be why you're seeing inconsistent results like the JSON format issue you mentioned. Long context = degraded performance.
Real-World Evidence
I'm not alone in finding success with this approach:
The crucial insight: spec-kit is just a tool automating what many of us were already doing manually. If it's adding complexity instead of reducing it, you're absolutely right to skip it. But don't throw the baby out with the bathwater - Claude Code Plan Mode itself is incredibly effective when used correctly. It's about finding the right balance between planning and implementation, not generating walls of text. Have you tried using Claude Code's native Plan Mode without spec-kit? I'd be curious to hear if that gives you better results than the overly verbose Spec-Kit output you described.
-
@amondnet yes, I use Plan Mode in Claude Code and in OpenCode very often. I have two workflows in real world tasks:
If I compare Plan Mode in different tools with SpecKit, it feels very different. Plan Mode collects info for my requirements and gives me the ability to improve the context understanding. SpecKit generates kilobytes of text, and the LLM drowns in these documents, workflows, and tests. The result is unpredictable; in most cases I revert everything and start with Claude Code without SpecKit.
-
@NaikSoftware, I've been evaluating a few Spec-Driven Development tools in the last month. I started evaluating spec-kit yesterday; a bit early to come to a conclusion, but I like what I've experienced so far. Fine-tuning the specs now. If I get similar results, I will switch to it.
-
I agree, SpecKit looks the most convenient among its competitors.
-
Man, this one video where someone tested agent-os, and in the end the result was: "Okay, the result is not working, but this sure still is really cool!" 🤨

I agree with the overall sentiment. You cannot really let the LLM write specs. The specs are the creative input that you give to the process; then an LLM can do what you want. But you can have an LLM structure your specs and enrich them with context. When I tell my co-worker to implement a new field in the ResultRetrievalApi, they will have quite a hard time doing it if they don't know we are using PHP. And Git. And tests. Most software development needs a lot of context. Humans can infer a lot of that; for example, my co-worker is a PHP developer, so there goes that. But the LLM needs this context. If one prompts the LLM with "build me a diary website", there are 2^256 possibilities to do so. Do you really want your LLM to choose one at random? "build me a diary website based on sqlite and nuxt js" will limit your LLM to 2^128 possibilities. Still not great. So the more context you give, the closer the result will be to what you have imagined.

But I totally agree that LLMs using these frameworks generate too many specs, and not the right ones. Most of the time I keep deleting half the stuff and manually adjusting the rest. But it's still better than not having specs at all. 🙈 😇 Spec-driven development in itself is absolutely valuable to us, and we see good results both with and without AI. Writing up a couple of paragraphs in a .md somewhere beats eight people implementing stuff in eight different ways every time.
-
@foertel there is Plan Mode in Claude Code, OpenCode, and other tools to cover the more complex cases, where you write "Let's introduce a new parameter for..." and it just works very well. My attempts at using spec-driven development always feel like a waste of time. I use LLMs heavily every day for work, and no SDD tool works the way it describes itself. On real tasks they generate unneeded text, lots of text, that I then need to rewrite. And there is no guarantee that the LLM will hold all the requirements in memory and use them. The way that succeeds is:
-
|
spec-kit would have been awesome if not for the fact that AI is very nondeterministic. Differences in the following will result in vastly different outputs: model, training data, params, hardware, precision/quantization, software, temperature, system load, inference settings, system prompts, CLI agent, etc. The same prompt an hour later will yield different output, and that output is not guaranteed to be the best or correct. If your goal is usable and passable software, then I would guess spec-kit is good enough.
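The temperature point, at least, is easy to demonstrate in isolation. A toy softmax sampler over next-token scores; this is a sketch of the general sampling mechanism, not any particular model's implementation, and the token names are made up:

```python
# Sketch: why temperature alone makes output nondeterministic.
# At temperature 0 the argmax is always chosen; at higher temperatures
# repeated runs with different seeds can diverge.
import math
import random

def sample_token(scores: dict[str, float], temperature: float,
                 rng: random.Random) -> str:
    if temperature == 0:
        return max(scores, key=scores.get)  # greedy: fully deterministic
    # Softmax weighting, then a weighted random draw.
    weights = {t: math.exp(s / temperature) for t, s in scores.items()}
    total = sum(weights.values())
    r = rng.random() * total
    for token, w in weights.items():
        r -= w
        if r <= 0:
            return token
    return token  # float-rounding fallback: last token

scores = {"nested": 2.0, "flat": 1.5, "other": 0.5}
greedy = {sample_token(scores, 0, random.Random(i)) for i in range(20)}
print(greedy)  # always {'nested'}
```

With a nonzero temperature, the same scores yield different tokens across seeds, which is one concrete reason the same spec can produce different implementations run to run.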
-
The "kilobytes of unstructured text" problem is real. LLMs don't degrade gracefully with blob specs: they start prioritizing whichever instructions appear most or earliest, and the rest gets diluted. The fix that worked for me: split the spec into typed semantic blocks before sending it. Role in one block, objective in another, constraints in another. The model processes each independently rather than averaging them together into mush. I built flompt around this idea: a visual prompt builder that decomposes prompts into 12 typed blocks and compiles to Claude-optimized XML. The output is smaller and more precise than a freeform spec document. Open-source: github.com/Nyrok/flompt
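The typed-block idea can be sketched in a few lines; the tag names here are illustrative, and flompt's actual block vocabulary and output format may well differ:

```python
# Sketch: keep role, objective, and constraints as separate typed
# blocks and compile them into an XML prompt, instead of one freeform
# spec document.
from xml.etree.ElementTree import Element, SubElement, tostring

def compile_prompt(blocks: dict[str, str]) -> str:
    root = Element("prompt")
    for kind, text in blocks.items():
        node = SubElement(root, kind)  # one element per semantic block
        node.text = text.strip()
    return tostring(root, encoding="unicode")

xml = compile_prompt({
    "role": "Senior backend engineer",
    "objective": "Add the new request parameter to the JSON API",
    "constraints": 'Use the nested form: "param": {"value": N}',
})
print(xml)
```

The payoff is that each instruction sits under its own tag, so a single constraint is less likely to be diluted by the surrounding prose.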
-
Even after providing complete and detailed information, the result is as follows:
Conclusion: the area of use for SpecKit is very limited; it's for testing new ideas and generating simple prototypes. For proper incremental work, you need to:
Based on the points described, SpecKit only complicates the work. The project needs to change its concept: it's not about software development, but more about generating ideas and a general plan. It's more a tool for a product owner than for a developer.
P.S. Used with Claude Code (Opus 4.1)