Reflections on building with AI

As much as I enjoy writing software, I generally won’t unless I have a particular problem to solve, even if it’s a manufactured and silly problem to solve. I’ve been itching to try out this newfangled way of working with AI development tooling, and I finally found a complex problem to try it out!

I write this post to capture my experience and reflections after building and shipping some real code. This post isn’t going to include any crazy revelations about AI development workflows – the internet is a firehose of AI-pilled one-upmanship already. But if nothing else, it’ll capture a moment in time that may be laughably different in 6-12 months.

My friend runs a small online school that trains laypeople on relevant religious topics to make them more effective in their jobs – think volunteers who teach Sunday School or teachers teaching at a Lutheran school for the first time. Like many schools of all sizes, his was using Canvas as its Learning Management System (LMS), and the massive outage from the ransomware attack gave him an opportunity, let’s say, to reevaluate his tech stack.

Project the first: LMS

My first project was to host the open-source version of Canvas in our own AWS instance. There’s a repository, now a few years out of date, containing some simple docker-compose files to build and host the LMS. As much as I love being my own SRE, I have no interest in plodding through build error logs, but with some copy-pasta in and out of the Claude web UI, I eventually got it working and upgraded to the latest version. Claude was scarily good at knowing what to try next. Where it really shone was in proposing some high-value monitoring for my new application and using the right incantations to set up the CloudWatch metrics, alarms, and email subscriptions for those alarms. AWS is a very flexible service that allows one to do almost anything, but none of it comes easily. It had been a while since I’d wrapped my head around the data model for AWS monitoring, and I certainly wasn’t comfortable with the CLI syntax for setting everything up. Claude crushed it, and when I had issues, we were able to debug them quickly.
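For anyone who hasn’t touched AWS monitoring in a while, here’s a rough sketch of what one alarm definition looks like. This is not the monitoring from this project: the `buildCpuAlarm` helper, the 80% threshold, the instance ID, and the topic ARN are all placeholders I’m assuming for illustration. The field names follow the CloudWatch `PutMetricAlarm` request, and the code only builds the parameters without calling AWS.

```typescript
// Field names mirror the CloudWatch PutMetricAlarm request.
// All values used below are illustrative assumptions, not this project's settings.
interface MetricAlarmParams {
  AlarmName: string;
  Namespace: string;
  MetricName: string;
  Dimensions: { Name: string; Value: string }[];
  Statistic: "Average" | "Sum" | "Minimum" | "Maximum" | "SampleCount";
  Period: number; // length of each evaluation window, in seconds
  EvaluationPeriods: number; // consecutive windows that must breach the threshold
  Threshold: number;
  ComparisonOperator:
    | "GreaterThanThreshold"
    | "GreaterThanOrEqualToThreshold"
    | "LessThanThreshold"
    | "LessThanOrEqualToThreshold";
  AlarmActions: string[]; // ARNs to notify, e.g. an SNS topic
}

// Hypothetical helper: describe a "CPU is too high" alarm for one EC2 instance.
function buildCpuAlarm(
  instanceId: string,
  topicArn: string,
  threshold = 80
): MetricAlarmParams {
  return {
    AlarmName: `high-cpu-${instanceId}`,
    Namespace: "AWS/EC2",
    MetricName: "CPUUtilization",
    Dimensions: [{ Name: "InstanceId", Value: instanceId }],
    Statistic: "Average",
    Period: 300,
    EvaluationPeriods: 2,
    Threshold: threshold,
    ComparisonOperator: "GreaterThanThreshold",
    AlarmActions: [topicArn],
  };
}

// Placeholder identifiers; substitute real ones before using this anywhere.
const alarm = buildCpuAlarm("i-EXAMPLE", "arn:aws:sns:REGION:ACCOUNT_ID:lms-alerts");
console.log(JSON.stringify(alarm, null, 2));
```

The email part lives outside the alarm itself: the alarm notifies an SNS topic, and each email address is a separate subscription on that topic.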

Project the second: SIS

My friend runs the administration of his school in spreadsheets. This includes setting up courses, grouping courses into programs, and tracking instructor and student contact information, instructor payments, student registration and payment status, and student cohorts. At the scale he was working with before, this was easy enough, but interest in his school has grown, and you can see how this would get out of hand quickly. Larger schools use a Student Information System (SIS) to track all this and sync it into their LMS.

After some iteration with Claude, my friend had generated an 8000-line single-page HTML file that largely expressed the feature set he was looking for using localStorage as a database. In itself, it’s wildly impressive that with Claude’s help an amateur who hasn’t developed software in decades could produce a functional proof of concept. My second project was to productionize this SIS, which gave me a chance to play around with Claude Code.

Breaking out the storage layer to use a server turned out to be relatively easy – the hard part was untangling the tech stack into something reasonably maintainable. Claude and I rearchitected the data layer away from an enormous blob that gets read/saved all at once, broke out the client-side JavaScript business logic into a decomposed TypeScript codebase, converted all the vanilla HTML components into React, restyled the whole thing for consistency, and then added ~500 unit and end-to-end tests to capture all the new behavior.
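To make the data-layer change concrete, here’s a minimal sketch of the idea, not the project’s actual code. The `Student` and `Course` shapes, the `BlobState` and `RecordRow` types, and the `splitBlob` helper are all hypothetical names I’m using for illustration. The point is just the shift from one big blob that’s saved all at once to records a server can read and write individually.

```typescript
// Hypothetical record shapes; a real SIS has many more fields and entities.
interface Student {
  id: string;
  name: string;
  email: string;
}

interface Course {
  id: string;
  title: string;
  instructorId: string;
}

// Before: the entire application state is one object that gets
// serialized and saved as a whole on every change.
interface BlobState {
  students: Student[];
  courses: Course[];
}

// After: each record is its own row, addressable by collection and id,
// so one record can be read or updated without touching the rest.
interface RecordRow {
  collection: "students" | "courses";
  id: string;
  data: Student | Course;
}

// Split a legacy blob into individually addressable rows.
function splitBlob(blob: BlobState): RecordRow[] {
  const studentRows = blob.students.map(
    (s): RecordRow => ({ collection: "students", id: s.id, data: s })
  );
  const courseRows = blob.courses.map(
    (c): RecordRow => ({ collection: "courses", id: c.id, data: c })
  );
  return [...studentRows, ...courseRows];
}

// Made-up sample data standing in for what used to live in localStorage.
const legacyBlob: BlobState = {
  students: [
    { id: "s1", name: "Ada", email: "ada@example.com" },
    { id: "s2", name: "Grace", email: "grace@example.com" },
  ],
  courses: [{ id: "c1", title: "Intro", instructorId: "i1" }],
};

const rows = splitBlob(legacyBlob);
console.log(rows.map((r) => `${r.collection}/${r.id}`));
```

Once records are addressable like this, a save only has to send the one row that changed instead of the whole state.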

Holy goodness gracious, Claude did a fabulous job. Similar to learning the ins and outs of the AWS CLI, I have little interest in doing migrations to known-better frameworks or hunting down visual bugs on the web. It took some coaching to get Claude to create side-by-side setups of the original HTML app and the new frontend, programmatically take screenshots of each, compare them, and make changes based on the visual diffs on its own. But once I got it going, it got about 85% of the functionality and look and feel right on its own. And now that everything is typed and well-tested, I feel confident in making iterative improvements and adding new features going forward. Bravo.
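The comparison step in that loop can be pretty simple. Here’s a minimal sketch, assuming both screenshots have already been decoded into raw RGBA pixel buffers of the same dimensions. The `diffRatio` function and its `tolerance` parameter are hypothetical names of mine, not the tooling Claude actually used.

```typescript
// Compare two screenshots given as raw RGBA buffers (4 bytes per pixel).
// Returns the fraction of pixels where any channel differs by more than `tolerance`.
function diffRatio(a: Uint8Array, b: Uint8Array, tolerance = 0): number {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error("Buffers must be RGBA data with identical dimensions");
  }
  const pixelCount = a.length / 4;
  if (pixelCount === 0) return 0;

  let changed = 0;
  for (let p = 0; p < pixelCount; p++) {
    const base = p * 4;
    for (let ch = 0; ch < 4; ch++) {
      if (Math.abs(a[base + ch] - b[base + ch]) > tolerance) {
        changed++;
        break; // count each pixel at most once
      }
    }
  }
  return changed / pixelCount;
}

// Two tiny 2x1 "screenshots": the second pixel's red channel differs by 10.
const original = new Uint8Array([255, 255, 255, 255, 0, 0, 0, 255]);
const rebuilt = new Uint8Array([255, 255, 255, 255, 10, 0, 0, 255]);

console.log(diffRatio(original, rebuilt)); // 0.5
console.log(diffRatio(original, rebuilt, 16)); // 0
```

A small tolerance helps ignore anti-aliasing noise, so the loop spends its effort on real layout and styling differences.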

Workflow

Claude Code is very powerful, but it still needs a lot of guidance in how to approach projects. After failed prior attempts at one-shot coding complex projects, I’ve learned to create a new Git branch for a series of changes, to ask Claude to generate plans for complex projects, and to use a shared task list with Claude, systematically guiding it through development one task at a time. It was ~85% correct in what it built, so I couldn’t just let it run through the task list, and when I tried to do a few tasks at once, Claude would sometimes go off on a tangent and do something completely off the rails. I would have expected Claude to give me more guidance on how to make the most of its immense power.

I found that outlining my expectations for workflow and architectural guidance in a CLAUDE.md helped, but I don’t think that Claude would have done this on its own. For example, I had to instruct it to add tests after each change or tell me why it wouldn’t add tests for that change, and I needed to tell it to run tests and commit after each incremental feature. I’m surprised that it wouldn’t have done this on its own, and it has me worried about folks building without basic software engineering discipline (more on that below).

Especially for the major migrations to TypeScript and to React, Claude churned for a very long time refactoring and testing, refactoring and testing, etc. As a lowly Pro user with occasional usage credits, I found that it spent my tokens and exhausted my session limit quickly. When at my desk, this translated to a lot of sitting and waiting, nudging it along from time to time. But hooking my Claude Code session up to the mobile app was a game changer. I couldn’t do real development or poke-test the app, but I could easily guide it through a refactor. When my session ran out, I’d set an alarm and poke it from the mobile app once I had more tokens. Side note: I’m now very reacquainted with UTC.

It wasn’t great to feel chained to my desk when I had tokens to burn, and I wasn’t thrilled to have yet another chat app on my phone, but it’s truly incredible to be able to guide Claude on the go.

Execution vs. expertise

Through my advising work, I’ve seen many companies that leverage AI for execution to great success, but those that are using it as a substitute for expertise end up creating tons of technical (and sometimes organizational) debt and are left struggling to dig themselves out of it. Execution without expertise is just debt on autopilot, and this was my experience on these projects, too.

Claude is fabulous at taking a scoped problem and executing on it quickly and with relatively high quality, and at times, it can go the extra step of proactively hardening, refactoring, or testing code. Truly unbelievable. However, I don’t think I would have gotten to a positive outcome without personally using my experience and expertise in software development to guide Claude.

My expertise showed up in the approach to (re)building the app and in the vocabulary to discuss issues as they arose. I tried to get Claude to propose a way to restructure and productionize the code, but it ultimately took my strong direction to refactor the data model, to pull out a server layer, to extract client-side logic into TypeScript, and then to convert the frontend to React. When we were working toward visual parity, Claude couldn’t find a workable approach until I instructed it to use parallel servers and screenshots. And when things went wonky, my approach to debugging and fluency in engineering ultimately enabled us to get to root causes in a way that didn’t just stack up extra technical debt.

Takeaways

My primary takeaway from all of this is that we’re going to have SO MUCH MORE BAD SOFTWARE out there. Anyone can very quickly build an app that looks and feels correct, and for the right product and quality risk, this is incredible. But productionizing these apps and maintaining them, especially with more than one person, is a much harder problem that requires product development expertise, and I don’t see apps like this replacing all SaaS apps everywhere any time soon.

All that said, it’s really, really fun to build with these tools – they unlock an incredible level of execution and possibility for creativity. The future is now.