Building a SwiftUI App With Claude Sonnet 4 and Gemini 2.5 Pro from Scratch
An interesting adventure in building an app from scratch with SwiftUI: Claude Sonnet 4 seems to keep better track of multi-level agentic tasks. Communicating with the LLM via screenshots works much like it does with humans: relevant areas should be annotated with red arrows to avoid misunderstandings. Also human-like: if the LLM loses faith, reassurance helps. If one LLM fails at a task, switching to another LLM might help. On another day, the very same bug can be introduced by another LLM. AIs can use tools cleverly, even creating AppleScript that controls Finder to place icons in .dmg files so that they match a background image created with another Python script. Finally, domain knowledge remains important, and sometimes you may even need to provide a code snippet to keep things going.
The Problem
But first, let’s start at the beginning. I originally trained as an electronics technician before starting my computer science studies. In electronics, you sometimes have to deal with filters built from resistors, capacitors and inductors, which attenuate some frequencies and let the desired ones pass. In practice, the cut-off frequencies are often controlled with potentiometers. Such filters have a theoretically infinite impulse response (IIR), although the response of course decays over time.
In digital signal processing (DSP), you can define finite impulse response (FIR) filters. For such a filter, you want to obtain the coefficients based on criteria that you define. However, it is impossible to achieve perfection, as there is a trade-off between the transition bandwidth, the ripple in the passband and stopband, the filter order and the resulting delay.
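To make the idea of "obtaining coefficients from criteria" a bit more tangible, here is a minimal sketch of one classic design method, the windowed-sinc low-pass with a Hamming window, using only the Python standard library. It is not the code of the app described here; the function names and the example parameters (51 taps, 1 kHz cut-off, 8 kHz sample rate) are assumptions chosen for illustration.

```python
import math

def lowpass_fir(num_taps, cutoff_hz, sample_rate_hz):
    """Windowed-sinc low-pass design with a Hamming window (illustrative)."""
    fc = cutoff_hz / sample_rate_hz      # normalized cut-off in cycles per sample
    mid = (num_taps - 1) / 2             # center of symmetry of the impulse response
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response, with the limit value at x == 0.
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]     # normalize to unity gain at DC

def gain_db(taps, freq_hz, sample_rate_hz):
    """Magnitude of the frequency response in dB at a single frequency."""
    w = 2 * math.pi * freq_hz / sample_rate_hz
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = -sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return 20 * math.log10(max(math.hypot(re, im), 1e-12))

taps = lowpass_fir(num_taps=51, cutoff_hz=1000, sample_rate_hz=8000)
for f in (0, 500, 1000, 1500, 3000):
    print(f"{f:5d} Hz: {gain_db(taps, f, 8000):7.2f} dB")
```

Increasing the number of taps narrows the transition band but also increases the delay, which is exactly the trade-off mentioned above.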
The whole area and its theoretical foundations are well researched and the complexity is limited, but unlike, let’s say, yet another todo app, there aren’t already tons of graphical tools for it. If I weren’t doing this as an AI experiment, I would have solved it completely differently (as shown later). But for an experiment, it’s ideal to build a graphical tool with AI, trying to solve a problem completely from A to Z.
BTW: you can download the resulting filter designer from here: https://www.sonemix.com/fir-filter-designer.html.
Additional Tools to Xcode
As you can see, the title mentions the LLMs used rather than the actual tools. From my experiments, it seems that the LLM matters more for the result than the tool itself, although the tools of course have an influence too.
An important class of tools is based on Microsoft’s Visual Studio Code, either as forks such as Windsurf or Cursor, or as plugins such as Cline, Continue.dev or Roo Code and so on.
Microsoft’s terms seem to restrict the use of its official VS Code marketplace to Microsoft products only. Windsurf therefore uses an alternative marketplace. This has practical implications, as the official and maintained extension for Swift language support from the swiftlang organization (the extension with the ID swiftlang.swift-vscode), with verified ownership from the publisher of https://swift.org, is not available on this extension marketplace. This rules out Windsurf as a tool for me.
Cursor, which has relied on Microsoft’s ecosystem from the beginning, may also have to adapt. At the very least, there are reports that Microsoft is starting to enforce its license terms, especially for its own extensions (e.g. SSH). The Swift extension, however, is available in Cursor in its official, current version.
How this turns out is not that important to me, as all these tools have the same origin and it’s super easy to switch between them. A colleague recently recommended the Roo Code extension for VS Code and I really like it. I particularly like the fact that you can directly see the number of tokens you are sending/receiving and the cost of each request. You don’t have to switch to a web interface to see this, as you do with Cursor. Compared to Windsurf and Cursor, it is more expensive if you look at a single dialog session, but you only pay for the actual usage via your API key and not for a monthly quota that expires.
Project Setup
What works well for me is first setting up the project in Xcode. Then open the project folder in VS Code or Cursor and add a Package.swift and a VS Code settings JSON that excludes the .build folder (for Cursor, add a .cursorignore). In Xcode everything works as usual, and in VS Code you can use the Swift language extension.
In Cursor you can simply change the supported LLMs in the AI tab and for Roo Code you need to add your API keys. That’s it.
Claude Sonnet 4
You start with a prompt that describes the goal in a paragraph or two, and then you get code that looks quite nice at first glance. It always does, doesn’t it?
So you start the brand new program and after 20 seconds you realize that it needs about 1.5 GB of memory, and it takes a minute until it finally starts and displays the graphs (which also already look comparatively good initially).
Good exercise. Let’s let the LLM find the problem by simply describing the symptoms (memory usage/launch time).
Over several iterations, Sonnet optimized things that had no real impact on the problems mentioned, but introduced new problems instead. It finally concluded that 1 GB+ is generally due to the SwiftUI framework and that this would be acceptable since modern Macs have plenty of RAM. This is not the case, as we know. Some of the problems introduced during these attempts had to be solved later with greater effort.
Well, let’s take a different approach: why not use Gemini 2.5 Pro? Given a description of the symptoms, it is indeed able to follow a systematic approach (divide et impera), and after a few interactions it gets the program to start with a peak memory consumption of 57 MB by identifying a slider as the cause and removing it as a test. Interestingly, however, it lost access to its tools along the way and prompted me to make the changes manually. Restarting Cursor did not help. Neither did asking the AI. So after I had made a few changes manually, I asked the model whether it had access again. It didn’t think so. That only changed when I asked it to just try and then assured it that it did have access to the tool and could do it. That’s kind of funny. So in the end, it’s natural language communication: with some encouragement, it regained confidence in its tool access.
Of course, leaving out the sliders was not a real solution. The next day I asked Gemini to reintroduce the slider, and the bug was back, this time caused by Gemini and not by Sonnet. In the end, I had to supply a custom binding approach in the form of a small snippet so that the problem could be fixed satisfactorily.
By the way: tool errors occurred more frequently than I had expected, and not only in this project. In Roo Code, search and replace often goes wrong, and Cursor occasionally reports a failed attempt of the edit tool (edit_file), in addition to the Gemini problems above. I did not use it for this project, but so far I have not noticed anything like this in JetBrains’ Junie.
Anyway, from here on we have something we can continue to work with. It has several quirks and multiple bugs that are immediately visible or noticeable in use, but it already does something.
So back to Sonnet 4, and then I gave it tasks to shape the program gradually. This is similar to the iterative approach that we know from normal dev work. And here, Sonnet 4 does a pretty good job - it divides each step into subtasks and can keep track of the goal/context during the subtasks of such steps without any problems - better than Sonnet 3.x, as far as I can see. In addition, it does not overshoot the mark as was occasionally the case with Sonnet 3.5, but fulfills the task actually required.
Switching between Xcode and VS Code / Cursor is seamless, unless you make manual changes and do not save.
Bug Fixes by Describing Only Symptoms
Sometimes I should listen to myself: After Gemini found the problem, I should have undone Sonnet’s previous attempts, including the bugs introduced, but I didn’t, and that cost me time later.
One of the errors was that larger filters were calculated twice when a preset was selected: the progress indication was displayed twice and the result was displayed briefly in between, which led to an unsightly flickering. Just describing the symptoms did not lead to a solution; at least I was not patient enough to let it try for too many iterations, especially since as a dev you would first briefly comment out the update routine to see whether too many calls are causing it and then quickly narrow it down further.
Also, when I had a problem installing the signal package for Octave because of an inconsistency with existing dynamic libraries, it blithely deleted and recreated links in brew directories after getting the error message. That didn’t fix the problem, but it meant I had to manually repair the brew installation too (brew doctor is your friend). Of course, it’s not a big problem if this happens in /opt/homebrew, since you can reinstall everything, I understand that. But it also shows why I (so far) still want to manually approve or decline terminal command executions.
Communicating with Images
In between, it is sometimes easier to communicate with screenshots. As expected with a natural language interface, there are some ambiguities. Just like it is between people. So when you take screenshots, you can apply what you would do when communicating with humans: bold red arrows help to clarify what you want to point to.
2^14.5 Bit
MATLAB has a FIR filter module. Besides other export options, it offers the possibility to output the coefficients as 16-bit signed integers. So instead of only exporting float coefficients, let’s also offer this in the same way as MATLAB does. It was quickly requested, and the implementation uses the full 16-bit range.
But with the same input, the result differs from MATLAB’s. The question is how the scaling is done there. Let’s feed in the two sets of numbers and let the LLM figure it out. It recognized that MATLAB most likely uses a Q15 fixed-point format, but then wrote in the summary that the scaling factor is 2^14.5 bit, which would be very unlikely. In fact, MATLAB uses 2^15: if you apply that factor, you get the same coefficients.
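As a small illustration of what a Q15 conversion looks like, here is a sketch that scales float coefficients by 2^15 and saturates to the signed 16-bit range. The helper names are hypothetical, and the rounding shown (Python's built-in round) is an assumption; MATLAB's exact rounding behavior for ties is not something this article verified.

```python
def to_q15(coefficients):
    """Quantize floats in [-1, 1] to signed 16-bit integers in Q15 format."""
    scale = 2 ** 15                      # 32768, not 2 ** 14.5
    quantized = []
    for c in coefficients:
        value = round(c * scale)
        # Saturate to the int16 range: +1.0 would otherwise become 32768.
        quantized.append(max(-32768, min(32767, value)))
    return quantized

def from_q15(values):
    """Convert Q15 integers back to floats for comparison."""
    return [v / 2 ** 15 for v in values]

coeffs = [0.5, -0.25, 0.123456, 1.0, -1.0]
q = to_q15(coeffs)
print(q)
print(from_q15(q))
```

Comparing the round-tripped values with the originals shows the quantization error, which stays below one step of 1/32768 except where saturation kicks in.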
Subtle Bug - Domain Knowledge
By now, all inputs are working OK, some validations are integrated and so on. But when compared, the resulting graphs differ from the expected ones. Since Gemini Pro was successful with the memory debugging, I let it try to find this one too. Eventually it concluded that the calculations are correct and that what I see should match the reference (which it doesn’t). So we’re stuck here.
This is where human help is needed. The bug that actually caused the problem was a subtle one: the code passed the filter order into the calculation as the number of taps, but the number of taps should actually be order + 1, which of course gives different results.
BTW: Not directly related, but this made me want to restrict the UI to even filter orders (which means an odd number of taps). Theoretically, you could use either an even or an odd number of taps. However, for audio signals, an integer group delay is good for signal alignment, and that requires an odd number of taps. Also, linear-phase FIR filters with symmetric coefficients and an even number of taps have a zero response at the Nyquist frequency, which is not desired for high-pass or bandstop filter designs (hardly relevant in practice, but you get it for free).
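The relationships in the last two paragraphs can be checked with a few lines of Python. This is only a sketch with hypothetical helper names and toy coefficient lists, not code from the app: it shows the order-to-taps conversion, the group delay of a symmetric FIR filter, and the forced zero at the Nyquist frequency for a symmetric filter with an even number of taps.

```python
def taps_from_order(order):
    """An FIR filter of order N has N + 1 coefficients (taps)."""
    return order + 1

def group_delay(num_taps):
    """Delay in samples of a linear-phase (symmetric) FIR filter."""
    return (num_taps - 1) / 2

def nyquist_gain(taps):
    """Magnitude at the Nyquist frequency: the taps summed with alternating sign."""
    return abs(sum(t if n % 2 == 0 else -t for n, t in enumerate(taps)))

for order in (64, 63):
    n = taps_from_order(order)
    print(f"order {order}: {n} taps, group delay {group_delay(n)} samples")

even_length = [0.125, 0.375, 0.375, 0.125]   # symmetric, 4 taps (order 3)
odd_length = [-0.25, 0.5, -0.25]             # symmetric, 3 taps (order 2), crude high-pass
print(nyquist_gain(even_length))
print(nyquist_gain(odd_length))
```

With an even order the delay is a whole number of samples, and only the odd-length filter can pass anything at the Nyquist frequency.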
Surprise - Clever Tool Use
Then I gave the program a task that I thought it couldn’t solve, but it proved me wrong: I asked it to create application icons. These are .png images, which are not easy to create if you can only read and write text. But it uses its tools cleverly: it created a Python source file and then executed it via a terminal command. It knew about the format and the image sizes needed for the app icons. The resulting icon itself wasn’t good, and it didn’t know that, unlike on iOS, you have to pay attention to rounded corners and transparent areas when designing icons for macOS. My mistake: because I didn’t expect it to work, I didn’t put any details in the prompt. Then I asked for a green band pass with borders and so on, and the result was much better. The cool thing is that you can easily adjust the corners, transparent area and colors in the Python source, and even better, for the smaller variants it automatically uses less detail. Pretty neat.
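The generated script itself is not reproduced here, so the following is only a sketch of the general trick: a text-only tool can write a small program that emits binary image data. This version uses nothing but the Python standard library to encode a rounded, partly transparent square as PNG. The margin and radius ratios, the color and the three sizes are placeholder assumptions, not Apple's icon template values.

```python
import struct
import zlib

def png_bytes(width, height, pixel):
    """Encode an RGBA image as PNG; pixel(x, y) returns an (r, g, b, a) tuple."""
    def chunk(tag, data):
        body = tag + data
        return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body))
    raw = bytearray()
    for y in range(height):
        raw.append(0)                    # filter type 0 (none) at the start of each row
        for x in range(width):
            raw.extend(pixel(x, y))
    header = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)  # 8-bit RGBA
    return (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", header)
            + chunk(b"IDAT", zlib.compress(bytes(raw))) + chunk(b"IEND", b""))

def rounded_square(size, margin_ratio=0.1, radius_ratio=0.2, color=(40, 160, 80)):
    """Return a pixel function for a rounded square with a transparent margin."""
    lo, hi = size * margin_ratio, size - size * margin_ratio
    radius = size * radius_ratio
    def pixel(x, y):
        cx, cy = x + 0.5, y + 0.5        # sample at the pixel center
        if not (lo <= cx <= hi and lo <= cy <= hi):
            return (0, 0, 0, 0)
        # Clamp to the inner rectangle; the distance is non-zero only in corners.
        nx = min(max(cx, lo + radius), hi - radius)
        ny = min(max(cy, lo + radius), hi - radius)
        inside = (cx - nx) ** 2 + (cy - ny) ** 2 <= radius ** 2
        return (*color, 255) if inside else (0, 0, 0, 0)
    return pixel

# A few small sizes only; real macOS icon sets go up to 1024 pixels.
icons = {size: png_bytes(size, size, rounded_square(size)) for size in (16, 32, 128)}
for size, data in icons.items():
    print(f"{size}x{size}: {len(data)} bytes")
```

Because corners, margin and color are plain parameters, adjusting the look is a matter of editing a few numbers and running the script again.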
Encouraged by this, I asked it to try DMG packaging. The first attempt was only a little off with the dimensions. I already knew that it could create a background image with Python. But the next surprise: for the icon placement it uses AppleScript to control the Finder. This is necessary because the icon positions are stored in a file called .DS_Store, whose format Apple does not document. Then it uses hdiutil to create the actual .dmg file, cleans up and prints a summary. The whole thing is packaged in a bash shell script. This was a total time saver.
This in turn prompted the creation of online help. The content was there immediately and was displayed in Roo Code’s integrated browser. The .plist and the help bundle plus the call in the app were also generated. Since I had done the whole exercise not too long ago with GitChronicles and it took me an unexpected amount of time back then, I was pleasantly surprised. But only until I tried the help and it didn’t show up. I then put Claude 4 and Gemini Pro on it, but even they couldn’t get it to work. Finally I got impatient, copied the help from GitChronicles, placed all the entries in the .plist files and made the calls the same way as in GitChronicles. I only used the newly created help content. As expected, this worked immediately. In the AIs’ defense, it must be said that macOS’s online help is poorly documented, to put it kindly, and there are also few examples. In fact, it simply requires trial and error.
Learnings
So what have I learned from this little exercise?
- Windsurf uses an alternative marketplace for extensions where the official Swift plugin is not available, which rules Windsurf out for me.
- For non-textual artifacts such as images or disk image files, AI cleverly combines its text creation capabilities to generate shell or Python files and execute them via terminal commands. It even uses AppleScript to control Finder to place components in a .dmg file.
- The Python files created for the app icon generation allow for easy customization (e.g. of dimensions, colors, etc.). Smaller application icons were even created with less detail without me asking for it.
- Not really new, but care should be taken to keep the code simple. If an attempt to implement a feature or fix a bug fails, you should undo it immediately - otherwise you will have to invest in fixing the bugs introduced later. Again, this is similar to the normal development process, but it’s easier to overlook something.
- If one type of LLM can’t solve the problem, switch to another. This may solve the problem.
- Describing only the symptoms sometimes leads to a fix, and sometimes it does not.
- For read and write operations, I strongly assume that they are sandboxed and can only take place in the project folder, so I set automatic approval. With Git, you can discard changes at any time. Of course, by definition this can’t be the case when running terminal commands, and given the attempts described to fix problems with a package installation, I still think it makes sense to manually approve or reject terminal executions.
- If needed, you can also communicate with the AI using screenshots. Annotations with red arrows help to avoid misunderstandings; after all, this is natural language communication.
Overall
For things like proofs of concept, prototypes and projects starting from scratch, it’s great and allows you to get started many times faster. The gain is much smaller as you progress and especially when you have to work in a complex environment. As with traditional development work, it pays to keep things simple.
Overall, I am impressed with both Sonnet 4 and Gemini 2.5 Pro. For my usual development tasks, especially GUI related, Claude still seems to be the best choice at the moment. But everything is in flux.
Finally, I have to admit that without the experiment I would have simply approached the problem using the firwin function from SciPy's signal package in Python. A single call with the parameters and you have the coefficients, literally a one-liner. And with matplotlib.pyplot you can plot the result. There is a reason why Python and its ecosystem are so popular. It just comes without an interactive UI.
We can therefore easily use this to compare the results of both:
And here are the presets in order (the app on the left, the firwin plot on the right):