I Built a Dead Simple App Because Claude Code Couldn't Hear Me
Claude Code on Bedrock doesn't expose a microphone. I type slowly. So I built an iOS app that transcribes speech and drops it straight to the clipboard.
I use Claude Code through Bedrock at work. That version doesn’t give the AI access to your microphone, so the voice input that makes the native Claude desktop client fast just isn’t there. I like speaking to Claude more than typing. Over several months that small friction accumulated into something I actually wanted to fix.
The fix was obvious: an app that listens, transcribes, and puts the text on the clipboard. Switch to whatever terminal or text field you’re using, paste. One step in the middle instead of typing everything out.
What it does
Mic to Clipboard is one screen, one button. Tap the mic, speak, tap again. The transcript lands in your clipboard. You paste it wherever you want.
That’s the whole app. No accounts, no sync, no settings beyond a light/dark mode toggle. It runs on-device: Apple’s speech recognizer does the transcription locally so nothing leaves your phone.
The stack
React Native via Expo, because I wanted to ship to iOS without writing Swift. Two packages do all the real work:
- expo-speech-recognition wraps Apple's SFSpeechRecognizer API
- expo-clipboard writes the final transcript to the system clipboard
Expo’s managed workflow meant I could build the whole thing without opening Xcode during development. I only touched Xcode when it was time to configure things for the App Store submission.
Continuous transcription
The interesting part of the core hook is how continuous speech recognition actually works. Apple’s recognizer fires result events repeatedly as it processes audio. Each result is either interim (still processing, may change) or final (committed). But when you speak in long sentences with natural pauses, you get multiple final results in a row, not one big one at the end.
So I keep a ref that accumulates the committed finals:
useSpeechRecognitionEvent("result", (event) => {
  const text = event.results[0]?.transcript ?? "";
  if (event.isFinal) {
    if (text.trim()) {
      accumulatedRef.current = accumulatedRef.current
        ? accumulatedRef.current + " " + text
        : text;
      setState((prev) => ({
        ...prev,
        transcript: accumulatedRef.current,
        interimTranscript: "",
      }));
    }
  } else {
    setState((prev) => ({
      ...prev,
      interimTranscript: text,
    }));
  }
});
The accumulatedRef is a plain ref rather than state because I don’t want re-renders every time it updates mid-sentence. State updates only happen on final results. When the session ends, the accumulated string is what gets written to the clipboard.
The display text that appears on screen combines both pieces: whatever is committed plus the in-flight interim, so you see words appearing as you speak:
return {
  ...state,
  toggle,
  displayText: state.interimTranscript
    ? (accumulatedRef.current
        ? accumulatedRef.current + " " + state.interimTranscript
        : state.interimTranscript)
    : state.transcript,
};
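Stripped of React, the accumulation and display logic is a small pure function. Here's a standalone sketch; the `reduceResult` and `displayText` helpers and the `committed`/`interim` field names are my own modeling, not the app's actual code:

```javascript
// Model of the hook's logic: `committed` is the accumulated final text,
// `interim` is the in-flight text that may still change.
function reduceResult(state, { transcript, isFinal }) {
  if (isFinal) {
    const text = transcript.trim();
    if (!text) return state; // ignore empty finals (e.g. trailing silence)
    const committed = state.committed ? state.committed + " " + text : text;
    return { committed, interim: "" };
  }
  return { ...state, interim: transcript };
}

// What the screen shows: committed words plus whatever is still in flight.
function displayText({ committed, interim }) {
  if (!interim) return committed;
  return committed ? committed + " " + interim : interim;
}

let s = { committed: "", interim: "" };
s = reduceResult(s, { transcript: "open the", isFinal: false });
s = reduceResult(s, { transcript: "open the terminal", isFinal: true });
s = reduceResult(s, { transcript: "and paste", isFinal: false });
console.log(displayText(s)); // "open the terminal and paste"
```

Note how the final result replaces the interim rather than appending to it: each final is the recognizer's committed version of the phrase the interims were previewing.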
On-device vs. network fallback
Newer iPhones support fully on-device speech recognition. Older ones fall back to Apple’s servers. Rather than just picking one, the app checks at runtime and uses the right config:
const supportsOnDevice =
  await ExpoSpeechRecognitionModule.supportsOnDeviceRecognition();
const config = supportsOnDevice ? SPEECH_CONFIG : SPEECH_CONFIG_NETWORK;
ExpoSpeechRecognitionModule.start(config);
The two configs are identical except for requiresOnDeviceRecognition: true. On-device is preferred because nothing leaves the device, but requiring it on older hardware would just fail silently. The fallback handles it without any user-visible difference.
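The post doesn't show the config constants themselves, but a plausible shape looks like this, assuming expo-speech-recognition's start options (`lang`, `interimResults`, `continuous`, `requiresOnDeviceRecognition`):

```javascript
// Hypothetical config constants; only the on-device requirement differs.
const SPEECH_CONFIG_NETWORK = {
  lang: "en-US",
  interimResults: true, // fire interim results so text appears while speaking
  continuous: true,     // keep listening across natural pauses
};

const SPEECH_CONFIG = {
  ...SPEECH_CONFIG_NETWORK,
  requiresOnDeviceRecognition: true, // fail rather than send audio to Apple's servers
};
```

Spreading the network config into the on-device one keeps the two from drifting apart: there is exactly one line of difference, and it's the line that matters.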
Silencing the no-speech error
If you tap the mic button and then don’t say anything, the recognizer fires an error event with code no-speech. I was initially treating that the same as real errors, which meant the UI would flash an error state every time someone changed their mind or accidentally tapped the button.
useSpeechRecognitionEvent("error", (event) => {
  if (event.error === "no-speech") {
    return;
  }
  // handle actual errors
});
Silence isn’t an error. Filtering it out means the button just returns to idle with no drama.
The hard part: App Store paperwork
The code took a weekend. Getting through App Store review took longer and was more tedious than I expected.
Apple’s privacy manifest system requires a structured XML declaration of which system APIs you use and why. expo-speech-recognition accesses the microphone, and apps using certain APIs need to explain themselves in a format Apple can parse. The permission strings in Info.plist also needed to be specific enough to pass review.
There was also the encryption declaration. Any app that uses HTTPS (every app does) technically uses encryption, so Apple requires an export-compliance answer. Standard HTTPS qualifies as exempt, which you record by declaring that the app uses no non-exempt encryption. It's a paperwork step, not a security review, but a missing declaration gets the submission bounced.
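With Expo, both the permission strings and the encryption flag can live in app.json and flow into Info.plist at build time. A sketch using the standard keys (the wording of the usage strings here is illustrative, not the app's actual copy):

```json
{
  "expo": {
    "ios": {
      "infoPlist": {
        "NSMicrophoneUsageDescription": "Mic to Clipboard uses the microphone to capture your speech for transcription.",
        "NSSpeechRecognitionUsageDescription": "Speech recognition converts your words to text, on-device when your iPhone supports it.",
        "ITSAppUsesNonExemptEncryption": false
      }
    }
  }
}
```

Setting `ITSAppUsesNonExemptEncryption` to false here answers the export-compliance question once in config instead of on every submission in App Store Connect.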
Screenshot requirements were the most mechanical part: specific pixel dimensions for iPhone 6.7” and 6.5” layouts, taken from simulators at exactly those resolutions. Three to five screens minimum. It’s a twenty-minute process once you know the sizes. Discovering them for the first time mid-submission is not ideal.
Where it is now
Live on the App Store. Works on iPhone and iPad. Apple automatically makes it available on Apple Silicon Macs via the “Designed for iPad” compatibility layer, which means zero extra work on my end.
I use it every day. I’ll draft a long Claude prompt on the walk to my desk, open the app, say it, and paste it into the terminal. Fast enough that it doesn’t break the flow.
What I’d do differently
Skip Mac Catalyst configuration. The automatic Mac compatibility through “Designed for iPad” covers everything I wanted. I spent time setting up Catalyst entitlements, sandbox configs, and Xcode targets that turned out to be unnecessary.
Take screenshots during development. I treated them as a final step and got stuck mid-submission setting up a simulator at the right resolution. They could have been done any time.
Budget a full day for App Store paperwork. The code was done in two days. Getting the privacy manifest, encryption declaration, permission strings, screenshots, and privacy policy all correct and in place took another full day. It’s not hard, just time-consuming, and you can’t skip it.