2.0 brings downloaded local models to the Mac

Speak anywhere. Choose local, live, or both.

Native dictation for Mac and iOS with downloaded WhisperKit batch models, sherpa-onnx local streaming, local LLM post-processing, and cloud engines like Deepgram, AssemblyAI, OpenAI, ElevenLabs, Soniox, and Modulate when you want maximum accuracy.

⬇️ Download for Mac | 📱 Join iOS TestFlight | Star on GitHub

$0: App download
Local: Batch, streaming & cleanup
BYOK: No price markup

Local + Live

Choose the engine for the moment

Stay offline with local models, stream for instant feedback, or use a cloud model for specialised accuracy. JustSpeakToIt keeps each path explicit so you always know where audio and text are processed.

Deepgram ~200ms

Nova-3 Streaming

Fast WebSocket transcription with interim results for the lowest-friction everyday dictation flow.

AssemblyAI ~250ms

Universal-3 Pro Streaming

High-accuracy multilingual streaming with formatted turn finalisation when you want cleaner long-form output.

OpenAI ~250ms

Realtime Whisper

Streaming transcription with built-in noise reduction using your existing OpenAI API key.

ElevenLabs ~200ms

Scribe v2 Streaming

Real-time Scribe transcription for teams already using ElevenLabs speech workflows.

Soniox ~220ms

Real-time Preview

Low-latency multilingual WebSocket STT for fast feedback across more languages.

Modulate ~220ms

Velma-2 Streaming

Multilingual live transcription with diarisation and provider support geared to signal detection.

Apple ~50ms

On-device Speech

Private local recognition for instant, free dictation when cloud accuracy is not required.
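The approximate latencies quoted for each engine above can be compared side by side. The following is a minimal Python sketch that uses only the figures listed on this page; the `LATENCY_MS` table and the `rank_by_latency` helper are illustrative names, not part of the app or any provider SDK.

```python
# Approximate latencies in milliseconds, as quoted on this page.
LATENCY_MS = {
    "Deepgram": 200,
    "AssemblyAI": 250,
    "OpenAI": 250,
    "ElevenLabs": 200,
    "Soniox": 220,
    "Modulate": 220,
    "Apple": 50,  # on-device Speech, the only local engine in this list
}


def rank_by_latency(latencies: dict[str, int]) -> list[str]:
    """Return engine names ordered fastest first, ties broken by name."""
    return sorted(latencies, key=lambda name: (latencies[name], name))


if __name__ == "__main__":
    for name in rank_by_latency(LATENCY_MS):
        print(f"{name}: ~{LATENCY_MS[name]}ms")
```

Latency is only one axis: the cloud engines are listed for accuracy and language coverage, so the fastest entry is not automatically the best fit.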

Local Model Lab

Downloaded models, no round trip required

Version 2.0 separates local batch, local streaming, and local post-processing so you can pick the right offline tool without guessing which runtime is active.

Batch: WhisperKit/Core ML after recording stops
Live: sherpa-onnx streaming sources with model sizes shown
Polish: downloaded GGUF LLMs for private cleanup prompts
Cloud: optional BYOK providers when accuracy beats locality

Offline transcription

Batch models for careful work

Download WhisperKit/Core ML models, record normally, then transcribe locally after capture for private drafts, notes, and sensitive dictation.

Local streaming

Live preview without cloud audio

Use sherpa-onnx streaming models for local live text. The settings UI now shows catalogue/source sections, model sizes, and runtime state consistently.

Private cleanup

Prompt-aware local post-processing

Download local LLMs for cleanup and formatting. Small models can follow instructions less reliably, so the app makes the prompt path visible and keeps built-in rules honest.

Preview

See it in action

Dashboard

Monitor usage, check permissions, and start recording in one click.

History

Browse transcriptions with audio playback, cost tracking, and session details.

Features

Built around live words, not waiting

A native app that feels immediate. No Electron, no bloat — just fast streaming transcripts, safe key storage, and polished text insertion.

Streaming-First Dictation

Double-tap Fn, hold-to-record, or use a custom shortcut. Watch live text arrive as you speak, then insert the final transcript where your cursor is.

🔒

Downloaded Local Models

Keep sensitive recordings offline with Apple Speech, WhisperKit batch models, sherpa-onnx streaming, and local LLM cleanup.

🎯

Smart Text Insertion

Transcripts are automatically typed into any app using accessibility APIs. Insert at cursor or replace selection—works with any text field.

🧠

AI Post-Processing

Optional Live Polish and post-processing with downloaded local LLMs or GPT-4o, Claude, Gemini, and Mistral removes filler words, fixes punctuation, and polishes your prose.

💬

OpenClaw AI Chat

Have voice-powered conversations with AI. Speak your questions and hear responses read back—complete with conversation history.

🎙️

Hands-Free Mode

Full voice-to-voice conversation loop: record, send, listen to the response, and auto-resume recording. No tapping required.

🔊

Text-to-Speech

Hear responses with natural voices from OpenAI, ElevenLabs, Deepgram, Azure, or Apple's built-in system voices.

📚

Personal Lexicon

Teach the app your jargon, names, and acronyms. Your custom dictionary improves accuracy across all providers.

☁️

iCloud History Sync

Your transcription history syncs seamlessly between Mac and iOS via iCloud. Pick up right where you left off on any device.

How It Works

Three steps to live text

Activate

Double-tap Fn or click the menu bar icon. A subtle HUD appears with your active live model.

Speak

Just talk naturally. Streaming providers return interim text while you are still speaking.

Done

Stop recording and your polished text appears instantly in any app, right at your cursor.

Local first. Bring keys only when you need them.

Download local models for offline work, then use your existing API keys from supported live transcription providers when cloud accuracy, diarisation, or specialist languages matter. Pay only for what you use, typically $0.36 per hour or less with low-cost streaming providers. No subscriptions, no hidden fees, no data selling.

Platforms

Native apps, native performance

Built with SwiftUI for the best experience on every Apple device.

💻

macOS

  • Menu bar app with global hotkeys
  • Direct text insertion via Accessibility
  • Live transcription preview HUD
  • Downloaded WhisperKit batch models and sherpa-onnx local streaming
  • Downloaded local LLM post-processing for private cleanup
  • Deepgram, AssemblyAI, OpenAI, ElevenLabs, Soniox, Modulate, and Apple live models
  • Live Polish + AI post-processing
  • OpenClaw AI chat with hands-free mode
  • Install via Homebrew or direct download
  • Requires macOS 14+

📱

iOS (TestFlight Beta)

  • Live Activity with Dynamic Island
  • Apple Speech, Deepgram, ElevenLabs, and OpenAI live transcription
  • OpenClaw AI chat with voice input
  • Audio recording & playback
  • Home Screen widgets
  • iCloud history sync with Mac
  • Requires iOS 17+

FAQ

Questions? We've got answers.

Is my audio data private?

Yes. With Apple Speech or downloaded local models, your audio never leaves your device. With cloud providers such as Deepgram, AssemblyAI, Soniox, ElevenLabs, OpenAI, or Modulate, audio is sent directly to their APIs using your own API keys; we never see or store it.

How much does it actually cost?

The app is free to download. On-device transcription and downloaded local models are free to run after installation. Cloud transcription with your own API keys typically costs around $0.006/minute (about $0.36/hour) with low-cost streaming providers such as Deepgram. You pay the provider directly—we don't add any markup.
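To make the arithmetic concrete, here is a small Python sketch of the estimate. It assumes the flat $0.006/minute rate quoted above; actual provider pricing varies by model and plan, and the `cloud_cost_usd` helper is an illustrative name, not part of the app.

```python
def cloud_cost_usd(minutes: float, rate_per_minute: float = 0.006) -> float:
    """Estimate the provider charge for a given number of audio minutes.

    The default rate is the approximate per-minute figure quoted on this
    page for low-cost streaming providers; pass your provider's real rate.
    """
    return round(minutes * rate_per_minute, 4)


# One hour of dictation at the quoted rate.
hourly = cloud_cost_usd(60)

# Hypothetical usage pattern: 30 minutes a day over 20 working days.
monthly = cloud_cost_usd(30 * 20)

if __name__ == "__main__":
    print(f"One hour: ${hourly:.2f}")
    print(f"30 min/day for 20 days: ${monthly:.2f}")
```

On-device transcription and downloaded local models do not enter this calculation, since they are free to run after installation.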

What's the difference between on-device and cloud transcription?

Local modes include Apple Speech, downloaded WhisperKit batch models, sherpa-onnx local streaming, and downloaded local LLM cleanup. They prioritise privacy and offline use. Cloud providers such as Deepgram, AssemblyAI, Soniox, ElevenLabs, OpenAI, and Modulate offer higher accuracy, low-latency streaming, broader language support, and provider-specific features such as diarisation or formatted turns.
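The trade-off can be written down as a simple decision rule. The sketch below is a hypothetical helper, not the app's own logic: it maps two needs (working offline, seeing live text) onto the transcription paths named on this page.

```python
def choose_path(offline: bool, live_preview: bool) -> str:
    """Map two needs onto the transcription paths described on this page.

    This is an illustrative rule of thumb. Apple Speech is another local
    live option and is left out here to keep the example short.
    """
    if offline and live_preview:
        # Live text without sending audio to the cloud.
        return "sherpa-onnx local streaming"
    if offline:
        # Transcribe locally after recording stops.
        return "WhisperKit batch"
    # Online: a cloud provider used with your own API key.
    return "cloud provider (BYOK)"


if __name__ == "__main__":
    print(choose_path(offline=True, live_preview=False))
```

In practice the choice also depends on language coverage and on features such as diarisation, which this two-flag rule does not model.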

Do small local models follow custom formatting prompts?

Larger local LLMs tend to follow cleanup instructions better. Very small downloaded models are useful for private, lightweight cleanup, but they may ignore strict formatting requests. The app shows the prompt path so you can tell whether the issue is model capability or configuration.

Can I use this for meetings?

On Mac, you can capture system audio from meeting apps like Zoom or Teams. On iOS, the app captures audio from your device's microphone—place your phone near the speaker for best results, or use the hands-free conversation mode to interact with AI directly.

How do I set up my API keys?

Open the app's Settings and enter your API keys in the Providers section. Keys are stored securely in your Mac's Keychain and are sent only to the provider's API; we never see them.

What is OpenClaw?

OpenClaw is the built-in AI chat feature. You can hold voice-powered conversations with AI assistants: speak your questions, get spoken responses, and use hands-free mode for a completely voice-driven experience. Conversations are saved locally and can sync via iCloud.

Ready to speak freely?

Download JustSpeakToIt 2.0 and pick the transcription path that fits the moment: offline local, instant live, or cloud accuracy.

⬇️ Download for Mac | 📱 Join iOS TestFlight

Have a feature request, bug report, or want to contribute? Open an issue on GitHub — we'd love to hear from you!