Voice Transcription

Coeus can transcribe audio from your microphone in real time, from an audio file you import, or from a YouTube video.

Recording your voice

Click the mic button in the chat bar. Coeus starts recording from your microphone. Talk. Click the button again to stop.

The transcript shows up as your message. You can edit it before sending, or just send it as-is.

If you're dictating a note rather than asking a question, you can tell the AI: "Save this as a note." Or just start your message with the note's title.

Importing an audio file

Drop an audio file onto the Coeus window or use the import button (paperclip) to select one. Coeus transcribes it and creates a note with the transcript.

Supported formats: MP3, WAV, M4A, MP4, MOV, OGG, WebM, FLAC, AAC.

This works well for recorded meetings, voice memos, or podcast episodes you want to take notes on.
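
If you script your imports, it can help to check a file's extension against the supported formats before handing it to Coeus. The following is a minimal sketch, not Coeus's own code: the function name `is_supported` and the constant `SUPPORTED_EXTENSIONS` are made up for illustration, and the check looks only at the file name, not at the file's contents.

```python
from pathlib import Path

# Extensions copied from the "Supported formats" list in this section.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".wav", ".m4a", ".mp4", ".mov",
    ".ogg", ".webm", ".flac", ".aac",
}

def is_supported(filename: str) -> bool:
    """Return True if the file extension is in the supported list (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS

if __name__ == "__main__":
    for name in ["standup.MP3", "memo.m4a", "notes.txt"]:
        print(name, is_supported(name))
```

A file with the right extension can still fail to transcribe if it is corrupt or uses an unusual codec, so treat this as a quick pre-filter only.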

YouTube transcription

Paste a YouTube URL into the chat bar. Coeus detects it and shows a banner asking what you want to do. Click Transcribe and Coeus downloads the audio and transcribes it.

The transcript is saved as a note you can search and ask questions about.
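
How Coeus recognizes a YouTube link is not documented here. As an illustration only, one common approach is to parse the pasted text as a URL and check the host. The sketch below assumes the two most common link shapes (`youtube.com/watch?v=...` and `youtu.be/...`); the names `YOUTUBE_HOSTS` and `extract_video_id` are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs

# Hosts treated as YouTube in this sketch (an assumption, not Coeus's actual list).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}

def extract_video_id(text: str) -> Optional[str]:
    """Return the video ID if `text` looks like a YouTube URL, else None."""
    parsed = urlparse(text.strip())
    if parsed.scheme not in ("http", "https"):
        return None
    host = (parsed.hostname or "").lower()
    if host not in YOUTUBE_HOSTS:
        return None
    if host == "youtu.be":
        # Short links carry the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if parsed.path == "/watch":
        # Standard links carry the ID in the "v" query parameter.
        ids = parse_qs(parsed.query).get("v")
        return ids[0] if ids else None
    return None

if __name__ == "__main__":
    print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))
```

Other link shapes (Shorts, embeds, playlists) are deliberately left out to keep the example short.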

Transcription modes

There are two ways Coeus can transcribe audio. You pick one in Settings → Integrations → Speech & Transcription.

Local Whisper (default)

Coeus runs Whisper on your machine. Nothing gets sent to any server.

There are four model sizes. The .en suffix marks Whisper's English-only models:

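| Model     | Size   | Speed   | Accuracy |
|-----------|--------|---------|----------|
| tiny.en   | 75 MB  | Fastest | Lower    |
| base.en   | 142 MB | Fast    | Good     |
| small.en  | 466 MB | Medium  | Better   |
| medium.en | 1.5 GB | Slower  | Best     |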

Start with base.en. It's accurate enough for most speech and downloads quickly.

To download a model: Settings → Integrations → Speech & Transcription → Download model.
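
The size and accuracy trade-off in the model table can be expressed as a small lookup. This is a sketch for reasoning about which model to download, not part of Coeus: the helper name `largest_model_within` is hypothetical, and 1.5 GB is rounded to 1500 MB.

```python
from typing import Optional

# (model name, download size in MB), ordered from least to most accurate,
# following the model table in this section. 1.5 GB is approximated as 1500 MB.
MODELS = [
    ("tiny.en", 75),
    ("base.en", 142),
    ("small.en", 466),
    ("medium.en", 1500),
]

def largest_model_within(budget_mb: float) -> Optional[str]:
    """Return the most accurate model whose download fits the budget, or None."""
    fitting = [name for name, size_mb in MODELS if size_mb <= budget_mb]
    return fitting[-1] if fitting else None

if __name__ == "__main__":
    print(largest_model_within(200))
```

Download size is only one constraint. Larger models are also slower to run, so the biggest model that fits on disk is not always the right choice.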

OpenAI Transcription API

This mode sends your audio to OpenAI's gpt-4o-mini-transcribe model. It can be more accurate than local Whisper for some accents and noisy audio. It costs a small amount per minute of audio.

To use it, enter your OpenAI API key in the Speech settings. Audio files are sent to OpenAI and the transcript is returned. The audio is not stored.
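
For readers curious what such a request looks like, here is a minimal sketch using OpenAI's official Python package (`pip install openai`). It is not Coeus's implementation. The function name `transcribe_with_openai` is hypothetical, and the sketch assumes your API key is available in the `OPENAI_API_KEY` environment variable.

```python
from pathlib import Path

def transcribe_with_openai(audio_path: str,
                           model: str = "gpt-4o-mini-transcribe") -> str:
    """Send one audio file to OpenAI's transcription endpoint and return the text."""
    path = Path(audio_path)
    if not path.is_file():
        # Fail early, before any network call is attempted.
        raise FileNotFoundError(audio_path)

    # Imported lazily so this module loads even if the package is not installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with path.open("rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text
```

Calling `transcribe_with_openai("memo.m4a")` uploads the file and incurs the per-minute charge mentioned above, so test with a short clip first.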

Transcription quality tips

  • Speak clearly and at a normal pace
  • A quiet environment helps a lot with local Whisper
  • The small.en or medium.en models handle accents and background noise better than tiny.en
  • If you're transcribing long recordings, expect it to take a little while with local models