CONTEXT MODE

Screen-Aware Dictation for Mac: Reply to What You See by Voice

Most dictation hears your words but is blind to your screen, so you end up narrating context it should already see. Verba's Context mode (Fn+X) takes a screenshot, reads it with a vision model you control, and writes from what is actually in front of you. Say "reply to this email" or "summarize this thread" and Verba works from the pixels, then pastes the result at your cursor.


Screen-aware dictation that reads your Mac, not just your voice

Context mode is the only Verba mode that looks before it writes. When you trigger it, Verba captures your current screen, sends that image to a vision-capable model on the AI engine you chose, and uses what it sees as the source material for your spoken instruction. That means you can stop describing the email, the error, or the table on screen and just point at it with your voice. The result is pasted wherever your cursor sits, in any app.

Captures the screen, then writes from it with a vision model
Trigger with Fn+X, like every other Verba mode
Say the instruction, not the context: "reply to this", "explain this error"
Pastes the finished text into whatever app holds the cursor
Runs on your own AI: Claude subscription, your own API key, or a fully local vision model
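
The flow described above can be sketched in a few lines. This is an illustrative Python outline under stated assumptions, not Verba's actual code: `capture_screen`, `ask_vision_model`, and `paste_at_cursor` are hypothetical callables standing in for the screenshot step, the vision-model call on your chosen engine, and the paste step.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ContextRequest:
    """One Context-mode invocation: what was said and what was on screen."""
    instruction: str   # the spoken instruction, e.g. "reply to this email"
    screenshot: bytes  # image bytes of the current screen


def run_context_mode(
    instruction: str,
    capture_screen: Callable[[], bytes],                # hypothetical: grabs the screen
    ask_vision_model: Callable[[ContextRequest], str],  # hypothetical: your chosen engine
    paste_at_cursor: Callable[[str], None],             # hypothetical: pastes into the focused app
) -> str:
    """Capture the screen, let a vision model write from it, then paste the result."""
    screenshot = capture_screen()                      # 1. look
    request = ContextRequest(instruction, screenshot)
    text = ask_vision_model(request)                   # 2. write from what is seen
    paste_at_cursor(text)                              # 3. paste where the cursor sits
    return text
```

Passing the three steps in as functions keeps the sketch independent of any particular engine, which mirrors the bring-your-own-AI idea: only the middle step changes when you swap a cloud model for a local one.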

Reply to what is on screen by voice

The everyday win is replying to things you can see but would rather not retype. Open an email and say "reply, politely decline, suggest next week instead" and Context mode reads the message, drafts the reply in your voice, and drops it in the compose box. Point it at a Slack thread, a support ticket, a code review comment, or a form, and it writes from the real content on screen instead of a vague mental summary you have to dictate first.

Email replies that already know what the sender wrote
Answers to a Slack or chat thread you have open
Filling a form or field based on the data shown around it
Explaining or rewriting an error, log, or stack trace on screen
Summarizing a long article or document you are looking at

What Context mode needs to see your screen

Because it reads pixels, Context mode has two requirements the verbatim modes do not. First, macOS Screen Recording permission, which Verba requests so it can capture the screenshot it sends to the model. Second, a vision-capable model on your AI engine, since a text-only model cannot read an image. Claude (via your subscription or your own API key) supports vision, as do many other cloud and local models; pick one of those for Context. Once the permission is granted, the screenshot goes only to the engine you configured for rewriting.

Requires macOS Screen Recording permission to capture the screen
Requires a vision-capable model on your chosen AI engine
Works with Claude, your own API key, or a fully local vision model
Bring-your-own-AI: Verba never makes a billed call on your behalf
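
As a rough illustration of those two requirements, the check below reports what is still missing before Context mode can run. It is a sketch only: the `Engine` type, its fields, and both function names are hypothetical and do not describe Verba's settings or internals.

```python
from dataclasses import dataclass


@dataclass
class Engine:
    """Hypothetical description of the AI engine chosen for rewriting."""
    name: str
    supports_vision: bool  # can the model read an image?
    runs_locally: bool     # does inference happen on this Mac?


def missing_requirements(has_screen_recording: bool, engine: Engine) -> list[str]:
    """Return the unmet Context-mode requirements (an empty list means ready)."""
    missing = []
    if not has_screen_recording:
        missing.append("macOS Screen Recording permission")
    if not engine.supports_vision:
        missing.append("vision-capable model")
    return missing


def screenshot_leaves_mac(engine: Engine) -> bool:
    """The screenshot goes only to the configured engine, so it stays on the Mac when that engine is local."""
    return not engine.runs_locally
```

For example, `missing_requirements(True, Engine("local-vision", True, True))` returns an empty list, while a text-only engine without the permission reports both items.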

How Context mode differs from Verba's other modes

Raw and Polish only transcribe and clean up what you say. Intent writes to an instruction but knows nothing about your screen. Translate converts language, and JARVIS (Action mode) takes real actions across connected apps after you confirm. Context sits between dictation and agent: it does not act on your behalf or touch other apps; it simply reads the screen and writes from it. If you want screen-aware writing without an agent doing anything, Context is the mode. Switch to it with Fn+1..9 like any other, and layer a style with Fn+]/[ if you want a particular tone.
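
The comparison above can be condensed into a small capability table. This is only a summary of the paragraph, expressed as data; the `Mode` type and its two flags are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    reads_screen: bool   # looks at a screenshot before writing
    takes_actions: bool  # acts in connected apps after you confirm


MODES = [
    Mode("Raw", reads_screen=False, takes_actions=False),
    Mode("Polish", reads_screen=False, takes_actions=False),
    Mode("Intent", reads_screen=False, takes_actions=False),
    Mode("Translate", reads_screen=False, takes_actions=False),
    Mode("Context", reads_screen=True, takes_actions=False),
    Mode("JARVIS", reads_screen=False, takes_actions=True),
]

screen_aware = [m.name for m in MODES if m.reads_screen]  # ["Context"]
agents = [m.name for m in MODES if m.takes_actions]       # ["JARVIS"]
```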

Questions, answered

What is Context mode in Verba?

Context mode is Verba's screen-aware dictation mode for Mac. When you trigger it with Fn+X, Verba screenshots your screen, reads it with a vision model on the AI engine you control, and writes from what it sees, then pastes the result at your cursor.

Can Verba reply to an email by voice based on what is on screen?

Yes. With Context mode active, open the email and say something like "reply and politely decline." Verba reads the message on screen with a vision model and drafts the reply for you, pasting it into the compose field. You never have to dictate what the email said.

What permissions does Context mode need on macOS?

Context mode needs macOS Screen Recording permission so Verba can capture the screenshot it sends to the model, plus a vision-capable model on your chosen AI engine. A text-only model cannot read the image, so Context will not work with one.

Does my screen get sent to the cloud in Context mode?

The screenshot is sent only to the AI engine you configured for rewriting, which can be your Claude subscription, your own API key, or a fully local vision model. If you pick a local model, the image never leaves your Mac. Verba is bring-your-own-AI and never makes a billed call for you.

How is Context mode different from JARVIS?

Context mode reads your screen and writes text from it, but it does not take actions in other apps. JARVIS (Action mode) plans and, after you confirm, acts across 1,000+ connected apps. Use Context for screen-aware writing and JARVIS when you want an agent to actually do something.

More features

JARVIS

JARVIS is a voice agent for Mac that acts across 1,000+ apps like Gmail, Slack, and Notion. It plans, shows the steps, and only acts after you confirm.

Modes

Verba's AI dictation modes turn voice into clean text on your Mac: Raw, Polish, Intent, Prompt, plus custom modes and styles. Switch with Fn+1..9. BYO AI.

Voice Notes

Verba is a voice notes app for Mac. Dictate notes up to an hour, get a clean structured document, nest #tags, lock with a password, export .md or .txt.

Voice Tasks

Add tasks by voice on your Mac. Speak one request and Verba builds a project, tasks, and sub-tasks. Nested tags, an ⌥+Fn glance, fully BYO AI.

Live Translation

Dictate in another language on your Mac. Speak any of ~99 languages and Verba writes in your target language, then pastes it into any app. Pick a target once, then talk.