← All features

CONTEXT MODE

Screen-Aware Dictation for Mac: Reply to What You See by Voice

Most dictation hears your words but is blind to your screen, so you end up narrating context it should already see. Verba's Context mode (Fn+X) takes a screenshot, reads it with a vision model you control, and writes from what is actually in front of you. Say "reply to this email" or "summarize this thread" and Verba does it from the pixels, then pastes into the cursor.

Screen-aware dictation that reads your Mac, not just your voice

Context mode is the only Verba mode that looks before it writes. When you trigger it, Verba captures your current screen, sends that image to a vision-capable model on the AI engine you chose, and uses what it sees as the source material for your spoken instruction. That means you can stop describing the email, the error, or the table on screen and just point at it with your voice. The result is pasted wherever your cursor sits, in any app.

  • Captures the screen, then writes from it with a vision model
  • Trigger with Fn+X, like every other Verba mode
  • Say the instruction, not the context: "reply to this", "explain this error"
  • Pastes the finished text into whatever app holds the cursor
  • Runs on your own AI: Claude subscription, Anthropic key, OpenRouter, or local Ollama vision model

Reply to what is on screen by voice

The everyday win is replying to things you can see but would rather not retype. Open an email and say "reply, politely decline, suggest next week instead" and Context mode reads the message, drafts the reply in your voice, and drops it in the compose box. Point it at a Slack thread, a support ticket, a code review comment, or a form, and it writes from the real content on screen instead of a vague mental summary you have to dictate first.

  • Email replies that already know what the sender wrote
  • Answers to a Slack or chat thread you have open
  • Filling a form or field based on the data shown around it
  • Explaining or rewriting an error, log, or stack trace on screen
  • Summarizing a long article or document you are looking at

What Context mode needs to see your screen

Because it reads pixels, Context mode has two requirements the verbatim modes do not. First, macOS Screen Recording permission, which Verba requests so it can capture the screenshot it sends to the model. Second, a vision-capable model on your AI engine, since a text-only model cannot read an image. Both Claude (via your subscription or Anthropic key) and many OpenRouter and Ollama models support vision; pick one of those for Context. Once granted, the screenshot goes only to the engine you configured for rewriting.

  • Requires macOS Screen Recording permission to capture the screen
  • Requires a vision-capable model on your chosen AI engine
  • Works with Claude, Anthropic, OpenRouter, or a local vision model
  • Bring-your-own-AI: Verba never makes a billed call on your behalf

How Context mode differs from Verba's other modes

Flow and Polish only transcribe and clean what you say. Intent writes to an instruction but knows nothing about your screen. Translate converts language, and JARVIS (Fn+X's cousin, Action mode) takes real actions across connected apps after you confirm. Context sits between dictation and agent: it does not act on your behalf or touch other apps, it simply reads the screen and writes from it. If you want screen-aware writing without an agent doing anything, Context is the mode. Switch to it with Fn+1..9 like any other, and layer a style with Fn+]/[ if you want a particular tone.

Questions, answered

What is Context mode in Verba?+

Context mode is Verba's screen-aware dictation mode for Mac. When you trigger it with Fn+X, Verba screenshots your screen, reads it with a vision model on the AI engine you control, and writes from what it sees, then pastes the result into your cursor.

Can Verba reply to an email by voice based on what is on screen?+

Yes. With Context mode active, open the email and say something like "reply and politely decline." Verba reads the message on screen with a vision model and drafts the reply for you, pasting it into the compose field. You never have to dictate what the email said.

What permissions does Context mode need on macOS?+

Context mode needs macOS Screen Recording permission so Verba can capture the screenshot it sends to the model, plus a vision-capable model on your chosen AI engine. Without a vision model, a text-only engine cannot read the image.

Does my screen get sent to the cloud in Context mode?+

The screenshot is sent only to the AI engine you configured for rewriting, which can be your Claude subscription, your Anthropic key, OpenRouter, or a fully local Ollama vision model. If you pick a local model, the image never leaves your Mac. Verba is bring-your-own-AI and never makes a billed call for you.

How is Context mode different from JARVIS?+

Context mode reads your screen and writes text from it, but it does not take actions in other apps. JARVIS (Action mode) plans and, after you confirm, acts across 1,000+ connected apps. Use Context for screen-aware writing and JARVIS when you want an agent to actually do something.