โ† Blog
Tutorial | By Daniel Riazanovskiy

CodeSpeak Quick Start: Work in an Existing Project

โš ๏ธ CodeSpeak is in Alpha Preview: many things are rough around the edges. Please use at your own risk and report any issues to our Discord. Thank you!

Let's use CodeSpeak to add a feature to an existing project. We call this mixed mode: only part of the codebase is controlled by CodeSpeak.

Prerequisites

Install uv

CodeSpeak uses uv as its Python package manager.

curl -LsSf https://astral.sh/uv/install.sh | sh

Restart your terminal (or run source ~/.bashrc / source ~/.zshrc), then verify:

uv --version

Get an Anthropic API key

CodeSpeak is BYOK (Bring Your Own Key). Get an API key at platform.claude.com/settings/keys.

You can provide the key in two ways:

  • Paste it when CodeSpeak prompts you (this creates an .env.local file in your project directory)
  • Set the environment variable: export ANTHROPIC_API_KEY=<your-key>
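
Before starting a build, it can help to confirm that a key is discoverable through one of these two routes. The snippet below is a minimal sketch and not part of CodeSpeak: `find_api_key_source` is a hypothetical helper that only reports whether `ANTHROPIC_API_KEY` is set or an `.env.local` file exists in the project directory. It does not read the file, and it makes no claim about which source CodeSpeak prefers when both are present.

```python
import os
from pathlib import Path
from typing import Optional


def find_api_key_source(project_dir: str = ".") -> Optional[str]:
    """Report where an Anthropic key could come from, or None if nothing is found.

    Hypothetical helper for illustration; CodeSpeak does not ship this function.
    """
    # Second option from the list above: the key is exported in the shell.
    if os.environ.get("ANTHROPIC_API_KEY"):
        return "environment"
    # First option: CodeSpeak wrote .env.local after prompting for the key.
    if (Path(project_dir) / ".env.local").is_file():
        return ".env.local"
    return None
```

Calling `find_api_key_source()` from the project root returns `None` when neither route is set up, which is a cue that CodeSpeak will prompt for the key.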

Install CodeSpeak

uv tool install codespeak-cli

Verify the installation:

codespeak --version

Log in

codespeak login

Log in with Google or email/password.

Clone the repo

We'll add EML support to MarkItDown, Microsoft's document-to-markdown converter.

git clone git@github.com:microsoft/markitdown.git
cd markitdown

Set up the project

Following the MarkItDown README, let's set up a venv to make sure the project works.

uv venv --python=3.12 .venv
source .venv/bin/activate

You can verify that the tests pass with:

pushd packages/markitdown
uv pip install hatch
hatch test
popd

This should produce output like:

================================== test session starts ===================================
<...>
collected 196 items

tests/test_cli_misc.py ..                                                                               [  1%]
tests/test_cli_vectors.py ..................................................                            [ 26%]
<...>

======================= 194 passed, 2 skipped in 94.08s (0:01:34) ========================

You can also verify markitdown itself works by converting one of the existing test files:

uv pip install -e 'packages/markitdown[all]'
markitdown packages/markitdown/tests/test_files/test_with_comment.docx

Initialise CodeSpeak in mixed mode

codespeak init --mixed
# Initialized CodeSpeak project in mixed mode

This creates a codespeak.json at the repo root. Mixed mode means CodeSpeak manages only the files you specify; the rest of the codebase stays untouched.

Optionally, create an AGENTS.md file to help CodeSpeak's agents navigate the project faster:

AGENTS.md
A virtual environment is pre-configured at the project root (`.venv/`). Hatch is installed there.

# Running Tests

From `packages/markitdown/`, run `GITHUB_ACTIONS=1 hatch test`. Skipping remote URL testing is necessary for any new work.

The full test suite takes several minutes.

# Adding Tests

The primary testing mechanism is the **test vector framework**:

1. Add test fixture files to `tests/test_files/`
2. Add `FileTestVector` entries to `tests/_test_vectors.py`

The parametrized tests in `test_module_vectors.py` will automatically exercise your converter through all standard code paths.

Configure and add a spec

To add our new feature, let's create packages/markitdown/src/markitdown/converters/eml_converter.cs.md, right next to the existing converters:

eml_converter.cs.md
# EmlConverter

Converts RFC 5322 email files (.eml) to Markdown using Python's built-in `email` module.

## Accepts

`.eml` extension or `message/rfc822` MIME type.

## Output Structure

1. **Headers section**: From, To, Cc, Subject, Date as `**Key:** value` pairs
2. **Body**: plain text preferred; if only HTML, convert to markdown
3. **Attachments section** (if any): list with filename, MIME type, human-readable size

## Parsing Requirements

- Decode RFC 2047 encoded headers (e.g., `=?UTF-8?B?...?=`)
- Decode body content (base64, quoted-printable)
- Handle multipart: walk parts, prefer `text/plain` over `text/html`
- For `message/rfc822` parts: recursively format as quoted nested message
- Extract attachment metadata without decoding attachment content
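
To get a feel for what this spec asks for, here is a minimal sketch of the core parsing steps using only the standard library. It is an illustration, not the code CodeSpeak generates: `eml_to_markdown` is a hypothetical function name, the "Attachments" heading is an assumed layout, and the sketch skips HTML-to-Markdown conversion, nested `message/rfc822` formatting, and attachment sizes.

```python
from email import message_from_bytes, policy


def eml_to_markdown(raw: bytes) -> str:
    """Render headers, body, and attachment names of an .eml file as Markdown.

    Illustrative only: the function name and output layout are assumptions.
    """
    # policy.default yields EmailMessage objects and decodes RFC 2047 headers.
    msg = message_from_bytes(raw, policy=policy.default)

    lines = []
    for name in ("From", "To", "Cc", "Subject", "Date"):
        if msg[name] is not None:
            lines.append(f"**{name}:** {msg[name]}")

    # get_body() walks multipart messages; "plain" is preferred over "html".
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is not None:
        # get_content() undoes base64 / quoted-printable transfer encoding.
        # An HTML-only body is left as raw HTML in this sketch.
        lines += ["", body.get_content().strip()]

    # Only metadata is read here; attachment payloads are never decoded.
    attachments = [
        f"- {part.get_filename() or 'unnamed'} ({part.get_content_type()})"
        for part in msg.iter_attachments()
    ]
    if attachments:
        lines += ["", "## Attachments", ""] + attachments

    return "\n".join(lines)
```

Passing the bytes of an `.eml` file to `eml_to_markdown` returns a single Markdown string. Leaning on `policy.default` keeps the sketch short, because header decoding and transfer decoding are handled by the `email` package rather than by hand.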

Register this spec in codespeak.json:

codespeak.json
"specs": [
  "packages/markitdown/src/markitdown/converters/eml_converter.cs.md"
]

In mixed mode, CodeSpeak won't touch existing project files by default; it only creates new ones. But our new converter needs to be wired into MarkItDown's plugin system: imported in __init__.py and registered in _markitdown.py. Its tests also need new entries in tests/_test_vectors.py. We explicitly allow this by adding these files to whitelisted_files in codespeak.json:

codespeak.json
"whitelisted_files": [
  "packages/markitdown/src/markitdown/converters/__init__.py",
  "packages/markitdown/src/markitdown/_markitdown.py",
  "packages/markitdown/tests/_test_vectors.py"
]
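
A typo in either list is easy to miss, so a quick path check before building can save a round trip. The helper below is a rough sketch under one assumption: that `specs` and `whitelisted_files` are top-level keys in codespeak.json holding paths relative to the file's directory, as the fragments above suggest. `missing_paths` is a hypothetical name, not a CodeSpeak command.

```python
import json
from pathlib import Path
from typing import List


def missing_paths(config_path: str = "codespeak.json") -> List[str]:
    """Return spec and whitelisted paths from the config that do not exist on disk."""
    config_file = Path(config_path)
    config = json.loads(config_file.read_text(encoding="utf-8"))
    root = config_file.parent
    # Assumption: both keys sit at the top level and hold repo-relative paths.
    listed = config.get("specs", []) + config.get("whitelisted_files", [])
    return [p for p in listed if not (root / p).exists()]
```

Run from the repo root, `missing_paths()` should return an empty list once the spec file has been created and the whitelisted paths are spelled correctly.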

Build

Complex mixed-mode projects work best with Claude Opus 4.6. Set the model with an environment variable and start the build:

CODESPEAK_ANTHROPIC_STANDARD_MODEL=claude-opus-4-6 codespeak build

On the first run, you'll be prompted to log in and add your API key. After that, CodeSpeak will execute the build. This can take some time:

Connecting to build.codespeak.dev:50053...
Remote build started (ID: 079e2794-b84f-41cf-8cbb-77c692b845d5)

   ██████╗ ██████╗ ██████╗ ███████╗███████╗██████╗ ███████╗ █████╗ ██╗  ██╗
  ██╔════╝██╔═══██╗██╔══██╗██╔════╝██╔════╝██╔══██╗██╔════╝██╔══██╗██║ ██╔╝
  ██║     ██║   ██║██║  ██║█████╗  ███████╗██████╔╝█████╗  ███████║█████╔╝
  ██║     ██║   ██║██║  ██║██╔══╝  ╚════██║██╔═══╝ ██╔══╝  ██╔══██║██╔═██╗
  ╚██████╗╚██████╔╝██████╔╝███████╗███████║██║     ███████╗██║  ██║██║  ██╗
   ╚═════╝ ╚═════╝ ╚═════╝ ╚══════╝╚══════╝╚═╝     ╚══════╝╚═╝  ╚═╝╚═╝  ╚═╝

╭──────────────────────────── CodeSpeak Progress ────────────────────────────╮
│ ✓ Process specification (0.3s)                                             │
│ ✓ Collect project information (0.1s)                                       │
│ ✓ Implement specification (1m 33s)                                         │
│ ╰─ ✓ Collect context & plan work (1m 33s)                                  │
│ ✓ Generate and run tests in mixed mode (15m 46s)                           │
│ ╰─ ✓ Run existing tests to ensure they pass (1m 12s)                       │
│ ╰─ ✓ Create test EML files for different scenarios (1m 0s)                 │
│ ╰─ ✓ Write focused unit tests for core EML functionality (6m 12s)          │
│ ╰─ ✓ ...                                                                   │
│ ╰─ ✓ Identify and fix issues in the nested message handling (2m 18s)       │
│ ✓ Finalize mixed mode run (0.1s)                                           │
╰────────────────────────────────────────────────────────────────────────────╯
Processing spec 1/1: packages/markitdown/src/markitdown/converters/eml_converter.cs.md
App built successfully.

Inspect the results

Now you can inspect the newly generated files:

$ git status

Changes not staged for commit:
        modified:   packages/markitdown/src/markitdown/_markitdown.py
        modified:   packages/markitdown/src/markitdown/converters/__init__.py
        modified:   packages/markitdown/tests/_test_vectors.py

Untracked files:
        packages/markitdown/src/markitdown/converters/_eml_converter.py
        packages/markitdown/tests/test_files/test_email.eml
        packages/markitdown/tests/test_files/test_email_html_only.eml
        packages/markitdown/tests/test_files/test_email_nested.eml

CodeSpeak created _eml_converter.py, updated the three whitelisted files to wire it in and test it, and generated sample .eml fixtures.
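
The change to _test_vectors.py hooks the new fixtures into the test vector framework described in AGENTS.md. As a rough illustration of the idea, the sketch below defines a stand-in `FileTestVector` and a hypothetical `check_vector` helper. The field names are assumptions made for this example, so check the real class in `tests/_test_vectors.py` and the generated diff rather than copying this.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class FileTestVector:
    """Stand-in for MarkItDown's test vector type (field names are assumed)."""
    filename: str
    mimetype: str
    must_include: List[str] = field(default_factory=list)
    must_not_include: List[str] = field(default_factory=list)


def check_vector(vector: FileTestVector, markdown: str) -> bool:
    """Return True when the converted text satisfies the vector's expectations."""
    has_required = all(s in markdown for s in vector.must_include)
    has_forbidden = any(s in markdown for s in vector.must_not_include)
    return has_required and not has_forbidden


# Hypothetical entry for one of the generated fixtures in tests/test_files/.
EML_VECTOR = FileTestVector(
    filename="test_email.eml",
    mimetype="message/rfc822",
    must_include=["**From:**", "**Subject:**"],
    must_not_include=["=?UTF-8?"],  # encoded headers should have been decoded
)
```

A vector like this states what the converted Markdown must and must not contain, which is why one fixture file plus one entry is enough for the parametrized tests to cover a new converter.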

Run tests

pushd packages/markitdown
GITHUB_ACTIONS=1 hatch test
popd

The output should look something like this:

platform linux -- Python 3.14.2, pytest-9.0.2
collected 229 items

tests/test_cli_misc.py ..                                          [  0%]
tests/test_cli_vectors.py .......................sssssssssssssss..  [ 27%]
<...>
tests/test_pdf_tables.py ...............                           [100%]

192 passed, 37 skipped in 47.65s

Try it out

CodeSpeak generated test .eml files during the build. Try the new converter on one:

markitdown packages/markitdown/tests/test_files/test_email.eml
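
The same conversion can also be driven from Python through MarkItDown's `MarkItDown().convert(...)` API. The wrapper below is a small sketch: `convert_file` is a hypothetical helper name, and using it on .eml files assumes the build above succeeded and the package is installed in editable mode as shown earlier.

```python
def convert_file(path: str) -> str:
    """Convert a file with MarkItDown's Python API and return the Markdown text."""
    # Imported lazily so this module loads even where markitdown is not installed.
    from markitdown import MarkItDown

    converter = MarkItDown()
    result = converter.convert(path)
    return result.text_content
```

For example, `print(convert_file("packages/markitdown/tests/test_files/test_email.eml"))` should print the same Markdown as the CLI call above.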

See Also