Live Translation, Right in the Browser

I started with a pretty simple question: can I run one of these models in the browser?

The motivation was not lofty. I wanted AI features on my site. I just did not want to pay for them with my own compute. If the reader already has a GPU, they should be able to run a model.

My friendly clanker pointed me toward ONNX and Transformers.js. That turned out to be the right trail almost immediately. The browser-side story is much better than I expected. There is real tooling here now, not just demos held together with wishful thinking.

The first phase was exactly what it should have been: kick the tires on a random model and see if anything useful happens. A lot of language models will technically run in the browser, but many of the smaller ones don't do anything interesting. The useful ones get big fast, and size matters a lot when the download is happening inside a browser tab.

Then I started digging through ONNX models and eventually found TranslateGemma in ONNX form. I thought it'd be a cool feature to add to my personal site. Also, credit where it's due: Google has a surprisingly good spread of open models right now, including a bunch of use-case-specific ones that feel much more practical than another generic "do everything" model. If you want to browse around, Gemmaverse and Google Cloud's docs are worth a look. Thanks, Google.

It was really easy to start:

import { pipeline } from "@huggingface/transformers"
 
const generator = await pipeline(
  "text-generation",
  "onnx-community/translategemma-text-4b-it-ONNX",
  {
    dtype: "q4",
    device: "webgpu",
  },
)

That was the moment where the idea stopped feeling speculative. The browser could, in fact, load the model and run it on WebGPU.
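That said, WebGPU is not everywhere yet, and Transformers.js also ships a slower WASM backend. A minimal feature-detection sketch, under the assumption that you want to fall back rather than fail (`pickDevice` is my own helper name, not part of Transformers.js):

```typescript
// Transformers.js accepts "webgpu" or "wasm" as device values;
// "wasm" runs everywhere but is much slower for a 4B model.
type Device = "webgpu" | "wasm"

// nav is typed loosely so the check can also run outside a browser;
// in the real page you would pass the global navigator.
function pickDevice(nav: { gpu?: unknown }): Device {
  return nav.gpu !== undefined ? "webgpu" : "wasm"
}

// In the browser:
// const device = pickDevice(navigator)
// const generator = await pipeline("text-generation", modelId, { dtype: "q4", device })
```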

The next step was feeding it the right shape of input:

const messages = [
  {
    role: "user" as const,
    content: [
      {
        type: "text",
        source_lang_code: "en",
        target_lang_code: "es-ES",
        text,
      },
    ],
  },
]
 
const result = await generator(messages, { max_new_tokens: 512 })
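With chat-style input, the pipeline hands back the conversation with the assistant turn appended, so pulling the translated string out is one small helper. The result shape below is what I observed; double-check it against the Transformers.js docs for your version (`extractTranslation` is my own name):

```typescript
// Assumed shape of the text-generation pipeline output for chat input.
interface ChatMessage { role: string; content: string }
interface GenerationResult { generated_text: ChatMessage[] }

// The last message in generated_text is the model's reply,
// i.e. the translation.
function extractTranslation(result: GenerationResult[]): string {
  return result[0].generated_text.at(-1)!.content
}
```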

Once that worked, getting from "a model can translate a string" to "this page has translation" was mostly an exercise in not wrecking the DOM.

The first version was naive on purpose. Walk the text nodes inside <main>, skip obvious bad targets like <code> and <pre>, batch a few strings together, translate them, and write the results back in place. That was enough to prove the feature.
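That naive walk can be sketched as a skip predicate plus a TreeWalker pass. This is a reconstruction of the idea, not the site's actual code; `SKIP_TAGS` and `isSkippable` are names I made up:

```typescript
// Tags whose text should never be translated: code samples, and
// anything else that is not prose.
const SKIP_TAGS = new Set(["CODE", "PRE", "SCRIPT", "STYLE"])

function isSkippable(tagName: string): boolean {
  return SKIP_TAGS.has(tagName.toUpperCase())
}

// In the browser, the walk itself looks roughly like:
// const walker = document.createTreeWalker(main, NodeFilter.SHOW_TEXT, {
//   acceptNode: (node) =>
//     isSkippable(node.parentElement?.tagName ?? "")
//       ? NodeFilter.FILTER_REJECT
//       : NodeFilter.FILTER_ACCEPT,
// })
// const nodes: Text[] = []
// while (walker.nextNode()) nodes.push(walker.currentNode as Text)
```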

It was not enough to make it good.

The annoying part was formatting. Translating raw text nodes works fine until you hit inline styling. If a sentence is split across text nodes because one phrase is bold, a naive walker will happily translate the pieces separately and turn the whole sentence into soup.

The fix was to keep the source markdown around for the parts of the page that were authored as markdown, and translate the full markdown string instead of the rendered fragments. In the app I wrapped those pieces with a data-md marker so the translation pass could treat them as a single unit:

<span data-md={source}>
  <Block content={source} components={components} />
</span>

That changed the system from "translate whatever text nodes happen to exist" to "translate the thing the author actually wrote." Much better.
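The translation pass over those units reduces to: read the source markdown off the attribute, translate it whole, re-render. A sketch under assumed names (`translateMdUnits` and `renderMarkdown` are stand-ins, and the element interface is narrowed so the logic stays testable outside a browser):

```typescript
// The minimum surface we need from a DOM element carrying data-md.
interface MdUnit {
  getAttribute(name: string): string | null
  innerHTML: string
}

// Translate each authored markdown unit as a single string, then
// re-render it, so inline styling never splits a sentence.
async function translateMdUnits(
  units: MdUnit[],
  translate: (md: string) => Promise<string>,
  renderMarkdown: (md: string) => string,
): Promise<void> {
  for (const el of units) {
    const source = el.getAttribute("data-md")
    if (!source) continue
    el.innerHTML = renderMarkdown(await translate(source))
  }
}

// In the browser the units come from:
// document.querySelectorAll<HTMLElement>("[data-md]")
```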

I also had to preserve a few proper nouns. Company names, my own name, that kind of thing. The cheap and effective approach was placeholder substitution:

const { masked, slots } = insertPlaceholders(source)
const rawTranslation = await translate(masked, targetLang)
const translated = restorePlaceholders(rawTranslation, slots)
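Those two helpers are small enough to sketch in full. This version is my guess at the mechanism, not the site's actual code: swap each protected term for an inert token the model is unlikely to touch, then swap them back afterward. The `PROTECTED` list is illustrative, and the token format matters in practice (pick something the model reliably passes through):

```typescript
// Terms that must survive translation untouched.
const PROTECTED = ["TranslateGemma", "ONNX", "WebGPU"]

interface Masked {
  masked: string
  slots: Map<string, string>
}

// Replace every protected term with a numbered token, remembering
// which token maps back to which term.
function insertPlaceholders(source: string): Masked {
  const slots = new Map<string, string>()
  let masked = source
  PROTECTED.forEach((term, i) => {
    if (!masked.includes(term)) return
    const token = `__KEEP_${i}__`
    slots.set(token, term)
    masked = masked.split(term).join(token)
  })
  return { masked, slots }
}

// Swap the tokens back into the translated text.
function restorePlaceholders(translated: string, slots: Map<string, string>): string {
  let out = translated
  for (const [token, term] of slots) out = out.split(token).join(term)
  return out
}
```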

There was a similar lesson with section headers. The sidebar nav and the section title on the page might both say "Experience", but they should not be translated independently and come back slightly different. Also, I thought it would be cool to have them change simultaneously. So I added a shared data-section-title key and translated those once, then fanned the result out to every matching node.
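The shared-key idea boils down to: translate each distinct key exactly once, then fan the result out to every node carrying it. A sketch with the DOM factored out (the node shape and `translateSectionTitles` name are mine):

```typescript
// A node tagged with a shared data-section-title key.
interface TitledNode {
  key: string          // the data-section-title value, e.g. "Experience"
  textContent: string
}

// Translate each distinct key once, then write the same result to every
// node sharing it, so the sidebar and the heading always match.
async function translateSectionTitles(
  nodes: TitledNode[],
  translate: (s: string) => Promise<string>,
): Promise<void> {
  const distinct = [...new Set(nodes.map((n) => n.key))]
  const translations = new Map<string, string>()
  for (const key of distinct) translations.set(key, await translate(key))
  for (const n of nodes) n.textContent = translations.get(n.key)!
}
```

The fan-out is also what makes the sidebar and the page title flip at the same moment, since both get written in the same pass.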

By that point the hard problem was no longer "can the model run?" It was UX.

The model is about a 2.9 GB download. That is not something you want people downloading without consent. So the button became a small sequence instead of a single action: explain what it does, warn about the download, confirm intent, show download progress, then show that translation is ready.
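The download progress itself comes from the `progress_callback` option that Transformers.js accepts when creating the pipeline. The event shape below (`status`, `file`, `progress`) matches what I have seen it emit, but treat it as an assumption and check the docs for your version; `formatProgress` and `updateButton` are my own names:

```typescript
// Assumed shape of Transformers.js progress events.
interface ProgressEvent {
  status: string       // e.g. "progress", "done", "ready"
  file?: string        // which model file is downloading
  progress?: number    // 0..100 for "progress" events
}

// Turn a raw progress event into a short label for the download UI.
function formatProgress(e: ProgressEvent): string {
  if (e.status === "progress" && e.file && e.progress !== undefined) {
    return `Downloading ${e.file}: ${e.progress.toFixed(0)}%`
  }
  if (e.status === "ready") return "Translation ready"
  return "Preparing model…"
}

// Wiring it up:
// const generator = await pipeline("text-generation", modelId, {
//   dtype: "q4",
//   device: "webgpu",
//   progress_callback: (e) => updateButton(formatProgress(e)),
// })
```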

That part happened in a bunch of small steps, with my agent doing the boring part of the climb. Kick the tires on any model. Get translation working. Tighten the DOM handling. Add progress. Add restore. Make the button better. Make the translation update components as they're translated. Add some visual indication of translation in progress and "stream" the text in when it's done (see Streamdown).

That last part mattered more than I expected.

The page now translates in a rough order that matches how people read it: shared section titles first, then the higher-level structure, then the rest of the body. While that happens, the UI pulses and settles as each chunk completes. None of that changes the underlying capability, but it changes the feel of the feature from "experimental widget" to "this thing knows what it is doing."

I came away from this with two strong impressions.

First, Hugging Face is pretty cool. The combination of model hosting, docs, and browser tooling made this much easier than I expected. They are doing good work.

Second, this was exactly the sort of project where having an agent around was genuinely useful. Not because it invented the idea, and not because it wrote some magical finished system in one shot. It helped compress the boring middle. I could go from "I wonder if this is possible" to "here is a shippable translation feature" and then spend my time iterating on the parts that actually benefit from taste.

That is probably my favorite part of the whole thing. The path was not one giant leap. It was a series of pretty normal steps:

  1. Try running any small model.
  2. Find a model that is actually worth running.
  3. Make it work in the browser.
  4. Make it work on the page.
  5. Make it not break formatting.
  6. Make it feel good.

Nothing mystical. Just a decent model, a good browser runtime, and enough iteration to sand off the dumb parts.

Give it a try by clicking the globe icon on the top right!