
How Babulfish Escaped My Site

In the first version, I just wanted live translation on bigH.github.io. I had no grand library plan. I wanted a globe button, a big irresponsible model download warning, and a way to make my site speak Spanish without paying for server inference.

That version worked. It also had the exact shape of a one-off feature.

The code knew about my sidebar. It knew about my essay markdown wrappers. It knew which company names should never be translated. It knew how I wanted section headers to pulse when translation landed. It knew a bunch of very specific things that were perfectly reasonable inside one app and pretty suspect inside a public package.

That is the part people tend to skip when they say they are "extracting a library." They take the code that currently works, move it into packages/whatever, and call it reusable. Usually that just means the application has been shoved into a trench coat.

I did not want that.

The real job was separating "this is the translation product" from "this is just how my site happens to be built."

Some pieces were obviously general: loading the model, tracking download progress, choosing WebGPU vs WASM, exposing a sane state machine, walking DOM roots, translating in phases, restoring the original content, aborting in-flight work, translating repeated labels once and fanning the result out to every matching node.
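To make "a sane state machine" concrete, here is a minimal sketch under stated assumptions. It is not babulfish's public API. The state names, event names, and the `pickDevice` helper are hypothetical. It models the engine lifecycle as a pure transition function and falls back from WebGPU to WASM when WebGPU is unavailable.

```typescript
// Hypothetical sketch: state and event names are illustrative, not babulfish's API.
type Device = "webgpu" | "wasm";

type EngineState =
  | { kind: "idle" }
  | { kind: "downloading"; progress: number } // progress in [0, 1]
  | { kind: "ready"; device: Device }
  | { kind: "translating"; device: Device }
  | { kind: "error"; message: string };

type EngineEvent =
  | { type: "load" }
  | { type: "progress"; progress: number }
  | { type: "loaded"; device: Device }
  | { type: "translate" }
  | { type: "done" }
  | { type: "abort" }
  | { type: "fail"; message: string };

// Prefer WebGPU when the caller reports it is available, otherwise fall back to WASM.
function pickDevice(hasWebGPU: boolean): Device {
  return hasWebGPU ? "webgpu" : "wasm";
}

// Pure transition function: unknown (state, event) pairs leave the state unchanged.
function transition(state: EngineState, event: EngineEvent): EngineState {
  // A failure can happen from any state.
  if (event.type === "fail") return { kind: "error", message: event.message };
  switch (state.kind) {
    case "idle":
    case "error":
      // Loading (or retrying after an error) starts a fresh download.
      return event.type === "load" ? { kind: "downloading", progress: 0 } : state;
    case "downloading":
      if (event.type === "progress") {
        // Clamp so a misbehaving loader cannot report progress outside [0, 1].
        return { kind: "downloading", progress: Math.min(1, Math.max(0, event.progress)) };
      }
      return event.type === "loaded" ? { kind: "ready", device: event.device } : state;
    case "ready":
      return event.type === "translate" ? { kind: "translating", device: state.device } : state;
    case "translating":
      // Finishing or aborting both return to "ready"; the model stays loaded.
      return event.type === "done" || event.type === "abort"
        ? { kind: "ready", device: state.device }
        : state;
  }
}
```

Keeping transitions pure means the contract can be tested without a browser, a model, or any UI binding.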

Some pieces were obviously site-shaped: my exact selectors, the pulse and settle animation classes, the button styling, the punctuation cleanup I wanted after translation, and the fact that my essays keep original inline markdown around in data-md.
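The site-shaped cleanup can live behind an output transform. This sketch assumes a transform is just a `(text: string) => string` function applied to each translated string. `composeTransforms` and both rules are hypothetical stand-ins, not the site's actual cleanup.

```typescript
// Hypothetical output transform: the signature and the rules are assumptions.
type OutputTransform = (text: string) => string;

// Compose several small transforms into one, applied left to right.
function composeTransforms(...fns: OutputTransform[]): OutputTransform {
  return (text) => fns.reduce((acc, fn) => fn(acc), text);
}

// Remove whitespace that a model sometimes inserts before closing punctuation.
const tightenPunctuation: OutputTransform = (text) => text.replace(/\s+([,.;:!?)])/g, "$1");

// Collapse runs of spaces/tabs into a single space and trim the ends.
const collapseSpaces: OutputTransform = (text) => text.replace(/[ \t]{2,}/g, " ").trim();

const siteCleanup = composeTransforms(tightenPunctuation, collapseSpaces);
```

Rules like this are language-dependent (French typography, for example, puts a space before `!` and `?`), which is exactly why they belong in app config rather than the shared package.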

That split ended up becoming babulfish.

The package structure is intentionally boring:

  1. @babulfish/core owns the engine, the DOM translator, and the public contract.
  2. @babulfish/react is a thin React binding over the same core.
  3. @babulfish/styles carries the CSS contract.
  4. babulfish exists as a permanent unscoped alias because package naming on the internet is still a little stupid.

The important part is not the package count. It is the boundary.

@babulfish/core had to be useful without knowing anything about my website. If a React app, a plain DOM app, and a custom element could all drive the same engine, then I had probably found a real seam. If they could not, I was still just reorganizing my personal site.
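One way to keep that seam honest is for the core to expose a plain subscribe/snapshot interface that any binding can consume. This is a sketch of the shape, assuming hypothetical names (`createStore`, `Store`, `set`); it is not babulfish's actual API. The `subscribe`/`getSnapshot` pair matches what React's `useSyncExternalStore` expects.

```typescript
// Hypothetical framework-agnostic store; names are illustrative.
type Listener = () => void;

interface Store<T> {
  getSnapshot(): T;
  subscribe(listener: Listener): () => void; // returns an unsubscribe function
  set(next: T): void;
}

function createStore<T>(initial: T): Store<T> {
  let current = initial;
  const listeners = new Set<Listener>();
  return {
    // Returns the same reference until set() changes it, as useSyncExternalStore requires.
    getSnapshot: () => current,
    subscribe(listener) {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
    set(next) {
      if (Object.is(next, current)) return; // skip no-op updates
      current = next;
      for (const notify of listeners) notify();
    },
  };
}
```

A React binding could pass `store.subscribe` and `store.getSnapshot` straight to `useSyncExternalStore`, while a plain DOM app or a custom element would call `subscribe` directly and update its own UI.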

The hardest part was still the DOM.

Model loading is a solved enough problem now. Not trivial, but legible. The nasty part is translating real page content without wrecking it.

A normal page is not one big string. It is text nodes, links, bold spans, code, headings, tooltips, repeated labels, and whatever little crimes your renderer committed on the way to the browser. If you translate raw text nodes one by one, you get nonsense. I had already learned that lesson in the site version: a sentence split across inline tags turns into soup fast.
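A common way to keep a sentence whole across inline tags is to serialize the inline pieces into one string with numbered markers, translate that string once, and map the markers back. The sketch below shows that general technique under stated assumptions; it is not necessarily how babulfish's `structuredText` works. The `Segment` shape and the `<1>text</1>` marker syntax are hypothetical.

```typescript
// Hypothetical segment model: plain text, or text wrapped in an inline tag like <b> or <a>.
type Segment = { text: string; tag?: string };

// Serialize segments into one translatable string, e.g. "Read <1>this</1> now".
function serialize(segments: Segment[]): string {
  return segments
    .map((seg, i) => (seg.tag ? `<${i}>${seg.text}</${i}>` : seg.text))
    .join("");
}

// Rebuild segments from a translated string, reattaching tags by marker index.
// Returns null if the markers did not survive translation, so callers can fall back.
function deserialize(translated: string, original: Segment[]): Segment[] | null {
  const out: Segment[] = [];
  const pattern = /<(\d+)>([\s\S]*?)<\/\1>/g; // \1 forces the closing marker to match
  const seen = new Set<number>();
  let last = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(translated)) !== null) {
    const index = Number(match[1]);
    const source = original[index];
    if (!source?.tag || seen.has(index)) return null; // unknown or duplicated marker
    seen.add(index);
    // Plain text between the previous marker and this one.
    if (match.index > last) out.push({ text: translated.slice(last, match.index) });
    out.push({ text: match[2], tag: source.tag });
    last = pattern.lastIndex;
  }
  if (last < translated.length) out.push({ text: translated.slice(last) });
  // Every tagged segment must come back exactly once.
  const expected = original.filter((seg) => seg.tag).length;
  return seen.size === expected ? out : null;
}
```

Returning `null` when markers go missing matters: models do sometimes drop or mangle them, and falling back (for example, to the original content) beats rendering soup.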

So the extraction had to preserve the things that made the original feature decent. linkedBy handles cases like data-section-title, where the same label appears in more than one place and should translate identically. richText handles authored source strings like data-md, where the right thing to translate is the original markdown, not the rendered fragments. structuredText handles inline-rich prose that lives directly in the DOM and still needs to be treated as one logical unit. Hooks and output transforms give the library seams without hard-coding my aesthetic preferences into everyone else's app.

That last point mattered a lot. I wanted the package to make the site possible, not to fossilize the site as the only blessed way to use it.
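The `linkedBy` behavior described above boils down to grouping by key, translating once per group, and fanning the result out. Here is a DOM-free sketch of that pattern. The `LinkedNode` shape and the `translate` callback are hypothetical stand-ins for real elements and the real engine.

```typescript
// Hypothetical stand-in for a DOM node that carries a link key and a text slot.
interface LinkedNode {
  key: string; // e.g. the value of data-section-title
  text: string;
}

// Translate each distinct key once, then write the same result to every node sharing it.
async function translateLinked(
  nodes: LinkedNode[],
  translate: (text: string) => Promise<string>,
): Promise<number> {
  const groups = new Map<string, LinkedNode[]>();
  for (const node of nodes) {
    const group = groups.get(node.key);
    if (group) group.push(node);
    else groups.set(node.key, [node]);
  }
  for (const group of groups.values()) {
    // Use the first node's text as the source; every linked node gets identical output.
    const translated = await translate(group[0].text);
    for (const node of group) node.text = translated;
  }
  return groups.size; // number of model calls made
}
```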

The funny thing is that once you do that work, the original app gets simpler almost immediately.

The current site-side translation setup is mostly configuration:

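```typescript
// renderInlineMarkdownToHtml and normalizeDomOutput are site-local helpers defined elsewhere in the site.
export const SITE_TRANSLATOR_CONFIG = {
  engine: {
    // Pin inference to WebGPU.
    device: "webgpu",
  },
  dom: {
    // Subtrees the translator walks.
    roots: ["[data-translate-root='sidebar']", "main"],
    // Translation order: sidebar, then section headings, then the rest of main.
    phases: ["[data-translate-root='sidebar']", "main h2", "main"],
    // Elements sharing a data-section-title value are translated once and kept identical.
    linkedBy: {
      selector: "[data-section-title]",
      keyAttribute: "data-section-title",
    },
    // Translate the original inline markdown in data-md, then render it back to HTML.
    richText: {
      selector: "[data-md]",
      sourceAttribute: "data-md",
      render: renderInlineMarkdownToHtml,
    },
    // Inline-rich prose translated as one logical unit per element.
    structuredText: {
      selector: "p, li, figcaption, h1, h2, h3, h4, h5, h6",
    },
    // Attributes translated in addition to text content.
    translateAttributes: ["title"],
    // Site-specific cleanup applied to translated output.
    outputTransform: normalizeDomOutput,
  },
}
```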

That is a much healthier shape.
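The `phases` list hides an ordering detail. Read literally, the last phase (`main`) contains the headings the `main h2` phase already handled, so a later phase should skip anything an earlier one translated. Below is a DOM-free sketch of one way to run overlapping phases, assuming a hypothetical `runPhases` function, an injected `select` function in place of `querySelectorAll`, and string IDs in place of nodes. It also checks an `AbortSignal` between items so in-flight work can be cancelled.

```typescript
// Hypothetical phase runner. `select` stands in for querySelectorAll over the roots,
// and string IDs stand in for DOM nodes so the sketch runs without a browser.
async function runPhases(
  phases: string[],
  select: (selector: string) => string[],
  translateOne: (id: string) => Promise<void>,
  signal?: AbortSignal,
): Promise<string[]> {
  const done = new Set<string>();
  const order: string[] = [];
  for (const selector of phases) {
    for (const id of select(selector)) {
      if (signal?.aborted) return order; // stop early; callers can restore originals
      if (done.has(id)) continue; // already translated by an earlier, narrower phase
      done.add(id);
      await translateOne(id);
      order.push(id);
    }
  }
  return order;
}
```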

The site still gets to keep its own taste. It can pin translation to WebGPU. It can preserve specific names. It can animate translated elements. It can keep a custom button. But the engine and DOM machinery are no longer welded to bigH.github.io.
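Preserving specific names is a good example of taste that belongs to the app. A common technique, sketched here under stated assumptions rather than taken from the site, is to swap protected terms for opaque placeholders before translation and restore them afterwards. `protectTerms` and the `⟦0⟧` placeholder format are hypothetical.

```typescript
// Hypothetical helper: replace protected terms with numbered placeholders before translation.
function protectTerms(
  text: string,
  terms: string[],
): { masked: string; restore: (translated: string) => string } {
  // Longest terms first so a longer name wins over a shorter name it contains.
  const sorted = [...terms].sort((a, b) => b.length - a.length);
  const escape = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const found: string[] = [];
  let masked = text;
  if (sorted.length > 0) {
    const pattern = new RegExp(sorted.map(escape).join("|"), "g");
    masked = text.replace(pattern, (m) => {
      found.push(m);
      return `⟦${found.length - 1}⟧`;
    });
  }
  // Put the original terms back wherever their placeholders survived translation.
  const restore = (translated: string) =>
    translated.replace(/⟦(\d+)⟧/g, (m, i) => found[Number(i)] ?? m);
  return { masked, restore };
}
```

Sorting longest-first keeps a term like "GitHub Pages" from being split by the shorter overlapping term "GitHub".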

I also did not want the extraction to be "technically reusable" but only inside the one app that birthed it. That is fake portability. So the repo has three proof points now: a React demo, a zero-framework DOM demo, and a web component demo with isolated shadow roots sharing one engine.

If the same core cannot survive those three contexts, it is not a library. It is a hostage situation.
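For the web component demo, sharing one engine mostly means loading the model once. A minimal sketch, assuming a hypothetical `Engine` interface and loader function: memoize the in-flight promise so every component awaits the same download, and clear it on failure so a later call can retry.

```typescript
// Hypothetical engine type and loader; the real babulfish API may differ.
interface Engine {
  translate(text: string): Promise<string>;
}

type Loader = () => Promise<Engine>;

// Returns a getter that starts loading on first call and shares the same promise afterwards.
function createSharedEngine(load: Loader): () => Promise<Engine> {
  let pending: Promise<Engine> | null = null;
  return () => {
    if (!pending) {
      pending = load().catch((err) => {
        pending = null; // allow a retry after a failed download
        throw err;
      });
    }
    return pending;
  };
}
```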

The test story got the same treatment. I wanted shared conformance tests, not just "well, it seems to work in the demo." That way the React binding is forced to honor the core contract, and future bindings have something real to prove themselves against instead of vague vibes and screenshots.

That was probably the biggest shift in mindset.

The first site version was feature work. The library version was contract work.

Feature work asks, "Can I make this page translate?" Contract work asks, "What behavior is actually stable enough to promise to other people?"

Those are different questions, and they produce different code.

Contract work is stricter. You have to decide what is truly supported, what is intentionally out of scope, and which ugly little heuristics belong in app config instead of the shared package. You have to stop yourself from sneaking private assumptions into public APIs just because it would be convenient for one app. You have to write the demos that prove the abstraction is real. Annoying, but healthy.
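The shared conformance tests mentioned earlier can be as simple as a function that takes a binding factory and runs the same checks against it. This is a hypothetical sketch of the pattern, not babulfish's test harness. The `Binding` interface and both checks are assumptions chosen to mirror behavior the essay describes: translating, then restoring the original content.

```typescript
// Hypothetical minimal contract a binding must satisfy.
interface Binding {
  translate(lang: string): Promise<void>;
  restore(): void;
  readText(): string; // current visible text of the translated root
}

type Check = { name: string; run: (b: Binding) => Promise<void> };

const checks: Check[] = [
  {
    name: "translate changes visible text",
    async run(b) {
      const before = b.readText();
      await b.translate("es");
      if (b.readText() === before) throw new Error("text unchanged after translate");
    },
  },
  {
    name: "restore returns the original text",
    async run(b) {
      const before = b.readText();
      await b.translate("es");
      b.restore();
      if (b.readText() !== before) throw new Error("restore did not return original text");
    },
  },
];

// Run every check against a fresh binding; collect failures instead of stopping at the first.
async function runConformance(makeBinding: () => Binding): Promise<string[]> {
  const failures: string[] = [];
  for (const check of checks) {
    try {
      await check.run(makeBinding());
    } catch (err) {
      failures.push(`${check.name}: ${(err as Error).message}`);
    }
  }
  return failures;
}
```

Each check gets a fresh binding, so one failure cannot leak state into the next, and a new binding only has to implement the interface to be held to the same contract.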

I came away liking the extraction more than I expected.

Partly because I now have a cleaner site. Partly because I have a reusable library I can point at new experiments. But mostly because it confirmed something I keep running into: the best library ideas usually start as specific, slightly messy app code that had to earn its keep before it got cleaned up.

That is a much better starting point than inventing an abstraction in a vacuum and hoping reality eventually agrees with it.

So that is babulfish.

It started as "wouldn't it be funny if my site translated itself in the browser?" Then it became a globe button, a DOM translator, and a pile of lessons about inline content. Then it turned into a package set with an actual contract, shared tests, and multiple integration surfaces.

Much better than leaving it as one more clever thing trapped in one repo.

If you want to try it, start with @babulfish/react on npm. If you want the lower-level bits or just want to read the code, the whole thing is open source at github.com/bigH/babulfish.