Translating the Tutorial (and, maybe, Docs)

It occurred to me that a “subdirectory” approach could have a negative impact on SEO compared to a “subdomain” approach like the one React uses. In the few articles I read about this, both are considered acceptable.

The golden rule is apparently to not mix languages on the same page, with a special exception for Japanese.

Being SEO compliant would then imply:

  • avoiding partially translated pages
  • adapting the navigation menu?

Interesting topic. I think it comes down to the What/Why of the translations.

If the goal is to make documentation/tutorials accessible to speakers of different languages - maybe that suggests the ‘docs’ are an entity of their own that can be picked up by SEO (this would require redirects from current /docs…)

https://docs.gitlocalize.com/about.html <-- ironically not localized
https://api.rubyonrails.org/

If the goal is to have all content on the site localized, then maybe language subdomains for non-English sites make sense
https://ja.reactjs.org/

Lastly there’s always the subfolder route - which seems to also work fine with SEO, but might bloat the main project size.
https://developer.mozilla.org/ja/docs/Web/API/Fetch_API

I’d be especially interested to hear @rob’s thoughts about folder vs subdomain organization on the site.

It seems to me like having a separate repo for every translation will be more maintainable in the long run, and that kind of lends itself to the subdomain version: each one is its own Netlify site and they live in their own self-contained world. However, I’m far from an i18n expert so I’ll take advice wherever I can get it!

My experience with i18n has been with Rails, where we always used https://localeapp.com to manage translations. Your code contains references to “keys” like t('home.title'): this calls the t method, which is short for translate, and the parameter is the key you want the translation for. It uses the locale value in session (defaulted to en-US) and returns the value for that key in the given locale. So you end up with one single codebase and totally manage the translations somewhere else. And it has the bonus side effect that translators just need access to the LocaleApp website, they don’t have to be developers and familiar with building the application code itself.
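A minimal sketch of that key-based lookup, written in Python rather than Rails. The `TRANSLATIONS` table, the `lookup` helper, and the fallback to `en-US` when a key is missing are illustrative assumptions, not the Rails or LocaleApp API.

```python
from typing import Optional

# Hypothetical in-memory table; a service like LocaleApp would manage
# these strings outside the codebase.
TRANSLATIONS = {
    "en-US": {"home": {"title": "Welcome"}},
    "fr": {"home": {"title": "Bienvenue"}},
}

DEFAULT_LOCALE = "en-US"


def lookup(locale: str, key: str) -> Optional[str]:
    """Walk a dotted key such as 'home.title' through one locale's table."""
    node = TRANSLATIONS.get(locale, {})
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node if isinstance(node, str) else None


def t(key: str, locale: str = DEFAULT_LOCALE) -> str:
    """Return the string for `key`, falling back to the default locale,
    and finally to the key itself so a missing string stays visible."""
    value = lookup(locale, key)
    if value is None:
        value = lookup(DEFAULT_LOCALE, key)
    return value if value is not None else key
```

With this shape the application code only ever mentions keys, so translators can work on the table without touching the codebase.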

Another thing to keep in mind is that I assume we’ll want to set up search for each translation, which will require stand-alone Algolia indexes for each. And if people want to test search they’ll need API keys so they can populate it from the development environment…managing those will be a whole task in itself. :frowning:


It seems to me like having a separate repo for every translation will be more maintainable in the long run, and that kind of lends itself to the subdomain version: each one is its own Netlify site and they live in their own self-contained world. However, I’m far from an i18n expert so I’ll take advice wherever I can get it!

What are your thoughts on having the instructional material (docs/tutorials) live in their own docs subdomain (like docs.redwoodjs.com) that is maintained separately and localized?

This would make the more canonical “meat and potatoes” of Redwood more accessible earlier on without the anxiety of keeping up with all the other changing things on the site, like News.

That way we could reap the ease of static site generation from markdown while moving the localization clutter outside the main site. We could still use cameronjs, maybe with a little tweaking for docsify-esque functionality with search. Here is a bilingual example: https://docs.snipaste.com/

My experience with i18n has been with Rails, where we always used https://localeapp.com to manage translations. Your code contains references to “keys” like t('home.title') : this calls the t method, which is short for translate

^These types of translation management systems are perfect for string management in a UI.

As far as I can tell however, the Redwood site content seems to be largely markdown doc based which lends itself to git-diff based management for maintenance.

GitLocalize is great for that, and is free and shall remain free, as I confirmed with the GitLocalize team. Much like LocaleApp in the Rails workflow, translators/maintainers log in to a dashboard on the GitLocalize site and translate directly there (they do need a GitHub account though). All the content lives in our own repo(s) - GitLocalize just provides the centralized status dashboard and a good machine-aided translation interface. Best of all, it is instantly visible what content has gone stale against the source docs thanks to git diff monitoring.

Thoughts anyone?


What are your thoughts on having the instructional material (docs/tutorials) live in their own docs subdomain (like docs.redwoodjs.com ) that is maintained separately and localized?

In this case would there be a separate repo for the docs? So something like:

Eventually people would want to translate stuff like the homepage or roadmap and they’d do that in the main repo, right? So there’d be translations living in both…at that point is there any benefit to splitting the repos? What do you see the directory structure looking like to organize the translations?

It feels “cleaner” to me to keep the site together but just have a copy of the entire repo for each translation, à la React and others, with each repo exposed as its own subdomain. If something isn’t translated it just stays English until someone gets around to it, like this Support page in the Ukrainian version of the React site: https://uk.reactjs.org/community/support.html

This also means none of the build scripts need to change, they just keep building within their own self-contained universe and publish to their own self-contained search index (although having to make some changes isn’t a deal-breaker by any means). Also the navigation is self-contained, there’s no need to keep locale choices in state somewhere and be sure to serve up the proper version of a page…

Is there a huge downside to this repo-per-locale approach I’m not seeing? I can see the Gitlocalize workflow being helpful for less technical translators, but in our case I don’t imagine many non-technical folks doing translations—if you’re into translating docs for a web framework you’re probably all in! :slight_smile:


If we need more data for what translations to focus on first, here’s the last 30 days of analytics for redwoodjs.com:


Woah. You all are amazing. I took a long weekend and came back to a wonderful explosion of activity! :rocket:

Organizing by Repo

I’m going to +1 what I feel is the consensus around this approach:

Each translation has its own repository in the organization on GitHub, with designated codeowners to review and approve changes. Each repo will include all pages needing translations (with some prioritized over others), and a bot to notify of changes in the main repo to keep everything up-to-date.

^^ This was from the Gatsby guide that @ajcwebdev posted. And I believe it was reiterated in the quote about Vue setting up translations.

I think we’re a ways off from translating the entire site. For now we’ll just need to have the individual markdown files for the translated docs.

Next step → yes?

@clairefro @Thieffen @ajcwebdev
Can we create a repo redwoodjs/translation-<language> and give you all full access to start prototyping?

Step 1: Content and Language

I’d really like to encourage us to focus on:

  1. tutorial.md (the tutorial)
  2. whatever language you all know best in addition to English (am I correct that it’s French?). The problem we’re solving right now, as a first step, isn’t to optimize for the language with the most bang-for-translation-buck. It’s to create a maintainable, scalable process.

Integrating Translations with RWJS.com

Let’s keep all this as simple as possible. I don’t believe we need subdomains, automated redirects, or anything of like kind as we get off the ground.

The site can “grab” any markdown file from any repo, parse it, and display it according to some config. We just need to:

  • set appropriate URL slug (including something like /en-us/blah-blah)
  • set appropriate page meta info
  • provide a way in the UI to choose the Tutorial language
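A rough sketch of what the first two items in that list could look like as per-document config. The `PageConfig` shape, the `page_config` helper, and the example file path are hypothetical; nothing here reflects how the site actually builds its pages.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class PageConfig:
    slug: str   # URL path the page is served at, e.g. /en-us/tutorial
    lang: str   # locale code to expose in the page meta
    title: str  # translated page title for the page meta


def page_config(locale: str, source_file: str, title: str) -> PageConfig:
    """Derive a locale-prefixed slug and page meta from a markdown file path."""
    stem = PurePosixPath(source_file).stem.lower()
    return PageConfig(slug=f"/{locale.lower()}/{stem}", lang=locale, title=title)
```

For example, `page_config("fr", "docs/TUTORIAL.md", "Tutoriel")` yields the slug `/fr/tutorial`, and a language chooser in the UI could be generated from the list of such configs.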

Am I missing something obvious?

Am I under thinking it?

Hmm this seems like combining both approaches above rather than picking one or the other.

I was suggesting (like React and Gatsby, et al) that we create separate repos for each translation and that each of those is its own self-contained site. All pages start in English until they get translated, but fr.redwoodjs.com is the home for the French translation. The build process doesn’t need to be changed at all, other than new Algolia API keys for dedicated indexes for each language.

The other option is to just mix the translated Markdown files right into the main repo and they all live together…TUTORIAL.md, TUTORIAL.fr.md, something like that—we deploy the site as normal and have to add translations of the nav and the ability to switch between languages into the codebase. And have to separate out the translated versions into their own Algolia indexes. The build process would need updates to work with this structure and organize everything.
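To make the single-repo naming idea concrete, here is a hedged sketch of how a build step might pick `TUTORIAL.fr.md` when it exists and fall back to `TUTORIAL.md` otherwise. The helper names and the `en` default are assumptions for illustration, not part of the current build.

```python
from pathlib import Path


def localized_path(source: Path, locale: str) -> Path:
    """TUTORIAL.md + 'fr' -> TUTORIAL.fr.md, in the same directory."""
    return source.with_name(f"{source.stem}.{locale}{source.suffix}")


def resolve_doc(source: Path, locale: str, default_locale: str = "en") -> Path:
    """Return the translated file if it exists, else the English source."""
    if locale != default_locale:
        candidate = localized_path(source, locale)
        if candidate.exists():
            return candidate
    return source
```

Every place in the build that reads a doc would have to go through something like `resolve_doc`, which is the kind of build-process update this option implies.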

The option you just proposed sounds like even more work, as the translations live by themselves in their own repo but we need more build process updates to pull each of them into the main repo and act as though they were there to begin with.

I’m advocating for the first option because it keeps everything clear and separate, and it puts us in the same position as the second option as far as untranslated pages go: they all appear in English until translated. Do we know anyone from the React/Gatsby/etc team whom we can ask how this is working for them? Do they deeply regret it and wish they had gone with a single repo?


Welcome back to the woods!

You beat me to replying to Rob :slight_smile:

Let’s keep all this as simple as possible. I don’t believe we need subdomains, automated redirects, or anything of like kind as we get off the ground.

To Rob’s point, it seems simplest to make entirely separate sites by language that mirror the English one (the “React way”). The reasons being:

  • we can use the doc rendering framework as-is, no complicated render logic/seo
  • we avoid the multiplying doc bloat that comes with replicating content inside the main English repo as subfolders by language.

I have an idea for a prototype that combines @rob’s proposal with @thedavid’s centralized translation hub for content. I’m about to whip it up while discussion continues :slight_smile:

Step 1: Content and Language
I’d really like to encourage us to focus on:

  1. tutorial.md (the tutorial)
  2. whatever language you all know best in addition to English (am I correct that it’s French?).

How many French speakers we got in here? (I only know how to say “I have two eyebrows”)


I don’t know about the Gatsby/React teams, but I deeply regret trying to mono-localize even my very small repo that has content geared for children - it gets out of hand fast for more than two languages :laughing:

(granted this is probably not an example of “doing it well”)


PS, just stumbled across a write-up on the Meteor team’s localization setup - looks like a combo of the Rob and David ways

http://www.discovermeteor.com/blog/community-translations-with-github-middleman-codeship-heroku/

hahaha I saw you typing at the same time, too!

I’d be happy to help translate the tutorial into French.

Partially translated content might convey a sense of unfinished/unpolished work.

From an SEO point of view, it’s also not really recommended to:

  • duplicate the exact same content on different web pages
  • have partially translated web pages
  • have content that does not match the metadata

These are general tips you commonly find on the Internet.
Still I’m far from being an expert in this field, and cannot assure you they are valid.

It would be ideal to find a way to have low or zero latency in translations after a documentation update.
I don’t know exactly how we can achieve this result.
I think this would lead us to adopt a reasonably short list of target languages and build a strong pool of motivated translators before adding a new language.


Thanks, all. Excited about this discussion and the momentum here. I don’t want my 2¢ to weigh any more strongly than anyone else’s, to be clear. My goal is to:

  • empower and enable those who are excited to take this on
  • provide the structure and resources needed to reach the next milestone(s)
    • as well as achieve long-term maintainability
  • help with sanity checks and guidance, especially when it comes to scope and roadmap
    • nothing is more discouraging than taking on too much, too soon.

So those are the things I’m trying to suss out. And definitely deferring to you all to tell me (us) what best will achieve each of the items on this list.

Repo-per-translation

I think my primary reason for suggesting this setup is that it’s an easy way to manage the contributors who are managing a translation. Each translation repo can be assigned to a set of maintainers by using GitHub Repo User permissions. There are labels, issues, Project Boards, integrations, etc.; all of which can be fully managed by the specific team.

The intended effect → we can scale the teams without having to add layers of “management”.

What to translate? How to Manage Translations

Again, I’m getting caught up to speed, but it seems like this conversation is seamlessly going back and forth between translating 1) a website vs. 2) a document. Regarding the former, if we want to offer true i18n for a website, we should be looking at available CMS solutions with content publishing and versioning workflows.

and that each of those is its own self-contained site

Unless I’m missing something, I strongly disagree that we should start with this goal of translating an entire site. RWJS.com is updated almost daily on average. So every way I’m mentally slicing how this could work feels like a lot of overhead right out of the gate. However, I have no experience with translation tools and automation. If it’s true that 90% of the work can be done by machine and automation, then that’s a different story. Although even in this case I’d still argue to start with a small-scope initial milestone before taking on a whole site.

First Milestone == Translate one Document

I’m intentionally trying to force the scope to be on translating a single file, which is tutorial.md.

  • I think it’s the most bang for our buck
  • I think it’s near-term achievable
  • I think it’s long-term maintainable
  • Provides margin for learning, mistakes, and low-cost iteration

Setting URLs and Page Meta

The site already pulls in content from the Framework repo (e.g. the introduction Readme.md). I’ll try to be more specific about what I’m suggesting:

We’ll need to handle Page Meta (React Helmet?)

We’ll need to handle a UI Language switcher. Admittedly, this could be where I’m oversimplifying things. But maybe it’s just a one-time, in-content link on the first page of the tutorial that gives you links to the Tutorial introduction page for a specific language.

In the future, depending on whether or not a CMS is implemented, the subdomain option for full sites could definitely be the way to go. I just don’t think it’s the way to start.

Which Language?

Ah, for some reason I thought you were a French speaker as well, @clairefro. Not sure how that got stuck in my head.

Anyway, again I don’t have a language preference other than choosing one that the initial team is very comfortable with.

Partially translated content might convey a sense of unfinished/unpolished work.

Does that mean you should only make a language available once the whole site is translated? That seems like a big ask!

I’m totally down with starting with the tutorial, I’m not arguing against that (although we may want to do the homepage at the same time—if you can’t even read the intro to the framework or the navigation, even finding the tutorial may be tough).

This means as more and more pages are being translated we’re making the build script and links between them more and more complex, all manually. It will require a lot of coordination between those making the translations and someone adding those links to the build script before they’re available to be seen.

And the end result is no different than if we just had separate sites to begin with—any pages that aren’t translated yet are in English.

With a separate repo at a separate subdomain, everyone is responsible for their own language and that’s it—when it’s ready it goes live, just merge to that language’s main.

The complexity I see with this subdomain-per-repo solution is, like you said, changes to the main site showing in those others. It looks like the React team handles this by opening a PR on the translated sites’ repos when something gets merged to main in the English repo. If the change is to a page that’s still in English (it hasn’t been translated yet) then just click the merge button. If the change is to a translated page then it’s no different than any other solution: someone will need to resolve that change by updating the translation to match (and then closing the PR).

However, even this seems like a better workflow because all translated sites are notified that there is a change that needs to be made. If these translation repos are just a smattering of individual docs how are you going to keep them coordinated to notify each other of changes to the English version? You have to either count on the translators to constantly be watching the English repo, or do a bunch of manual work to add build scripts to open PRs for only those changed docs in those translated repos (and constantly update that script to include more pages as they become translated).

This also removes the issue of page metadata—it all lives in the translated repos.

We still need a UI switcher, so maybe that lives in its own repo and is pulled in during build time so everyone uses the same code and new languages coming online are available everywhere simultaneously (we could do something similar to what we do where a push to redwoodjs main causes a re-deploy in Netlify to pick up any changes to internal docs).

So I don’t think that wanting to start small and only translate the tutorial has any effect on this decision as far as the viewer is concerned—whether we copy the whole site, or just the tutorial to its own repo, the end result to the user viewing the site is the same: everything is in English except for the tutorial. Where the decision does matter is how much work it is for us to maintain, and my argument is that it’s a lot less work to just copy the whole site to a subdomain and let the translated docs live in their own self-contained universe.

whether we copy the whole site, or just the tutorial to its own repo, the end result to the user viewing the site is the same: everything is in English except for the tutorial.

Very true.

Our challenges revolve around

  • build complexity (single-repo)
  • anti-staleness (per-language-repo solution)

Regarding anti-staleness, regardless of the structure we choose it feels like (for sanity) there would be a need for some CI to execute automated actions when changes are made to the source repo.

Bear with me a moment - what if our translated files were like mushrooms, and staleness was monitored as a hidden layer beneath the ground - the mycelium layer.

Doc staleness-monitoring could occur in a hidden “mycelium” repo, which has two dirs: one synced with the entire redwoodjs.com source (perhaps a submodule even?) and one that stores only completed translations of markdowns in a folder structure that mimics the redwoodjs.com project. All doc translations would be managed through the gitlocalize dash which displays staleness vs. source.

The mycelium layer only exists for pulling changes from the source, and making PRs to other language repos. This could be automated.

In the per-language-repo situation, all subdomain sites would start out as copies of the English site. As translations for languages (say, fr) grow in the mycelium, they are pulled into that language’s subdomain repo in the appropriate directory based on their path in the mycelium, overwriting the placeholder English doc with the same filename. The subdomain could be deployed publicly once deemed ‘translated enough’.

A very rough sample:

I don’t know how feasible this would be, but I feel GitHub Actions or some other CI could help automate the syncing/PRs. This approach helps with staleness in a fast-growing doc network, but is it over-complicated?
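One way the overlay step could work, as a hedged sketch: copy every finished translation from the mycelium's per-language folder over the English placeholder at the same relative path in the language repo. The function name and directory layout are assumptions, and a real CI job would open a PR rather than write files directly.

```python
import shutil
from pathlib import Path
from typing import List


def overlay_translations(mycelium_locale_dir: Path, site_repo_dir: Path) -> List[Path]:
    """Copy each translated markdown file over the placeholder that has
    the same relative path in the language repo. Returns the relative
    paths that were written."""
    written = []
    for translated in sorted(mycelium_locale_dir.rglob("*.md")):
        relative = translated.relative_to(mycelium_locale_dir)
        target = site_repo_dir / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(translated, target)
        written.append(relative)
    return written
```

Files that have no translation yet are never touched, so they stay in English in the language repo.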

That’s the first time I’ve heard the word “mycelium” outside of Star Trek: Discovery! They probably explained it just like you did, but I must not have been paying attention because I thought they invented that term for the show!

I’m not sure we need that repo in the middle…the workflow I was picturing was something like:

  1. At some point someone decides to start a new translation
  2. We take the current state of the redwoodjs.com repo main and make a new repo, fr.redwoodjs.com for example
  3. Someone does some translating in the new repo, eventually gets their main to where they want it.
  4. Once they have enough that we think the new translation can go live (maybe the minimum is the homepage and the nav?), we set up a new Netlify site for fr.redwoodjs.com that deploys from the fr.redwoodjs.com repo main branch
  5. We update some new language/locale picker repo like locales.redwoodjs.com (doesn’t exist yet) to include the new language. All this has is a dropdown menu with the available locales. Each translated repo (including redwoodjs.com) pulls this down during the build phase and adds the code to the top nav.

And that’s it! The sites live on parallel tracks forever. No build script changes every time a new particular doc file is translated, nothing like that.
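A small sketch of what the shared locale picker from step 5 might export, assuming one subdomain per locale with English on the bare domain. The `picker_entries` helper, the locale codes, and the `/tutorial` path are hypothetical.

```python
from typing import Dict, List

BASE_DOMAIN = "redwoodjs.com"
DEFAULT_LOCALE = "en"  # assumption: English stays on the bare domain


def site_url(locale: str, path: str = "/") -> str:
    """Map a locale code to its site, e.g. 'fr' -> https://fr.redwoodjs.com/."""
    host = BASE_DOMAIN if locale == DEFAULT_LOCALE else f"{locale}.{BASE_DOMAIN}"
    return f"https://{host}{path}"


def picker_entries(locales: Dict[str, str], path: str = "/") -> List[Dict[str, str]]:
    """Build the dropdown entries that every site pulls in at build time."""
    return [
        {"locale": code, "label": label, "url": site_url(code, path)}
        for code, label in locales.items()
    ]
```

Adding a language would then be a one-line change to the shared locale table, picked up by each site on its next build.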

Now, we just need to notify translated sites about changes to the English site:

  1. Whenever main is updated in redwoodjs.com we automate opening a PR on all the locale repos with that commit. (Maybe this is a GitHub action?)
  2. The owners of that locale either merge it automatically (because the change is to an English page they haven’t translated yet) or they work on updating their translation to include whatever change was made in the PR (they can then just delete the PR and commit their change to their own main, or go through whatever PR/review flow they want).
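The merge-or-translate decision in step 2 can be sketched as a small triage function. It assumes each locale repo can list which files it has already translated; the `SyncDecision` type and the example paths are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class SyncDecision:
    auto_merge: bool              # True when no translated file is affected
    needs_translation: List[str]  # translated files the English change touches


def triage_sync_pr(changed_files: Iterable[str], translated_files: Set[str]) -> SyncDecision:
    """Decide whether a sync PR from the English repo can be merged as-is."""
    needs = sorted(path for path in changed_files if path in translated_files)
    return SyncDecision(auto_merge=not needs, needs_translation=needs)
```

A CI job could run this against the commit's file list and either merge the PR or leave it open with the affected files listed for the locale owners.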

Am I missing anything? Are there gaps in here that mycelium version fills in that I’m forgetting?

  1. Whenever main is updated in redwoodjs.com we automate opening a PR on all the locale repos with that commit. (Maybe this is a GitHub action?)

Great! If possible, it could be even cooler to auto-merge the change too, unless it is a .md file.

Am I missing anything? Are there gaps in here that mycelium version fills in that I’m forgetting?

The only gap the mycelium thingy fills here is this scenario: 4 new English docs have been PRed and merged untranslated to the FR repo. Translators are looking for things to do… so they hunt for work by clicking through all the folders in search of an English md file.

However, the scale of redwoodjs.com is not so big, so this gap could probably be filled without mycelium by having language repo managers manually project manage, such as having them add the newly arrived untranslated doc to a TODO kanban in GitHub Projects or opening an issue or something.
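The "hunting for English files" problem could also be scripted. This sketch treats a doc as untranslated when its text in the locale repo is still identical to the English source, or when it is missing entirely. Both inputs are plain path-to-content mappings, which is an assumption about how a script would load the two repos.

```python
from typing import Dict, List


def untranslated_docs(english: Dict[str, str], locale: Dict[str, str]) -> List[str]:
    """List markdown paths whose locale copy is missing or still
    byte-for-byte the same as the English source."""
    return sorted(
        path
        for path, text in english.items()
        if path.endswith(".md") and locale.get(path, text) == text
    )
```

The resulting list could feed the TODO board or be posted as an issue whenever a sync PR is merged.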

All-in-all your proposed flow feels like the right track

Another benefit of the per-language-repo solution:

Each language repo’s README.md could contain a term glossary table for consistency between multiple translators
