Translating the Tutorial (and, maybe, Docs)

@clairefro @Thieffen consolidating conversation about translations. No pressure for either of you to take the lead on this, but I could sure use help getting some momentum.

For me, the first question is: what should we try to translate? Redwood has a lot of documentation that’s still changing very quickly. Additionally, most of our individual docs tend to be thorough, which is great for usefulness but makes for a lot of translation work. My vote for getting started is to focus solely on the Tutorial.

The next question to answer will be how to manage the translation content, updates/changes/versions, and display on the website. My assumption is that we start heavy on process and then figure out the tech and tools as we go. (Note: the redwoodjs.com website is a custom static site generator built by Rob. The advantage is that it’s very lightweight and flexible. The disadvantage is that it’s not a “docs framework” with built-in features.)

Lastly, we need to decide which languages to translate into. Let’s figure out the first one in addition to English based on who is available to help. Once we get one translation shipped, I bet others will follow quickly.


Thoughts, reactions, or suggestions?

Just wrote my thoughts on this over in the tutorial page actually, thanks for making a topic.

Claire Froelich and I are just beginning to do some research into getting translations for docs and other material going, and we’d love to get as many languages as possible.

We’ll probably want to wait until the Twotorial is done before getting into translations but if you were looking for somewhere to start now you could translate the main doc page into French and then Italian.

I’m really passionate about this and think making as much of our documentation available in at least a few major languages will be a massive but highly valuable investment.

I think the main page is likely to change the least of any part of the docs at this point and is also the first and most important thing people read.

I would say, at least, the beginner / newcomer oriented parts of the documentation.
Also, since these parts deal with the fundamentals, they are less likely to undergo drastic changes in the weeks to come.
More in-depth technical parts can be dealt with later.

Indeed, the tutorial and main page, as @ajcwebdev suggested, look to be the natural first targets.

For sure we need to set up a workflow to handle this recurring task.
In terms of tooling, Coursera relies on https://www.smartling.com/
Don’t know if this kind of service could be relevant for us.

Also, @mojombo said something about translations and chatterbug…
Apparently they built some kind of expertise in this field with @peterp

Storybook built a whole team dedicated to handling the learning path + translations.
Look at the different languages available below the big “get started” button.

We could try to start with a few belonging to the top 10 languages by total number of speakers:

  1. English (1.132 billion total speakers)
  2. Mandarin Chinese (1.117 billion total speakers)
  3. Hindi (615 million total speakers)
  4. Spanish (534 million total speakers)
  5. French (280 million total speakers)
  6. Standard Arabic (274 million total speakers)
  7. Bengali (265 million total speakers)
  8. Russian (258 million total speakers)
  9. Portuguese (234 million total speakers)
  10. Indonesian (199 million total speakers)

https://www.babbel.com/en/magazine/the-10-most-spoken-languages-in-the-world

Thanks for that list. Very interesting :+1:

Another way to look at what languages to choose is to look at the English Proficiency Index

Those with the lowest proficiency in English are those who would most greatly benefit from a translation :slight_smile:

Here are the 10 countries with the lowest English proficiency:

  1. Angola
  2. Oman
  3. Kazakhstan
  4. Cambodia
  5. Uzbekistan
  6. Ivory Coast
  7. Iraq
  8. Saudi Arabia
  9. Kyrgyzstan
  10. Libya

There is some overlap in our two lists, so those would make great first (or second, perhaps) targets. Like Bengali/Bangladesh and Angola/Portuguese (there might be more, haven’t checked)

Love both those points, @Tobbe and @Thieffen. Another approach that could help us scale this effort would be a human-in-the-loop workflow for languages that machine translation handles well. With the transformer models currently in use, translation quality will vary based on:

  • How syntactically and semantically similar are the two languages? For English the closest languages are a mishmash of Germanic and Romance languages.
  • How much training data is available for the two languages? Also known as high-resource vs. low-resource languages. Languages that are high-resource relative to English include Chinese, Arabic and French along with German, Portuguese, Spanish and Finnish.

The reason I think this is relevant is that if we could select languages that can get, let’s say, 90% accuracy, we can then give that output to a translator who just has to fix the last 10% instead of doing the entire translation from scratch.

Love the ideas here. My suggestion would be to find a workflow that allows crowdsourced translation, as the fun tone used in the docs is not wonderfully suited to machine translation.

This open source GitHub plugin, https://gitlocalize.com/, looks interesting as a translation management tool. It uses git diff on the markdown files to determine which translations are missing or need updating, which, if set up correctly, could open the door to crowdsourced translation.
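
To make the “missing / needs updating” idea concrete, here is a rough sketch in JavaScript. It is not how gitlocalize works internally; the function name translationStatus, the file paths, and the idea of storing the English content hash each translation was based on are all assumptions made up for illustration.

```javascript
// Hypothetical sketch: classify each English source file as "missing",
// "outdated", or "up-to-date" for one target language.
// `sources` maps a file path to a hash of its current English content.
// `translations` maps a file path to the English hash the translation was based on.
function translationStatus(sources, translations) {
  const result = {};
  for (const [path, currentHash] of Object.entries(sources)) {
    const basedOn = translations[path];
    if (basedOn === undefined) {
      result[path] = "missing"; // never translated
    } else if (basedOn !== currentHash) {
      result[path] = "outdated"; // English changed since the translation was made
    } else {
      result[path] = "up-to-date";
    }
  }
  return result;
}

// Example with made-up paths and hashes.
const frStatus = translationStatus(
  {
    "tutorial/welcome.md": "a1",
    "tutorial/installation.md": "b2",
    "tutorial/layouts.md": "c3",
  },
  {
    "tutorial/welcome.md": "a1",
    "tutorial/installation.md": "b1",
  }
);
console.log(frStatus);
```

A per-file status like this is also what a progress dashboard or a “stale translation” warning on a page would need.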

@ajcwebdev also mentioned React as a use case


One other area that needs attention is videos. The tutorial (and twotorial maybe?) poses the challenge of being largely video and requiring lots of subtitles. As far as I’m aware, YouTube can auto-generate subtitles in the original language, which could serve as base transcripts for translation, but I think the translation itself is the burden of the content provider. Not sure - something to look into.

Regarding which languages to tackle first, I’d say pick a few that members of the current Redwood contributor community are familiar with or have contacts who would be interested in getting started - just so we have a model of the translation flow for expanding to other languages

Looking forward to reading into everyone’s comments and playing around over the weekend!

Here’s a comment from Rob about this back in May in reference to the React docs.

Here’s what reactjs.org does:

Once you say that you want to own a translation they create a new repo with a complete copy of the site and keep it self-contained. Changing languages on the site means you go to a separate subdomain, which serves the contents of that repo.

I’m not sure at what point the new translated version of the site is considered complete enough that it can go live. And I can’t imagine what’s involved in updating all translated sites every time a PR is merged in the English version.

I tend to agree

Did a quick test this morning, it looks promising.
Could probably be useful with video subtitles as well.

I’m not talking about running the docs through a translation service and then putting them up online, they would still be reviewed by human translators. The automatic translation step is for the translator. The average translator produces somewhere between 400 and 600 finished words per hour.

The entirety of the Redwood docs is about 53,000 words right now, which is roughly 106 hours of work at 500 words per hour, so even if the models throw out gibberish 50% of the time you’d still save that person about 53 hours of work. People argue about the accuracy numbers but like I said it varies language to language.
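
Here is the back-of-the-envelope math spelled out, using the rough figures quoted above. The function name hoursSaved is made up, and the big simplifying assumption is that time saved is proportional to the share of machine output that is usable (it ignores the time spent reviewing that output).

```javascript
// Rough estimate only: assumes time saved scales linearly with the
// fraction of machine-translated text that a translator can keep.
function hoursSaved(totalWords, wordsPerHour, usableFraction) {
  const hoursFromScratch = totalWords / wordsPerHour;
  return hoursFromScratch * usableFraction;
}

const DOC_WORDS = 53000; // approximate size of the docs, per the post above

// Translator speed of 400-600 words per hour, with 50% usable output.
const savedSlow = hoursSaved(DOC_WORDS, 400, 0.5); // 66.25 hours
const savedMid = hoursSaved(DOC_WORDS, 500, 0.5); // 53 hours
const savedFast = hoursSaved(DOC_WORDS, 600, 0.5); // about 44.2 hours

console.log({ savedSlow, savedMid, savedFast });
```

So the 53-hour figure corresponds to the middle of the 400 to 600 words-per-hour range.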

Some more details on https://gitlocalize.com/

General workflow:
Each project has a team with three roles available:

  • Admin
  • Moderator
  • Translator

A translator has access to a convenient user interface (a kind of git diff) to translate a particular file. Not all file formats are supported, but Markdown is, along with JSON, HTML, and YAML.

After translating a file, partially or fully, a translator can issue a “review request”. While there is at least one ongoing “review request”, the file can no longer be edited by other translators.

Then, a moderator reviews the request. They can start a discussion by adding comments and/or close the review request. Once a page is fully translated, a moderator can issue a PR on the repository with the translated files. Apparently, it’s not possible to issue a PR on a file that is only partially translated.

Finally, the repository owner can deal with the pull request as usual.

Key points:

  • Nice UX for translators, virtually no entry barrier to start contributing. Only a GitHub account is required.

  • Process is fully integrated into our usual Git workflow. It relies on pull requests and supports branching. This means we can start translating the “twotorial” right now, without waiting for it to be merged into the “main” branch.

  • No need to replicate the whole website for each language (as in the React solution with subdomains). We just specify where the translated files live in the repository (like /translations/fr/). Only the translated files are kept in this directory.

  • Easy to monitor translation completion (+ badges)

  • An ad hoc machine translation assistant is available at the click of a button. As @ajcwebdev pointed out, it helps speed up the work for translators. It looked pretty accurate with French, but still needs a human touch to fully convey the fun tone.
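
Since only translated files would live under a directory like /translations/fr/, the site build needs some fallback for pages that have no translation yet. A minimal sketch of that lookup, assuming English sources sit under docs/ and translations under translations/<lang>/ (both directory names, and the helper name resolveSourcePath, are illustrative and not taken from the redwoodjs.com code):

```javascript
// Hypothetical helper: decide which markdown file to render for a page.
// `translatedFiles` is a Set of paths that exist in the repository.
function resolveSourcePath(page, lang, translatedFiles) {
  if (lang !== "en") {
    const candidate = `translations/${lang}/${page}`;
    if (translatedFiles.has(candidate)) {
      return candidate;
    }
  }
  // No translation yet: fall back to the English original.
  return `docs/${page}`;
}

// Example: only the tutorial has been translated into French so far.
const translatedFiles = new Set(["translations/fr/tutorial.md"]);
const frTutorial = resolveSourcePath("tutorial.md", "fr", translatedFiles);
const frCli = resolveSourcePath("cli.md", "fr", translatedFiles);
console.log(frTutorial); // translations/fr/tutorial.md
console.log(frCli); // docs/cli.md
```

Falling back to English keeps a partially translated language usable, at the cost of mixed-language navigation until the translation catches up.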

Did not spot any advantage for a person to be officially granted the translator role, apart from being listed on the team. Maybe there is a setting somewhere to prevent an “unofficial” translator from issuing a review request.

France is #31 while Italy is #36, seems accurate :smile:

China being #40, a Chinese translation would be nice for sure.
I remember, at our very first meeting, someone offered to help with Chinese translations.

Here’s an abridged version of Gatsby’s contributing guide for translation.

The general process

Each translation has its own repository in the organization on GitHub, with designated codeowners to review and approve changes. Each repo will include all pages needing translations (with some prioritized over others), and a bot to notify of changes in the main repo to keep everything up-to-date.

Creating a new translation

See Starting a new language to start up a new translation repository.

Contributing translations

See the translation contributor guide for information on how to contribute translations in your language.

Maintainers

Each translation repo will have at least two maintainers and codeowners that are responsible for the upkeep of the repo. See the Translation Maintainer Guide for information on the responsibilities of translation maintainers.

Language-specific channels

Each translation group may want to have a space for maintainers and community members to ask questions and coordinate the project.

Creating a translation

Read the maintainer guide

Before requesting a new translation, make sure to read the maintainer responsibilities to affirm that you accept the responsibilities of being a translation maintainer.

Check other issues

Before creating a new issue, make sure to check the list of open translation requests. If one already exists for your language, ask to be added to the list of maintainers there.

Create a translation request issue

If you don’t see the language among the issues listed, feel free to create a new translation request issue for it and follow the instructions.

Finding codeowners

For a new translation, open an issue with information about your intended language. If you already have co-contributors to act as fellow code owners and provide checks and balances for PR reviews and quality assurance, that would be very helpful! Otherwise, you can check out other translation request issues people have made and offer to join.

Criteria for translation approval

A translation request will be chosen for approval based on the following criteria:

  • Are there at least two maintainers listed?
  • Does at least one of the maintainers have previous open-source experience and experience working with GitHub and git?
  • Are the maintainers fluent speakers? Maintainers do not need to have experience translating, but must be fluent enough in the language to be able to translate technical writing.

After approval

Once the translation request is approved, a member of the core team will run an automated script to create your repository and set everything up.

Contributing to a Translation

Once a language repository is created and someone on the core team has assigned codeowners, contributions can begin. It is up to the discretion of the contributor how exactly they want to work, but it’s recommended to limit the scope of PRs to 1 doc at a time to aid with code reviewing.

Use English as the source

The website is written first in English and should be considered the source material for all translations (as opposed to starting from another translation). When a repository is created, it will provide a copy of the docs to be translated which you can then update through pull requests against them in the relevant language.

Changes to the meaning of a text or code example should be done in the main English repo, and then translated afterwards to keep the content aligned across languages.

Common types of merge issues

Typo fixes

Sometimes there is a typo or grammatical error in the English source that gets fixed in an update. Since these typos most likely don’t exist in the translated version, you can most likely use the translated version as-is.

Content changes

Sometimes, the content of the source page is actually updated and needs a translation. Make sure to read the change carefully and change the translation to match its meaning.

Conflicts in untranslated files

Sometimes, you may find conflicts in files that haven’t been translated yet. This is usually because of a previous improper merge (for example, using the “Squash and merge” option).

Creating a separate pull request

If a page has significant changes, it may be worth splitting it into its own pull request.

Translation Maintainer Guide

This page lists the responsibilities of translation maintainers and provides tips on how to better manage your repository.

Maintainer responsibilities

Your responsibilities are as follows:

  • Keep issues up-to-date as people volunteer to translate pages.
  • Review pull requests made by contributors promptly.
  • Review auto-generated pull requests in order to make sure translations remain up-to-date with the source repo.
  • Act as point of contact for your language and answer questions from both contributors to your language and the core team.
  • Set up a process in order to get your translation published.

As a maintainer, you are welcome to add a contributing doc written in your language to assist with the process.

Tips

Set up a style guide and glossary

Your language repo comes with a template style guide that you can use to put in style rules specific to your language. Refer to the translation style guide for more information.

Set up a review process

As codeowners, you have the freedom and responsibility to decide what your review process will be like. You can decide how many reviewers you’d like. If your team is small, one reviewer may be enough. But if you have lots of contributors and enough codeowners, you may want to require two reviewers for additional quality.

Prioritize pages

The repo creation script will create a progress issue listing the core pages to translate. Once these core pages are done, make sure to update the issue or create a new one in order to schedule work for the rest of the docs.

Reference guide overview pages are also worth translating to establish a fully translated path to a frequently visited reference guide, though overview pages are listed at a lower priority.

Ask for help

Don’t be afraid to ask for help!

Don’t let translations stall

Check in periodically with contributors to make sure translations are being done promptly. If it’s been a while since a page was assigned without any progress, check in with the contributor and ask for a status update. If the contributor is unresponsive, you may need to free up the page for someone else to work on.

Spread the word!

If you’re finding it hard to find people to help translate, spread the word about your translation effort! Ask people in local meetups if they would be interested in contributing.

Template responses for closing PRs

Sometimes a PR has a valid reason to not be merged as-is. Templates can help speed up the process of responding to someone while encouraging future contributions.

PRs with quality issues

If a PR includes content that is of poor quality (such as from Google Translate or missing important nuance) or doesn’t meet the requirements, it would help to include a drafted reply to encourage contributors to continue with the project.

PRs with changes more fitting for the main Gatsby repo

Because the main Gatsby repo is the source of content, more substantive changes should be closed and redirected there.

Translation Style Guide

Each translation group should decide on conventions and stick with them for consistency, documenting those decisions in the repo’s style guide file to set contributors up for success. Use the English style guide as a reference to determine the equivalent rules in your language.

Translated docs and learning materials should maintain these values with high-quality spelling and grammar, accurate information, similar structure and purpose. For any questions about guidelines, feel free to get in touch with the core team.

Glossary

The style guide has a glossary section that you can use to fill in common translations. Look at the English Glossary for a list of terms that are useful to have translations for.

Universal style guide

The following rules should apply in all translations and can serve as a basis for your language-specific style guide.

Keep the meaning of the source

Keep the meaning of the original English source even if it is confusing or has a typo. If you find an error that can be fixed, create an issue or pull request to the original repo so that all translations can benefit from the change.

Text in code blocks

Leave text in code blocks untranslated except for comments. You may optionally translate text in strings, but be careful not to translate strings that refer to code!

Here’s Vue’s protocol.

On Translations

Translations for this documentation project are currently maintained in separate repositories forked from this original one.

Arabic

Arabic translation is maintained by Interstellar Club

French

French translation is maintained by Vuejs-FR.

Italian

Japanese

Japanese translation is maintained by Vue.js japan user group

Korean

Korean translation is maintained by Vue.js Korean User group.

Mandarin

Persian (Farsi)

Persian translation is maintained by VueJS-fa.

Português-Br

Português-Br translation is maintained by Vuejs-Br.

Russian

Russian translation is maintained by Translation Gang.

Spanish

Vietnamese

Vietnamese translation is maintained by Vue.js Vietnam User group.

Bahasa Indonesia

Bahasa Indonesia translation is maintained by Vue.js Indonesia.

Want to help with the translation?

If you feel okay with translating quite alone, you can fork the repo, post a comment on the Community Translation Announcements issue page to inform others that you’re doing the translation and go for it.

If you are more of a team player, Translation Gang might be for you. Let us know somehow that you’re ready to join this international open-source translators community.

They also keep a running issue.

This serves as a list of announcements for community translations. Instead of creating a new issue, a new translation announcement (along with its description, repository URL, call for contributors etc.) should be posted here as a comment. Further discussions and progress tracking for a specific translation should happen in the corresponding repository.

Nat Alison on translating React:

Our original approach for translations was to use a SaaS platform that allows users to submit translations. There was already a pull request to integrate it and my original responsibility was to finish that integration. However, we had concerns about the feasibility of that integration and the current quality of translations on the platform. Our primary concern was ensuring that translations kept up to date with the main repo and didn’t become “stale”.

Dan encouraged me to look for alternate solutions, and we stumbled across how Vue maintained its translations – through different forks of the main repo on GitHub. In particular, the Japanese translation used a bot to periodically check for changes in the English repo and submit pull requests whenever there was a change.

This approach appealed to us for several reasons:

  • It was less code integration to get off the ground.
  • It encouraged active maintainers for each repo to ensure quality.
  • Contributors already understand GitHub as a platform and are motivated to contribute directly to the React organization.

We started off with an initial trial period of three languages: Spanish, Japanese, and Simplified Chinese. This allowed us to work out any kinks in our process and make sure future translations are set up for success. I wanted to give the translation teams freedom to choose whatever tools they felt comfortable with. The only requirement is a checklist that outlines the order of importance for translating pages.

After the trial period, we were ready to accept more languages. I created a script to automate the creation of the new language repo, and a site, Is React Translated Yet?, to track progress on the different translations. We started 10 new translations on our first day alone!

Because of the automation, the rest of the maintenance went mostly smoothly. We eventually created a Slack channel to make it easier for translators to share information, and I released a guide solidifying the responsibilities of maintainers. Allowing translators to talk with each other was a great boon – for example, the Arabic, Persian, and Hebrew translations were able to talk to each other in order to get right-to-left text working!

Just to summarize the ideas on David’s original 3 questions based on the thread so far (feel free to correct me):

WHAT CONTENT: tutorials + docs
WHICH LANGUAGES: Most-spoken vs. Least-likely-to-have-ESL-speakers (Either way, should start with a very small handful)

As to the HOW… I’m seeing some patterns across the good practice examples:

  • Use English as base language :slight_smile:
  • Assign translation roles with dedicated maintainers for each language
  • Use of bots for checking for staleness against English (Vue + React)

I don’t know much yet about Rob’s framework used in redwoodjs.com and what the optimal way to dish out all these multiplying markdown files would be (ex: redwoodjs.com/docs/zh vs subdomain docs.redwoodjs.com/zh etc…), which would help point us in a direction for repo structure… your insights wanted!

It’d be really cool if we could devise a workflow that uses gitlocalize, for the out-of-the-box ease of maintenance nicely summarized by @Thieffen. Plus, I confirmed it allows auto-translation that you can manually edit, per @ajcwebdev’s point about machine-aided translation:

Dashboard of language progress, review request


Machine aided translation

However, the limitation with gitlocalize is that it seems to be set up for creating PRs to the repo where the source content lives (like the learnstorybook.com scenario). If we wanted to do separate repos by language (like the React docs scenario), we would probably need to set up our own sync environment (there’s precedent in the React docs translation bot).

For now I’m off to learn the Way of cameronjs

I lied - apparently gitlocalize allows you to specify a target repo(s) for translations, so translations could live outside the main repo if desired. Just keeping us aware of tools at hand :slight_smile:

Here I’m testing things out in same-repo, but you can see the field allows for any external repo

Might Netlify country/language redirects be helpful here?

if a user in Israel with Hebrew language preference visits / , they’ll get redirected directly to /israel/he in one step. Our cache server will cache this redirect for any other users that would match the same country and language rules.

This is a cool feature to know about. However, as someone mentioned on Discord, it is nice (especially for people straddling multiple regions/languages) to have explicit control over which language of docs to see.

For example, I lived in Japan for a long time and my browser is still set to ja, but sometimes I just wanna see the English docs without being redirected to ja
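
One way to get both behaviors is to let an explicit choice (say, from a language switcher) always win, and only fall back to the browser preference when no choice has been made. A small sketch of that rule; the function name pickDocsLanguage and the list of available languages are made up, and this is separate from how Netlify evaluates its own redirect rules:

```javascript
// Hypothetical sketch: explicit choice first, then browser languages, then English.
function pickDocsLanguage(explicitChoice, browserLanguages, available) {
  if (explicitChoice && available.includes(explicitChoice)) {
    return explicitChoice;
  }
  for (const tag of browserLanguages) {
    const base = tag.toLowerCase().split("-")[0]; // "ja-JP" -> "ja"
    if (available.includes(base)) {
      return base;
    }
  }
  return "en";
}

const availableLangs = ["en", "ja", "es"];
console.log(pickDocsLanguage(null, ["ja-JP", "en-US"], availableLangs)); // ja
console.log(pickDocsLanguage("en", ["ja-JP", "en-US"], availableLangs)); // en
console.log(pickDocsLanguage(null, ["de-DE"], availableLangs)); // en
```

The second call is the case described above: the browser says ja, but the reader asked for English and gets English.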

So I’ve been playing with the redwoodjs.com code just to learn how docs are currently generated.

Just for kicks I’m trying it out with this setup:

  • using gitlocalize to manage ja and es translations (see repo here)
  • Made code tweaks to add ja-docs and es-docs routes on branch language-sandbox (see forked branch here)

You can see what specifically was changed in this PR
(everything in code/html was auto-generated on yarn build)

I managed to get it to generate some of the translations from gitlocalize, with a few caveats:

  • I haven’t figured out whether the current setup in build.js and docutron.js allows for creating nested html folders. So for now I’m playing with one-level “book” names like /ja-docs/... and /es-docs/... as opposed to docs/ja/... etc
  • Current setup requires manual addition of new pages in the build process. Two bonuses here though: 1) you can localize slugs by specifying a title in the SECTIONS object’s file object, 2) you can ‘cherry pick’ which approved translations make the final cut for the public
  • Took me a while to figure out that the code/html directory needs you to manually add a new directory in order to create new books (would be great to add some kind of mkdir -p functionality to the build process when creating html files if we go this route). Also, the build process only adds files; it doesn’t sync to remove an html file when the source md file is removed.
  • Haven’t bothered coding any lang specific navigation so only manual url navigation for this test :stuck_out_tongue:
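
For the “build only adds files” caveat, here is a rough idea of what a sync step could compute. It assumes a 1:1 mapping from <book>/<name>.md to <book>/<name>.html, which may not match what build.js actually does, and the function name findOrphanedHtml is made up:

```javascript
// Hypothetical sketch: list generated HTML files whose markdown source no longer exists.
function findOrphanedHtml(markdownFiles, htmlFiles) {
  const expected = new Set(
    markdownFiles.map((file) => file.replace(/\.md$/, ".html"))
  );
  return htmlFiles.filter((file) => !expected.has(file));
}

// Example with made-up file names.
const orphans = findOrphanedHtml(
  ["ja-docs/authentication.md", "es-docs/authentication.md"],
  [
    "ja-docs/authentication.html",
    "es-docs/authentication.html",
    "es-docs/old-page.html",
  ]
);
console.log(orphans); // only es-docs/old-page.html is left without a source
```

As for the mkdir -p wish: in Node, fs.mkdirSync(dir, { recursive: true }) creates any missing parent directories, so the build could call that before writing each html file.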

Slugs below come from the SECTIONS object in build.js, via files: [{ title }]
(sí, there is a typo in the Spanish ;))

Japanese auth page sample:

Spanish auth page sample:

Didn’t proofread the translations - just set the auto translate on to test encoding (which looks good :+1: )

Just sharing my play things as I learn the doc build process out loud

Concerns I have so far with this particular test setup:

  • Lots of manual entry for new docs = lots of potential for errors and missed/stale/unwanted translations and original docs
  • All the markdowns for all the languages would live in the main repo, which could be cumbersome in the future

I’m not married to anything, but LOVING gitlocalize :slight_smile:

Swear I’ll stop spamming the forum channel on Discord today - last one :wink:

After re-reading this thread I get the feeling that the consensus is to prioritize the tutorials over the docs. Just pretend the above are tutorial pages :stuck_out_tongue:

Also thinking we could have a totally separate standalone repo for organizing translations of the YouTube video transcripts. That translation is for a different platform and could easily be set up in a new ‘redwoodjs.com-tutorial-video-i18n’ repo with subfolders for different languages. The videos are less likely to be edited and don’t need staleness monitoring until new vids are added. I’d be happy to set up this YouTube translations repo if you guys feel it’s the right approach
