@clairefro @Thieffen consolidating conversation about translations. No pressure for either of you to take the lead on this, but I could sure use help getting some momentum.
For me, the first question I have is what should we try to translate? Redwood has a lot of documentation that’s still changing very quickly. Additionally, most of our individual docs tend to be thorough, which is great for usefulness but makes for a lot of translation work. My vote for getting started is to singularly focus on the Tutorial.
The next question to answer will be how to manage the translation content, update/changes/versions, and display on the website. My assumption is that we start heavy on process and then figure out the tech and tools as we go. (Note: the redwoodjs.com website is a custom static site generator built by Rob. The advantage is that it’s very lightweight and flexible. The disadvantage is that it’s not a “docs framework” with built-in features.)
Lastly, we need to decide which languages to translate into. Let’s figure out the first one in addition to English based on who is available to help. Once we get one translation shipped, I bet others will follow quickly.
Just wrote my thoughts on this over in the tutorial page actually, thanks for making a topic.
Claire Froelich and I are just beginning to do some research into getting translations for docs and other material going, and we’d love to get as many languages as possible.
We’ll probably want to wait until the Twotorial is done before getting into translations but if you were looking for somewhere to start now you could translate the main doc page into French and then Italian.
I’m really passionate about this and think making as much of our documentation available in at least a few major languages will be a massive but highly valuable investment.
I think the main page is likely to change the least of any part of the docs at this point and is also the first and most important thing people read.
I would say, at least, the beginner / newcomer oriented parts of the documentation.
Also, since these parts deal with the fundamentals, they are less likely to undergo drastic changes in the weeks to come.
More in-depth technical parts can be dealt with later.
Indeed, the tutorial and main page, as @ajcwebdev suggested, look to be the natural first targets.
For sure we need to set up a workflow to handle this recurring task.
In terms of tooling, Coursera relies on https://www.smartling.com/
Don’t know if this kind of service could be relevant for us.
Also, @mojombo said something about translations and chatterbug…
Apparently they built some kind of expertise in this field with @peterp
Storybook built a whole team dedicated to handling the learning path + translations.
Look at the different languages available below the big “get started” button.
We could try to start with a few belonging to the top 10 languages by total number of speakers:
Those with the lowest proficiency in English are those who would most greatly benefit from a translation
Here are the 10 countries with the lowest proficiency in English:
Angola
Oman
Kazakhstan
Cambodia
Uzbekistan
Ivory Coast
Iraq
Saudi Arabia
Kyrgyzstan
Libya
There is some overlap in our two lists, so those would make great first (or second, perhaps) targets. Like Bengali/Bangladesh and Angola/Portuguese (there might be more, haven’t checked)
Love both those points @Tobbe and @Thieffen. Another approach that could help us scale this effort would be a human-in-the-loop workflow for languages that machine translation already handles well. With the generative transformer models currently in use, translation quality will vary based on:
How syntactically and semantically similar are the two languages? For English the closest languages are a mishmash of Germanic and Romance languages.
How much training data is available for the two languages? Also known as high-resource vs. low-resource languages. Languages that are high resource relative to English include Chinese, Arabic and French along with German, Portuguese, Spanish and Finnish.
The reason I think this is relevant is that if we could select languages that can get, let’s say, 90% accuracy, we can then give that output to a translator who just has to fix the last 10% instead of doing the entire translation from scratch.
Love the ideas here. My suggestion would be to find a workflow that allows crowdsourced translation, as the fun tone of the docs is not especially well suited to machine translation.
This open source GitHub plugin looks interesting as a translation management tool: https://gitlocalize.com/ It uses git diff on the markdown files to determine which translations are missing or need updating, which, if set up correctly, could open the door to crowdsourced translation.
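To make the "which translations are missing or need updating" idea concrete, here is a minimal Python sketch. It is not a description of how GitLocalize works internally; it simply assumes we record, for each translated file, a hash of the English text it was translated from, and compare that with the current English text. The function names and the file names in the example are hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a source document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def translation_status(sources: dict[str, str], translated_from: dict[str, str]) -> dict[str, str]:
    """Classify each English source file as 'missing', 'stale', or 'current'.

    sources: file name -> current English text
    translated_from: file name -> hash of the English text the translation was based on
    """
    status = {}
    for name, text in sources.items():
        if name not in translated_from:
            status[name] = "missing"   # never translated
        elif translated_from[name] != content_hash(text):
            status[name] = "stale"     # English changed since the translation
        else:
            status[name] = "current"
    return status


# Hypothetical example data
english = {"tutorial.md": "Welcome to Redwood!", "intro.md": "Hello"}
french_based_on = {"tutorial.md": content_hash("Welcome to Redwood")}  # older English text

print(translation_status(english, french_based_on))
```

Here `tutorial.md` is reported as stale because the English text changed after it was translated, and `intro.md` as missing because no French version was ever recorded.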
@ajcwebdev also mentioned React as a use case
One other area that needs attention is videos. The tutorial (and twotorial maybe?) poses the challenge of being largely video and requiring lots of subtitles. As far as I’m aware, YouTube can auto-generate subtitles in the original language, which could serve as transcripts and a base for translation, but I think translation is the burden of the content provider. Not sure - something to look into.
Regarding which languages to tackle first, I’d say pick a few that members of the current Redwood contributor community are familiar with, or have contacts who would be interested in getting started - just so we have a model of the translation flow for expanding to other languages
Looking forward to reading into everyone’s comments and playing around over the weekend!
Once you say that you want to own a translation, they create a new repo with a complete copy of the site and keep it self-contained. Changing languages on the site means you go to a separate subdomain, which serves the contents of that repo.
I’m not sure at what point the new translated version of the site is considered complete enough that it can go live? And I can’t imagine what’s involved in updating all translated sites every time a PR is merged in the English version.
I’m not talking about running the docs through a translation service and then putting them up online, they would still be reviewed by human translators. The automatic translation step is for the translator. The average translator produces somewhere between 400 and 600 finished words per hour.
The entirety of the Redwood docs is about 53,000 words right now so if the models throw out gibberish 50% of the time you’d still save that person 53 hours of work. People argue about the accuracy numbers but like I said it varies language to language.
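The arithmetic behind that estimate can be checked with a small Python sketch. It uses only the figures quoted above (53,000 words, 400 to 600 finished words per hour) and makes one simplifying assumption: the usable half of the machine output costs the translator no time, while the unusable half takes as long as translating from scratch. The function name is hypothetical.

```python
def hours_saved(total_words: int, words_per_hour: float, usable_fraction: float) -> float:
    """Hours saved if `usable_fraction` of the machine output needs no rework."""
    full_hours = total_words / words_per_hour  # time to translate everything by hand
    return full_hours * usable_fraction


DOC_WORDS = 53_000  # size of the Redwood docs quoted above

# Translator speeds quoted above, plus the midpoint
for speed in (400, 500, 600):
    print(speed, "words/hour:", round(hours_saved(DOC_WORDS, speed, 0.5), 1), "hours saved")
```

At the midpoint of 500 words per hour the full job is 106 hours, so a 50% usable output saves 53 hours; across the 400 to 600 range the saving is roughly 44 to 66 hours.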
General workflow:
Each project has a team with 3 roles available:
Admin
Moderator
Translator
A translator has access to a convenient user interface (a kind of git diff) to translate a particular file. Not all file formats are supported; Markdown is, along with JSON, HTML, and YAML.
After translating a file, partially or totally, a translator can issue a “review request”. While there is at least one ongoing “review request”, the file is no longer editable by other translators.
Then, a moderator reviews the request. They can start a discussion by adding comments and/or close the review request. Once a page is fully translated, a moderator can issue a PR on the repository with the translated files. Apparently, it’s not possible to issue a PR on a file that is partially translated.
Finally, the repository owner can deal with the pull request as usual.
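The rules in that workflow can be written down as a tiny model, which may help when comparing tools. This Python sketch only encodes what is described above (a file is locked while a review request is open, and a PR is only possible once the file is fully translated); the class and method names are hypothetical and do not correspond to any GitLocalize API.

```python
class TranslationFile:
    """Models the review rules described above for a single file."""

    def __init__(self, total_segments: int):
        self.total_segments = total_segments
        self.translated_segments = 0
        self.open_reviews = 0

    @property
    def locked(self) -> bool:
        # The file cannot be edited while at least one review request is open
        return self.open_reviews > 0

    @property
    def complete(self) -> bool:
        return self.translated_segments >= self.total_segments

    def translate(self, segments: int) -> None:
        if self.locked:
            raise PermissionError("file is under review and cannot be edited")
        self.translated_segments = min(self.total_segments, self.translated_segments + segments)

    def request_review(self) -> None:
        # Allowed for partial or complete translations
        self.open_reviews += 1

    def close_review(self) -> None:
        if self.open_reviews == 0:
            raise ValueError("no open review request")
        self.open_reviews -= 1

    def can_open_pull_request(self) -> bool:
        # A PR is only possible for a fully translated file
        return self.complete


page = TranslationFile(total_segments=10)
page.translate(4)
page.request_review()            # partial translation sent for review, file is now locked
page.close_review()              # moderator closes the request, file is editable again
page.translate(6)
print(page.can_open_pull_request())
```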
Key points:
Nice UX for translators, virtually no entry barrier to start contributing. Only a GitHub account is mandatory.
The process is fully integrated with our usual Git workflow. It relies on pull requests and supports branching. This means we can start to translate the “twotorial” right now, without waiting for it to be merged into the “main” branch.
No need to replicate the whole website for each language (as is the case in the React solution with subdomains). We just set where the translated files live in the repository (like /translations/fr/). Only the translated files are kept in this directory.
Easy to monitor translation completion (+ badges)
An ad hoc machine translation assistant is provided with a simple button. As underlined by @ajcwebdev, it helps to speed up the work for translators. It looked pretty accurate with French, but still needs a human touch to fully convey a fun tone.
Did not spot any advantage for a person to be officially granted the translator role, apart from being listed on the team. Maybe there is a setting somewhere to prevent an “unofficial” translator from issuing a review request.
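Since only translated files live under a directory like /translations/fr/, the site build would need some fallback to English for pages that have not been translated yet. A minimal Python sketch of that lookup, assuming a hypothetical `docs` directory for the English sources and a known set of existing translated paths:

```python
from pathlib import PurePosixPath


def resolve_doc(page: str, lang: str, translated: set[str],
                translations_root: str = "translations",
                source_root: str = "docs") -> str:
    """Return the translated file if it exists, otherwise the English source."""
    candidate = PurePosixPath(translations_root) / lang / page
    if str(candidate) in translated:
        return str(candidate)
    # No translation yet: fall back to the English original
    return str(PurePosixPath(source_root) / page)


# Hypothetical repository state: only the tutorial has a French version
existing = {"translations/fr/tutorial.md"}

print(resolve_doc("tutorial.md", "fr", existing))
print(resolve_doc("cli.md", "fr", existing))
```

The first call resolves to the French file and the second falls back to the English one, so a partially translated language never produces a missing page.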
China being #40, a Chinese translation would be nice for sure.
I remember, at our very first meeting, someone offered to help with Chinese translations.
Gatsby’s documentation includes a very thorough contributing guide for translation which includes processes as well as helpful advice to potential maintainers. Some of the advice is specific to translation, while other advice covers more general good documentation habits that become more relevant in a multilingual environment.
I’ve done my best to just summarize the process here along with the tips related to translation. For the full list of considerations please consult the official Gatsby documentation.
The general process
Each translation has its own repository in the organization on GitHub, with designated codeowners to review and approve changes. Each repo will include all pages needing translations (with some prioritized over others), and a bot to notify of changes in the main repo to keep everything up-to-date.
Maintainers: Each translation repo will have at least two maintainers and codeowners that are responsible for the upkeep of the repo.
Language-specific channels: Each translation group may want to have a space for maintainers and community members to ask questions and coordinate the project.
Creating a translation
Before requesting a new translation read the maintainer responsibilities to affirm that you accept the responsibilities of being a translation maintainer. Check the list of open translation requests and if you don’t see the language listed create a new translation request issue.
Finding codeowners
For a new translation, open an issue with information about your intended language. If you don’t already have co-contributors to act as fellow code owners check out other translation request issues people have made and offer to join.
Criteria for translation approval
A translation request will be chosen for approval based on the following criteria:
Are there at least two maintainers listed?
Does at least one of the maintainers have previous open-source experience and experience working with GitHub and git?
Are the maintainers fluent speakers? Maintainers do not need to have experience translating, but must be fluent enough in the language to be able to translate technical writing.
Once the translation request is approved, a member of the core team will run an automated script to create your repository and set everything up.
Use English as the source
The website is written first in English and should be considered the source material for all translations (as opposed to starting from another translation).
When a repository is created, it will provide a copy of the docs to be translated which you can then update through pull requests against them in the relevant language.
Changes to the meaning of a text or code example should be done in the main English repo, and then translated afterwards to keep the content aligned across languages.
Translation Maintainer Guide
Maintainer responsibilities
Keep issues up-to-date as people volunteer to translate pages.
Review pull requests made by contributors promptly.
Review auto-generated pull requests in order to make sure translations remain up-to-date with the source repo.
Act as point of contact for your language and answer questions from both contributors to your language and the core team.
Set up a process in order to get your translation published.
As a maintainer, you are welcome to add a contributing doc written in your language to assist with the process.
Tips
Set up a style guide and glossary
Your language repo comes with a template style guide that you can use to put in style rules specific to your language.
Refer to the translation style guide for more information.
Prioritize pages
The repo creation script will create a progress issue listing the core pages to translate. Once these core pages are done, make sure to update the issue or create a new one in order to schedule work for the rest of the docs.
Reference guide overview pages are also worth translating to establish a fully translated path to a frequently visited reference guide, though overview pages are listed at a lower priority.
Spread the word!
If you’re finding it hard to find people to help translate, spread the word about your translation effort!
Ask people in local meetups if they would be interested in contributing.
Translation Style Guide
Each translation group should decide on conventions and stick with them for consistency, documenting those decisions in the repo’s style guide file to set contributors up for success.
Use the English style guide as a reference to determine the equivalent rules in your language.
Translated docs and learning materials should maintain these values with high-quality spelling and grammar, accurate information, similar structure and purpose.
Glossary
The style guide has a glossary section that you can use to fill in common translations.
Look at the English Glossary for a list of terms that are useful to have translations for.
Universal style guide
Keep the meaning of the source
Keep the meaning of the original English source even if it is confusing or has a typo.
If you find an error that can be fixed, create an issue or pull request to the original repo so that all translations can benefit from the change.
Text in code blocks
Leave text in code blocks untranslated except for comments.
You may optionally translate text in strings, but be careful not to translate strings that refer to code!
Vue’s protocol is much more relaxed than Gatsby’s and lets the translations emerge organically from the community. Translations are currently maintained in separate repositories forked from the original.
If you feel okay with translating quite alone, you can fork the repo, post a comment on the Community Translation Announcements issue page to inform others that you’re doing the translation and go for it.
If you are more of a team player, Translation Gang might be for you. Let us know somehow that you’re ready to join this international open-source translators community.
This serves as a list of announcements for community translations. Instead of creating a new issue, a new translation announcement (along with its description, repository URL, call for contributors etc.) should be posted here as a comment. Further discussions and progress tracking for a specific translation should happen in the corresponding repository.
Nat Alison’s article Is React Translated Yet? ¡Sí! Sim! はい! was one of the most useful resources I found looking at the problem holistically. They took cues from Vue along with introducing some of their own processes which have been adopted by other projects such as Gatsby.
Our original approach for translations was to use a SaaS platform that allows users to submit translations. There was already a pull request to integrate it. However, we had concerns about the feasibility of that integration and the current quality of translations on the platform. Our primary concern was ensuring that translations kept up to date with the main repo and didn’t become “stale”.
Dan encouraged me to look for alternate solutions, and we stumbled across how Vue maintained its translations – through different forks of the main repo on GitHub. In particular, the Japanese translation used a bot to periodically check for changes in the English repo and submit pull requests whenever there is a change.
This approach appealed to us for several reasons:
It was less code integration to get off the ground.
It encouraged active maintainers for each repo to ensure quality.
Contributors already understand GitHub as a platform and are motivated to contribute directly to the React organization.
We started off with an initial trial period of three languages:
Spanish
Japanese
Simplified Chinese
This allowed us to work out any kinks in our process and make sure future translations are set up for success. I wanted to give the translation teams freedom to choose whatever tools they felt comfortable with. The only requirement is a checklist that outlines the order of importance for translating pages.
After the trial period, we were ready to accept more languages. I created:
Allowing translators to talk with each other was a great boon – for example, the Arabic, Persian, and Hebrew translations were able to talk to each other in order to get right-to-left text working!
React Maintainer Guide
Maintainer Responsibilities
Keep the Progress issue up to date as people volunteer to translate pages.
Review pull requests made by contributors promptly.
Review pull requests generated by reactjs-translation-bot in order to make sure translations remain up to date with the source repo.
Set up a process in order to get your translation published. See the tips below for suggestions.
Act as point of contact for your language and answer questions from both contributors to your language and the core ReactJS team.
Tips
Make a glossary and style guide
Create a glossary of the translations of technical and React-specific terms. Put this in a highly visible location (the README or a pinned issue). For examples of glossaries, see:
Also, create a style guide to define additional rules to follow in translation. See the universal style guide for rules that should apply to all translations.
Instead of assigning a long page to one translator, you can create a “Work In Progress” (WIP) branch and assign different sections to different translators.
Set up a review process
Decide how many reviewers will review each translated page before it can be merged in. Small teams may only be able to have one reviewer while bigger teams may consider having two reviewers for a stronger guarantee that the page is correct.
GitHub tags can show what step in the review process a PR is in. The Brazilian Portuguese repo includes a good example of a tag system.
Integration tools
Integration tools are encouraged to test and deploy your translations. Some tools used by various React translations:
If a question isn’t addressed here, the repo maintainers can go to the global ReactJS localization team and ask their fellow translators for help! They can also ask for help in the Slack channel.
Just to summarize the ideas on David’s original 3 questions based on the thread so far (feel free to correct me):
WHAT CONTENT: tutorials + docs
WHICH LANGUAGES: Most-spoken vs. Least-likely-to-have-ESL-speakers (Either way, should start with a very small handful)
As to the HOW… I’m seeing some patterns across the good practice examples:
Use English as base language
Assign translation roles with dedicated maintainers for each language
Use of bots for checking for staleness against English (Vue + React)
I don’t know much yet about Rob’s framework used in redwoodjs.com and what the optimal way to dish out all these multiplying markdown files would be (ex: redwoodjs.com/docs/zh vs subdomain docs.redwoodjs.com/zh etc…), which would help point us in a direction for repo structure… your insights wanted!
It’d be really cool if we could devise a workflow that uses gitlocalize, for the ease of maintainability out of the box nicely summarized by @Thieffen. Plus I confirmed it allows auto-translation that you can manually edit, per @ajcwebdev’s point about machine-aided translation:
However, the limitation with gitlocalize is that it seems to be set up for creating PRs to the repo where the source content lives (like the learnstorybook.com scenario). If we wanted to do separate repos by lang (like the React docs scenario), we would probably need to set up our own sync environment (there’s precedent with the React docs translation bot).
I lied - apparently gitlocalize allows you to specify a target repo(s) for translations, so translations could live outside the main repo if desired. Just keeping us aware of tools at hand
Here I’m testing things out in same-repo, but you can see the field allows for any external repo
Might Netlify country/language redirects be helpful here?
if a user in Israel with Hebrew language preference visits / , they’ll get redirected directly to /israel/he in one step. Our cache server will cache this redirect for any other users that would match the same country and language rules.
This is a cool feature to know about. However, as someone mentioned in discord it is nice (especially for people straddling multiple regions/languages) to have explicit control over which language of docs to see.
For example, I lived in Japan for a long time and my browser is still set to ja, but sometimes I just wanna see the English docs without being redirected to ja
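One way to combine both ideas is to treat the browser preference as a default only, and let an explicit choice (for example from a language switcher stored in a cookie) always win. The Python sketch below illustrates that ordering; it is independent of Netlify’s redirect rules, and the function names and the idea of a stored explicit choice are assumptions for illustration.

```python
from typing import Optional


def parse_accept_language(header: str) -> list[str]:
    """Return language tags from an Accept-Language header, best first."""
    weighted = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without q= has the highest weight
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        weighted.append((quality, tag.strip().lower()))
    # sorted() is stable, so tags with equal weight keep their header order
    return [tag for _, tag in sorted(weighted, key=lambda item: -item[0])]


def pick_language(explicit: Optional[str], accept_language: str,
                  supported: set[str], default: str = "en") -> str:
    """Explicit user choice first, then browser preference, then the default."""
    if explicit in supported:
        return explicit
    for tag in parse_accept_language(accept_language):
        primary = tag.split("-")[0]  # "en-us" -> "en"
        if primary in supported:
            return primary
    return default


SUPPORTED = {"en", "ja", "es"}  # hypothetical set of published languages

# Browser set to ja, but the reader explicitly picked English
print(pick_language("en", "ja,en-US;q=0.8", SUPPORTED))
# No explicit choice: fall back to the browser preference
print(pick_language(None, "ja,en-US;q=0.8", SUPPORTED))
```

With this ordering the reader whose browser is set to ja still gets ja by default, but keeps English once they have chosen it.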
So I’ve been playing with the redwoodjs.com code just to learn how docs are currently generated.
Just for kicks I’m trying it out with this setup:
using gitlocalize to manage ja and es translations (see repo here)
Made code tweaks to add ja-docs and es-docs routes on branch language-sandbox (see forked branch here)
You can see what specifically was changed in this PR (everything in code/html was auto-generated on yarn build)
I managed to get it to generate some of the translations from gitlocalize, with a couple of caveats:
I haven’t figured out whether the current setup in build.js and docutron.js allows for creating nested html folders. So for now I’m playing with one-level nested “book” names like /ja-docs/... and /es-docs/... as opposed to docs/ja/... etc
Current setup requires manual addition of new pages in the build process. Two bonuses here though: 1) you can localize slugs by specifying a title in the SECTIONS object’s file object, 2) you can ‘cherry pick’ which approved translations make the final cut for the public
Took me a while to figure out the code/html directory needs you to manually add a new directory in order to create new books (would be great to make some kinda mkdir -p functionality for the build process when creating html files if we go this route.) Also, the build process only adds files, doesn’t sync to remove an html file when the source md file is removed.
Haven’t bothered coding any lang specific navigation so only manual url navigation for this test
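The two build gaps noted above (no automatic directory creation, and no removal of html files whose source md is gone) could be closed with something like the following. This is only a sketch in Python to show the idea; the real build lives in build.js, where Node’s `fs.mkdirSync(dir, { recursive: true })` gives the same `mkdir -p` behaviour. The function names and directory layout are hypothetical.

```python
from pathlib import Path


def write_html(out_root: Path, book: str, slug: str, html: str) -> Path:
    """Write one page, creating the book directory (and any parents) if needed."""
    target = out_root / book / f"{slug}.html"
    target.parent.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    target.write_text(html, encoding="utf-8")
    return target


def prune_orphans(out_root: Path, expected: set[Path]) -> list[Path]:
    """Delete generated .html files that no longer have a source page."""
    removed = []
    for path in sorted(out_root.rglob("*.html")):
        if path not in expected:
            path.unlink()
            removed.append(path)
    return removed
```

A build would collect the paths returned by `write_html` into a set and pass it to `prune_orphans` at the end. Because `book` can contain a slash (for example `docs/ja`), the same helper would also cover nested folders.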
Slugs below come from the build.js SECTIONS object’s files: [{ title }]
(sí, there is a typo in the Spanish ;))
Swear I’ll stop spamming the forum channel on discord today - last one
After re-reading this thread I get the feeling that the consensus is to prioritize the tutorials over the docs. Just pretend the above are tutorial pages
Also thinking we could have a totally separate standalone repo for organizing translations of the YouTube video transcripts. That translation is for a different platform and could easily be set up in a new ‘redwoodjs.com-tutorial-video-i18n’ repo with subfolders for different languages. The videos are less likely to be edited and don’t need staleness monitoring until new vids are added. I’d be happy to set up this YouTube translations repo if you guys feel it’s the right approach