Setting up a content management system (CMS) for localization requires both a database that supports multilingual content and mechanisms for interfacing with your language service providers (LSPs).
Getting both right helps ensure an efficient flow of content between your CMS and your LSPs.
Unicode and XML
Unicode is the foundation for encoding languages. XML facilitates transfers of source and translated content between your CMS and your LSPs.
Unicode – Extending your monolingual CMS to multilingual capabilities requires a representation for encoding all your current and future language needs. Unicode is the standard for handling multilingual content, and your database must support it.
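As a minimal sketch of what Unicode support means in practice, the following Python example stores strings in several scripts in an in-memory SQLite database and reads them back. The table name, column names, and sample strings are assumptions made for illustration; your own CMS database will have its own schema and encoding settings.

```python
import sqlite3

# Hypothetical sample content in several scripts (illustrative only).
SAMPLES = {
    "en": "Welcome",
    "de": "Willkommen",
    "ja": "ようこそ",
    "ar": "أهلا بك",
    "ru": "Добро пожаловать",
}

def round_trip(samples):
    """Store each string in SQLite and return what the database gives back."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per language code.
    conn.execute("CREATE TABLE content (lang TEXT PRIMARY KEY, body TEXT)")
    conn.executemany("INSERT INTO content VALUES (?, ?)", samples.items())
    stored = dict(conn.execute("SELECT lang, body FROM content"))
    conn.close()
    return stored

stored = round_trip(SAMPLES)
# True means every string came back exactly as it was stored.
print(stored == SAMPLES)
```

A round-trip check like this is a quick way to confirm that no layer between your application and your database is silently altering non-Latin text.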
XML – XML has many uses, such as structuring source and translated documents to facilitate the flow of content handoffs to and from your LSPs. An overarching goal in designing your XML is to maximize leverage of your language assets.
- As with other types of markup, consistency, clarity, and logical names and structures are important to maintain readability and processing by you and your LSPs. Any changes to your markup should be communicated to your LSPs. Define processes for your LSPs to be notified automatically when changes to your XML have been made.
- A critical design criterion is to mark up translatable content so that it is clearly separated from non-translatable content. Whether you put the two in different files or distinguish them with attributes, it is imperative to identify both types of content. Doing so reduces the burden on your LSPs of determining what is to be translated, and lowers translation costs by reducing the number of words sent for translation.
- The granularity of your XML markup is a tradeoff. Coarse-grained representations maximize leverage of your translation management system (TMS) at the segment or sentence level. Finer-grained representations at the sub-segment level capture frequently occurring phrases that can be translated once and reused, much as a TMS reuses whole segments.
- Identify “do not translate” (DNT) terms, including product names, trademarks, and service marks. By doing so, you will avoid needless translations and corrections downstream. Translating product names, trademarks, and service marks can also raise legal issues.
- Terminology is critical to localization. It must be managed at the source. Terminology can have special translations, so use markup to identify terms requiring a special or unique translation.
- Your XML should be defined consistently across all your documents and shared in the same form with your LSPs. Minimizing the complexity of your XML improves translation flow and reduces parsing errors that impede the translation process.
- As your content grows, your XML representations will need updates with new attributes and values. To ensure consistent representations, implement approval processes or mechanisms. Any changes to your XML, such as new fields, should be communicated to your LSPs.
- In addition to fields for document and source content identifiers, define separate fields for translations. These include language identifiers or metadata for the target language and value placeholders for the translations. Treat translations as values.
- Every document should be given a unique document identifier. This identifier will be used for identifying and storing translations of a particular document in your CMS.
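The markup guidelines above can be sketched in a small example. The element and attribute names used here (`document`, `segment`, `source`, `translation`, `dnt`, `term`, `translate`, `lang`) and the product name `AcmeSync` are hypothetical, chosen only for illustration; they are not taken from a standard or from any particular CMS. The Python sketch parses such a document, collects only the translatable segments for handoff, lists the DNT terms, and stores a returned translation as a value keyed by language.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: every element and attribute name is an assumption.
SOURCE_XML = """\
<document id="doc-0042" lang="en">
  <segment id="s1" translate="yes">
    <source>Install <dnt>AcmeSync</dnt> before you begin.</source>
  </segment>
  <segment id="s2" translate="no">
    <source>ACME-SYNC-4.2-BUILD</source>
  </segment>
  <segment id="s3" translate="yes">
    <source>Open the <term id="t7">control panel</term>.</source>
  </segment>
</document>
"""

def translatable_segments(root):
    """Return {segment id: source text} for segments marked translate="yes"."""
    return {
        seg.get("id"): "".join(seg.find("source").itertext())
        for seg in root.iter("segment")
        if seg.get("translate") == "yes"
    }

def dnt_terms(root):
    """Return the set of do-not-translate terms marked up in the document."""
    return {el.text for el in root.iter("dnt")}

def add_translation(root, segment_id, lang, text):
    """Store a translation as a separate field next to the source, keyed by language."""
    for seg in root.iter("segment"):
        if seg.get("id") == segment_id:
            target = ET.SubElement(seg, "translation", {"lang": lang})
            target.text = text
            return target
    raise KeyError(segment_id)

root = ET.fromstring(SOURCE_XML)
doc_id = root.get("id")                 # unique document identifier
handoff = translatable_segments(root)   # only s1 and s3 go to the LSP
protected = dnt_terms(root)             # terms the LSP must leave untouched
add_translation(root, "s1", "de", "Installieren Sie AcmeSync, bevor Sie beginnen.")
```

Keeping the source text in its own `source` field means that adding translations never changes what is extracted for the next handoff, and the document identifier ties every stored translation back to its source document.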
Look for the next part of this two-part article – “Effective hand-off of content to an LSP”