This is the second part of Website localization. To check the first part please click here.
In this part we are going to concentrate on some technical aspects of website localization.
Technical aspects of website localization
A website is made up of many kinds of electronic files. The site might be based on HTML, but it can also include other web technology files, document files such as PDF, TXT or DOC, multimedia elements, downloadable files such as ZIP archives, applications and so on. All of these have to be considered when localizing a website.
The majority of dynamic websites are database-driven and are usually based on a combination of HTML for the static elements, scripts that request and retrieve data from a database, and other scripts that generate either HTML or XML-based pages to display the new content. These scripts can run either on the server side or on the client side.
Let’s go through a static website, look at the main HTML tags and see how we can deal with them during localization.
HTML stands for HyperText Markup Language; it was one of the first languages developed to create web pages.
An HTML file is a special kind of text document with tags that web browsers use to present text and graphics. Alongside HTML we can use CSS (Cascading Style Sheets) to apply styles to HTML elements.
Basically, HTML is formed by pre-defined tags, that is, all web pages written in HTML must use the tags described in the html specification. These tags can be opening tags or closing tags. Opening tags are formed by a “less than” symbol, the name of the tag, for instance html, and the “greater than” symbol. Closing tags have the same structure but include a slash after the “less than” symbol.
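To make the opening and closing tag structure concrete, here is a minimal sketch that lists the tags of a small page in document order. It uses the HTMLParser class from Python's standard library; the names TagLister and list_tags and the sample markup are made up for this illustration.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Collects opening and closing tags in the order they appear."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Opening tag: "less than", tag name, "greater than".
        self.tags.append(f"<{tag}>")

    def handle_endtag(self, tag):
        # Closing tag: same structure, with a slash after "less than".
        self.tags.append(f"</{tag}>")


def list_tags(markup):
    """Return every opening and closing tag found in the markup."""
    parser = TagLister()
    parser.feed(markup)
    parser.close()
    return parser.tags


sample = "<html><body><p>Hello <b>world</b></p></body></html>"
print(list_tags(sample))
# ['<html>', '<body>', '<p>', '<b>', '</b>', '</p>', '</body>', '</html>']
```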
Website localization challenges and technical issues:
– Head and Body
The elements that have to be localized are found either in the head element (<head> </head>) or in the body element (<body> </body>). Both can contain three kinds of tags: external tags, which form an element that encloses the text to be translated; internal tags, such as a line break <br> or bold formatting <b></b>, which might need to be moved or edited during the translation process; and tags with translatable attributes, such as <meta> for metadata, <img> for images or <a> for links. In the document head, we might find metadata that contains valuable information for indexing or identifying the website.
This is possible thanks to the <meta> element, in which different properties, such as the author of the document or a list of keywords, can be specified through the attributes name or http-equiv. While text to be localized is usually included between two tags, that is, inside an element, this metadata is included inside a tag, as the value of an attribute. The attribute “lang” can be used to specify the language of the content of that metadata.
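The difference between text inside an element and text inside a tag can be sketched with a small extractor. This is only an illustration, not a complete localization tool: the lists of translatable attributes and metadata names below are assumptions that every project would have to decide for itself, and the names TranslatableExtractor and extract_translatable are invented for the example.

```python
from html.parser import HTMLParser

# Assumptions for this sketch: which attributes and which metadata
# names count as translatable is a per-project decision.
TRANSLATABLE_ATTRS = {"img": ["alt", "title"], "a": ["title"]}
TRANSLATABLE_META = {"description", "keywords"}
SKIP_CONTENT = {"script", "style"}  # code, not translatable text


class TranslatableExtractor(HTMLParser):
    """Collects (kind, text) pairs for everything a translator should see."""

    def __init__(self):
        super().__init__()
        self.segments = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in SKIP_CONTENT:
            self._skip_depth += 1
        if tag == "meta":
            # Metadata lives inside the tag, in the "content" attribute.
            name = (attrs.get("name") or "").lower()
            if name in TRANSLATABLE_META and attrs.get("content"):
                self.segments.append((f"meta[{name}]", attrs["content"]))
        for attr in TRANSLATABLE_ATTRS.get(tag, []):
            if attrs.get(attr):
                self.segments.append((f"{tag}@{attr}", attrs[attr]))

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ordinary translatable text sits between two tags.
        text = data.strip()
        if text and not self._skip_depth:
            self.segments.append(("text", text))


def extract_translatable(markup):
    """Return the translatable segments of the markup in document order."""
    parser = TranslatableExtractor()
    parser.feed(markup)
    parser.close()
    return parser.segments


sample = """<html><head>
<title>Our products</title>
<meta name="keywords" content="shoes, boots">
</head><body>
<p>Welcome to <b>our</b> shop</p>
<img src="logo.png" alt="Company logo">
</body></html>"""

for kind, text in extract_translatable(sample):
    print(kind, "->", text)
```

Note that the internal <b> tag splits the sentence “Welcome to our shop” into three separate pieces, which is one reason internal tags are usually kept inside the segment that is sent to the translator.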
– Tag Protection
As we have seen, some tags include attributes that need to be localized, and some tags might be changed or removed during the translation process. Generally speaking, however, it is important to protect tags, either in the engineering phase or during the translation process, so that translators do not corrupt the code by mistake.
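One common way to protect tags is to swap them for numbered placeholders before translation and put them back afterwards. The following is a minimal sketch of that idea, assuming placeholders of the form {1}, {2} and so on; the function names are invented, and the simple regular expression would not cope with a “greater than” symbol inside an attribute value or with source text that already contains such placeholders.

```python
import re

# Simplified pattern for an opening or closing tag (an assumption of
# this sketch; a real tool would use a proper HTML parser).
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")
PLACEHOLDER_RE = re.compile(r"\{(\d+)\}")


def protect_tags(segment):
    """Replace each tag with a numbered placeholder; return text and tags."""
    tags = []

    def _swap(match):
        tags.append(match.group(0))
        return f"{{{len(tags)}}}"

    return TAG_RE.sub(_swap, segment), tags


def restore_tags(translated, tags):
    """Put the original tags back, failing if a placeholder was lost."""
    found = {int(n) for n in PLACEHOLDER_RE.findall(translated)}
    expected = set(range(1, len(tags) + 1))
    if found != expected:
        raise ValueError(f"placeholder mismatch: {sorted(found ^ expected)}")
    return PLACEHOLDER_RE.sub(lambda m: tags[int(m.group(1)) - 1], translated)


source = 'Click <a href="help.html">here</a> for <b>help</b>.'
protected, tags = protect_tags(source)
print(protected)  # Click {1}here{2} for {3}help{4}.

# The translator works on the protected text and may reorder placeholders.
translated = "Haga clic {1}aquí{2} para obtener {3}ayuda{4}."
print(restore_tags(translated, tags))
# Haga clic <a href="help.html">aquí</a> para obtener <b>ayuda</b>.
```

The check in restore_tags is the protective part: if the translator deletes or mistypes a placeholder, the error is caught before the broken markup reaches the site.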
– Text in graphics
When localizing a website, we might also find graphics, which can be either static or moving. If these graphics contain text overlays, problems can arise. To avoid them, graphics should be designed in the engineering phase so that the text sits on a separate, editable layer that can be sent to localization. It is also important to take text expansion into account, since the translation in some languages might not fit in the graphic.
Moving graphics are either animated GIF files or Flash graphics. GIF files are like films with multiple frames: the textual content in each frame needs to be identified and localized in an appropriate tool. Flash graphics employ vector technology to create animated graphics. Here, the source FLA files are necessary for the localization process, since the compiled Flash files, EXE or SWF, are difficult to access.
– Hard-coded text
Text that is hard-coded inside tags or scripts cannot be easily accessed for translation. Either it remains in the source language, or the development team has to extract the text from the code, send it for translation and insert the translation back into the code, which is a time-consuming process.
– Hard-coded fonts
If fonts are hard-coded they cannot be changed. This can be especially critical when a certain font does not support all of the characters used in the target language.
– Character Encoding
It is necessary to declare in a meta tag which character encoding is used to display the text. This is done with the “charset” parameter, which for the western European languages looks like this: <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">.
The character set will need to be changed accordingly, depending on the target languages.
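A quick way to see whether a character set is suitable for a target language is to try to encode some translated text with it. The sketch below does this with Python's built-in codecs; the function name unsupported_chars and the sample strings are made up for the illustration.

```python
def unsupported_chars(text, encoding):
    """Return the characters of text that the encoding cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# Western European text fits in ISO-8859-1 ...
print(unsupported_chars("café", "iso-8859-1"))    # []
# ... but Russian text does not, so the charset has to change.
print(unsupported_chars("Привет", "iso-8859-1"))  # ['П', 'р', 'и', 'в', 'е', 'т']
# A Unicode encoding such as UTF-8 can represent both.
print(unsupported_chars("Привет", "utf-8"))       # []
```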
– Double-byte enablement and bi-directional languages
Some Asian languages, such as Chinese, Japanese and Korean, need more than one byte to represent each of their characters, and some languages, such as Arabic, are read from right to left. This has to be taken into account when designing the website.
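Both points can be checked programmatically. The sketch below, which uses only Python's standard library, counts the bytes a string needs in a given encoding and guesses the reading direction from the first character that has a strong direction. The function names are invented for this example, and the direction check is a simplification of what browsers actually do.

```python
import unicodedata


def byte_length(text, encoding):
    """Number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))


def text_direction(text):
    """Return "rtl" or "ltr" based on the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # right-to-left letters (Hebrew, Arabic)
            return "rtl"
        if bidi == "L":          # left-to-right letters
            return "ltr"
    return "ltr"  # assumption: default to left-to-right


# Three Latin letters take three bytes, three Japanese characters take six.
print(byte_length("abc", "iso-8859-1"))   # 3
print(byte_length("日本語", "shift_jis"))  # 6

print(text_direction("Hello"))   # ltr
print(text_direction("مرحبا"))   # rtl
```

The value returned by text_direction matches the values accepted by the HTML dir attribute, so it could be used to set the direction of a page or of a single element.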
– Text expansion
Text in graphics or in tables might not fit in the localized version. Romance languages, for instance, tend to expand when translated from English. The general rule is that large sections of text expand by about 30%, whereas single words and terms, depending on the language, can expand by as much as 400%. Therefore, dialog boxes and field lengths have to be designed to allow sufficient space.
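The expansion rule can be turned into a simple check during testing. The sketch below measures expansion in characters, which is only an approximation of the space a string takes on screen, and uses the 30% figure from the general rule as a default budget. The function names and the German sample are assumptions made for this example.

```python
import math


def expansion_percent(source, target):
    """Growth of the translation relative to the source, in percent."""
    return (len(target) - len(source)) / len(source) * 100


def allowed_length(source, budget_percent=30):
    """Maximum target length if the source may grow by budget_percent."""
    return math.ceil(len(source) * (100 + budget_percent) / 100)


def fits(source, target, budget_percent=30):
    """True if the translation stays within the expansion budget."""
    return len(target) <= allowed_length(source, budget_percent)


source, target = "Settings", "Einstellungen"
print(expansion_percent(source, target))  # 62.5
print(allowed_length(source))             # 11
print(fits(source, target))               # False
```

The single word in this example expands far more than 30%, which illustrates why short strings such as button labels and field names need a more generous budget than running text.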
– Locale-specific content
From the technical point of view, there are some other locale-specific issues that have to be considered, such as date, time, number and currency formats. It is advisable not to hard-code this information, but to use the system settings of the user’s environment. That way, no matter who accesses the site, the information will automatically appear in the proper format for that particular user.
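As a rough illustration of relying on environment settings instead of hard-coded formats, the sketch below uses Python's standard locale module. It reads the locale of the machine the script runs on, so it stands in for whatever mechanism the site uses to learn the visitor's settings; the function names are invented, and the output depends entirely on the environment.

```python
import locale
import time


def use_environment_locale():
    """Switch to the locale configured in the environment."""
    try:
        return locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # Assumption: fall back to the neutral "C" locale if the
        # configured one is not installed on this machine.
        return locale.setlocale(locale.LC_ALL, "C")


def format_number(value):
    """Format a number with the locale's decimal and grouping symbols."""
    return locale.format_string("%.2f", value, grouping=True)


def format_date(timestamp):
    """Format a date in the locale's preferred short date format."""
    return time.strftime("%x", time.localtime(timestamp))


use_environment_locale()
print(format_number(1234567.891))
print(format_date(time.time()))
```

No format string such as “MM/DD/YYYY” appears anywhere in the code: the same lines produce different output under a US English, German or Japanese environment.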
This is the end of the second part of Website Localization. In the third part we’ll talk about language and culture aspects of Website localization.