The ultimate linguistic guide to software localisation for developers

There are lots of great guides out there for how to prep your product for internationalisation and localisation from an engineering perspective. Building software localisation into your product right from the start – even if you’re not ready to expand beyond one locale just yet – saves you a tonne of work and headaches down the line.

The effects of software localisation cascade down to every aspect of development and post-development, from UX and interface design to the basic engineering and core functionality of your product, and to documentation, support and marketing. With this in mind, it pays for any software developer to understand how designing for different locales affects the development process.

We’ll start by explaining some basic concepts. Then, we’ll look at examples of strings from different languages and explore the requirements that different locales have. Throughout this post, we’ll refer to our fictional app “SuperApp” in our examples.

Locales vs language variants

It might be helpful to start by looking at what we mean by a locale. This is a term used both in the tech and translation industries to refer to a country-specific variant of a language. If you’re not from a multilingual background, you’d be forgiven for thinking that it’s sufficient to think about languages such as English, Spanish and Swedish. If we want to make SuperApp available in one of these languages, surely it’s enough to translate the strings and be done with it?

The thing is, “language” is a fuzzy term and nowhere near granular enough for our needs. Let’s start with English. It’s spoken natively by over 400 million people and is an official language in 55 sovereign states – a group of countries commonly referred to as the Anglosphere. The language isn’t uniform across the Anglosphere: there are dozens of national varieties, each with their own conventions for things like pronunciation, grammar and spelling standards, and even how dates and numbers are formatted.

You’re more than likely already familiar with the two biggest varieties: British English and American English. These national standards can be expressed with the IETF language tags en-GB and en-US, respectively. The story is similar (albeit on a much smaller scale) for languages like Swedish, which is an official language in both Sweden (sv-SE) and Finland (sv-FI).

But is this a locale? Well, not quite. The tags above refer to the language variant only and do not include the user’s selected region settings. Region settings affect things such as how dates and times are expressed (e.g. ‘31 December’ being written as 31/12 or 12/31, and whether to use a 12- or 24-hour clock by default), how numbers are formatted (e.g. using a dot or a comma as the decimal separator) and where currency symbols are placed (e.g. before or after the amount, with or without a space). If we bundle these region settings up with the language variety, then we get our locale.

On most operating systems, users can independently select their interface language and preferred region settings, meaning they can end up with locales that don’t necessarily align with national language variants. For example, many Icelanders use their computers in English but with their region set to Iceland. This locale would be expressed as en_IS (note the use of an underscore as opposed to a hyphen).
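
To make the language and region split concrete, here is a minimal sketch using the built-in Intl.Locale object. The helper name describeLocale is our own invention for illustration, and we assume underscores should simply be swapped for hyphens before parsing, since IETF language tags use hyphens.

```javascript
// Hypothetical helper: split a locale identifier into its language and region parts.
function describeLocale(identifier) {
  // IETF language tags use hyphens, so normalise underscores first.
  var locale = new Intl.Locale(identifier.replace(/_/g, '-'));
  return { language: locale.language, region: locale.region };
}

describeLocale('en-GB');  // { language: 'en', region: 'GB' }
describeLocale('en_IS');  // { language: 'en', region: 'IS' }
describeLocale('es');     // { language: 'es', region: undefined }
```

Note that the language and the region are independent of each other here: nothing stops a user from pairing English with Iceland.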

Although it’s important to understand the distinction between language variants and locales, thankfully, the hard work of accounting for all the different date and number formats is done for you on most platforms. Apple, for example, provides a wide range of formatters that adjust things like the decimal separator and date format automatically for the user’s selected region settings, even if those region settings don’t correspond to the interface language.

One final consideration is the aspect of hierarchy when it comes to language variants. Your app may only support one broad variety of English (en) or Spanish (es), for example, rather than country-specific variants. Even though you don’t support their local variant, most users will still prefer to use the broad international or regional variant of their language rather than a different language altogether.

Let’s take Spanish as an example. Most often, software is localised into Peninsular Spanish (the variety spoken in Spain) first. This national standard also acts as the ‘broad’ variety and would sit at the top of the hierarchy, designated es. Now that we’ve made SuperApp available in Spanish, we have decided to offer a more tailored experience for our Latin American users by supporting their regional language variant, which is designated es‑419. Going further, we’ve decided to offer our Mexican users an even more localised experience and translate our strings into Mexican Spanish, meaning we end up with an es‑MX variant as well. If a user’s preferred variant is not available, then they can cascade back up the list until they find their closest preferred language variant.
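
The cascade described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the parentVariant map, the supportedVariants list and the resolveVariant function are hypothetical names, and most platforms and localisation libraries provide their own fallback logic that you should prefer.

```javascript
// Hypothetical parent map: the variant each regional variant falls back to.
var parentVariant = {
  'es-MX': 'es-419',
  'es-AR': 'es-419',
  'es-419': 'es'
};

// Language variants SuperApp has translations for (assumed for this example).
var supportedVariants = ['en', 'es', 'es-419', 'es-MX'];

function resolveVariant(preferred, defaultVariant) {
  var candidate = preferred;
  while (candidate) {
    if (supportedVariants.indexOf(candidate) !== -1) {
      return candidate;
    }
    // Use the explicit parent if there is one; otherwise drop the last subtag.
    var lastHyphen = candidate.lastIndexOf('-');
    candidate = parentVariant[candidate] ||
      (lastHyphen !== -1 ? candidate.slice(0, lastHyphen) : null);
  }
  return defaultVariant;
}

resolveVariant('es-MX', 'en'); // 'es-MX'
resolveVariant('es-AR', 'en'); // 'es-419'
resolveVariant('es-ES', 'en'); // 'es'
resolveVariant('fr-CA', 'en'); // 'en'
```

An explicit parent map is needed because simply trimming es-MX would jump straight to es and skip the Latin American variant.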

Things to consider when writing strings for software localisation

Now that we’ve got a firm grip on locales, we can take a look at the ramifications of software localisation when it comes to writing and concatenating (or segmenting) your software strings.

Numbers and dates

We’ve already briefly touched on the subject, so we should probably get this out of the way early. In almost all situations, there is essentially one golden rule to follow here: never hard-code date, time or number formats.

No matter what programming language and dev environment you’re using, there are fantastic date and number formatters available – either native or added through libraries such as Moment.js – that take care of all the hard work for you, returning perfectly localised dates and numbers that respect your users’ region settings. The best advice here is to rely solidly on these and save yourself a world of trouble.
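
As a small illustration in JavaScript, the built-in Intl formatters can handle the examples mentioned earlier. The wrapper functions formatShortDate and formatNumber are hypothetical names of our own, and de-DE is used only as a convenient example of a locale with a comma as the decimal separator.

```javascript
// Hypothetical wrappers around the built-in Intl formatters.
function formatShortDate(locale, date) {
  var options = { day: 'numeric', month: 'numeric', timeZone: 'UTC' };
  return new Intl.DateTimeFormat(locale, options).format(date);
}

function formatNumber(locale, value) {
  return new Intl.NumberFormat(locale).format(value);
}

var newYearsEve = new Date(Date.UTC(2024, 11, 31)); // months are zero-based

formatShortDate('en-GB', newYearsEve); // '31/12'
formatShortDate('en-US', newYearsEve); // '12/31'

formatNumber('en-US', 1234567.89); // '1,234,567.89'
formatNumber('de-DE', 1234567.89); // '1.234.567,89'
```

The same code produces all four results; only the locale passed in changes.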

Word endings

In English, there are relatively few word endings (inflections) to consider. The vast majority of nouns are made plural by adding an -s or -es. When it comes to most verbs, we have only two forms in the present tense, for example, sings and sing. However, many languages have a greater variety of endings than English, and these can affect more classes of words than nouns and verbs; for example, many languages also inflect adjectives. The distribution can also vary by language: some languages, particularly Scandinavian ones, have less inflection than English on verbs but more on adjectives.

Let’s take this example from Swedish:

Det finns 1 rum ledigt till detta pris.
There is 1 room available at this price.

Det finns 10 rum lediga till detta pris.
There are 10 rooms available at this price.

Here we can see that the verb finns is the same in both sentences, whereas in English we have two different forms, is and are. On the other hand, the adjective has changed: in the first sentence, it is singular (ledigt), and in the second, it is plural (lediga).

This affects how we concatenate our strings. As a general rule, it’s always best to avoid chopping strings up wherever you can. The translator will be able to offer a better-quality translation if we leave the string as intact as possible. Another reason for this, as we’ll see below, is that word order can vary hugely between languages, so we should never assume that, for example, numbers will occur in the same position in the sentence.

Plurals

In the Swedish example above, we saw how word endings can change between singular and plural forms. In the Scandinavian languages and Finnish, we only have to worry about a singular and non-singular form. For other languages, the situation is slightly more complex. Let’s take an example from Icelandic:

1 bíll fannst á þessu verði í nágrenninu.
1 car was found at this price nearby.

12 bílar fundust á þessu verði í nágrenninu.
12 cars were found at this price nearby.

21 bíll fannst á þessu verði í nágrenninu.
21 cars were found at this price nearby.

The first two sentences in this example show the same singular–plural distinction we’ve seen so far: when the number is more than 1, there is a different ending for the word. The singular word is bíll “car”, and the plural word is bílar “cars”. However, Icelandic also requires numbers ending in -1 (with the exception of 11) to use the singular form, whereas other languages, including English, might have the plural form. This is because of the way the number is constructed in Icelandic: 21 expands to tuttugu og einn “twenty and one”, so we’re literally saying “twenty and one car”. This is something we need to take into consideration in our logic when deciding which form of a string to serve up in Icelandic.
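
A minimal sketch of that logic in JavaScript might look like this. The function name returnIcelandicForm is hypothetical, and we assume the exception for 11 also covers larger numbers ending in -11 (such as 111), which is how the Unicode CLDR plural rules describe Icelandic.

```javascript
// Hypothetical helper: choose the Icelandic form for a whole number.
function returnIcelandicForm(i) {
  // Singular for numbers ending in -1, except those ending in -11
  if (i % 10 === 1 && i % 100 !== 11) {
    return 'singular';
  }
  return 'plural'; // All other numbers
}

returnIcelandicForm(1);  // 'singular'
returnIcelandicForm(11); // 'plural'
returnIcelandicForm(12); // 'plural'
returnIcelandicForm(21); // 'singular'
```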

In the Slavic languages, we have to consider a different, even more complex set of rules. In Polish, for example, there are three possible forms to choose from, depending on the number used:

A singular form used with 1 (e.g. samochód “car”);
A form used with 2, 3 and 4, and any numbers ending in -2, -3 or -4, except for numbers ending in -12, -13 or -14 (samochody);
A form used with all other numbers (samochodów).

In JavaScript, we could express this rule as follows:

function returnPolishForm(i) {
  var lastDigit = i % 10;
  var lastTwoDigits = i % 100;
  if (i === 1) {
    return 'singular'; // If i is exactly 1
  }
  var endsInTeens = lastTwoDigits >= 12 && lastTwoDigits <= 14;
  if (lastDigit >= 2 && lastDigit <= 4 && !endsInTeens) {
    return 'plural'; // If i ends in -2, -3, -4 but not in -12, -13, -14
  }
  return 'genPlural'; // All other numbers, including 12, 13, 14 and 112
}

Let’s take the example we used for Icelandic from above and apply it to Polish:

W okolicy znaleziono 1 samochód w tej cenie.
1 car was found at this price nearby.

W okolicy znaleziono 2 samochody w tej cenie.
2 cars were found at this price nearby.

W okolicy znaleziono 5 samochodów w tej cenie.
5 cars were found at this price nearby.

W okolicy znaleziono 23 samochody w tej cenie.
23 cars were found at this price nearby.

W okolicy znaleziono 25 samochodów w tej cenie.
25 cars were found at this price nearby.

Note how the word for “car” changes with the number. To serve the correct form of the string to the user, we need to add some logic that is specific to Polish. If we don’t do this, then we’ll introduce a grammatical error that, in the best case, detracts from the user’s experience and, in the worst case, creates a severe misunderstanding.
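
Rather than maintaining such rules by hand for every language, one option in JavaScript is the built-in Intl.PluralRules object, which returns a plural category for a given number and language. The sketch below is only an illustration: the carsFoundStrings catalogue and the carsFoundMessage function are hypothetical names, the strings are the Polish examples from above, and only whole numbers are covered.

```javascript
// Hypothetical catalogue: one complete string per plural category.
var carsFoundStrings = {
  one: 'W okolicy znaleziono {count} samochód w tej cenie.',
  few: 'W okolicy znaleziono {count} samochody w tej cenie.',
  many: 'W okolicy znaleziono {count} samochodów w tej cenie.'
};

var polishRules = new Intl.PluralRules('pl');

function carsFoundMessage(count) {
  // For whole numbers, Polish yields 'one', 'few' or 'many'.
  var category = polishRules.select(count);
  var template = carsFoundStrings[category];
  if (!template) {
    // Fractions fall into 'other', which this sketch does not cover.
    throw new RangeError('No string for plural category: ' + category);
  }
  return template.replace('{count}', String(count));
}

carsFoundMessage(1);  // 'W okolicy znaleziono 1 samochód w tej cenie.'
carsFoundMessage(23); // 'W okolicy znaleziono 23 samochody w tej cenie.'
carsFoundMessage(25); // 'W okolicy znaleziono 25 samochodów w tej cenie.'
```

Keeping a whole sentence per category, rather than swapping out a single word, leaves the translator free to adjust anything else in the string that needs to change with the number.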

Gender

Many languages have a feature called grammatical gender. Genders are essentially classes of nouns that inflect in a similar way. While they may be labelled masculine, feminine or neuter, a word’s grammatical gender doesn’t always align with its natural gender. In German, for example, the word for “girl”, Mädchen, is neuter. Gender doesn’t only affect nouns, though; it has knock-on effects on adjective endings and pronouns as well.

Pronouns

In English, we use the neuter pronoun it to refer to inanimate objects. A typical string in SuperApp might look something like this:

This document is over 50 MB in size. Would you like to send it anyway?

In Icelandic, this would be:

Þetta skjal (n.) er yfir 50 MB að stærð. Viltu senda það (n.) samt?

The word for ‘document’, skjal, is grammatically neuter (n.). As a programmer, it may be tempting to split this message into two strings, as we have two sentences. Then, if we need to swap out the first string, say, to refer to a photo instead of a document, we can just concatenate them at runtime. However, if we change ‘document’ to ‘photo’ here, we get an ungrammatical construction in Icelandic (indicated by the asterisk):

Þessi mynd (f.) er yfir 50 MB að stærð. Viltu senda *það (n.) samt?

The problem stems from the fact that mynd is feminine (f.), but það is neuter. This means that the gender doesn’t agree, making this pair of sentences ungrammatical. Instead of það, we should have the feminine pronoun hana (literally ‘her’), which refers back to mynd. The better solution then is to keep these sentences together in one string and allow the linguist to translate it as one block.
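
In code, keeping the sentences together might mean storing one complete string per item type instead of concatenating fragments. This is only a sketch: the sendAnywayStrings catalogue and the sendAnywayMessage function are hypothetical names, and the strings are the Icelandic examples from above with the correct pronoun in each.

```javascript
// Hypothetical Icelandic catalogue: one complete string per item type.
var sendAnywayStrings = {
  document: 'Þetta skjal er yfir 50 MB að stærð. Viltu senda það samt?',
  photo: 'Þessi mynd er yfir 50 MB að stærð. Viltu senda hana samt?'
};

function sendAnywayMessage(itemType) {
  var message = sendAnywayStrings[itemType];
  if (!message) {
    throw new RangeError('No string for item type: ' + itemType);
  }
  return message;
}

sendAnywayMessage('photo');
// 'Þessi mynd er yfir 50 MB að stærð. Viltu senda hana samt?'
```

This means more strings to translate, but each one can be made fully grammatical on its own.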

Adjectives

Gender also affects how we address users. In English, particularly in user interfaces, we tend to see a lot of structures like this:

Are you sure you want to delete this folder?
Are you ready to turn on your camera and microphone?

These kinds of sentences work great in English regardless of the gender and number of the people we’re addressing. However, in languages such as Spanish that mark gender on adjectives, we need to account for feminine and masculine forms in order to be inclusive:

¿Estás seguro/segura que quieres eliminar esta carpeta? 
¿Estás listo/lista para encender tu cámara y tu micrófono?

In the first example, the translator can solve the problem somewhat creatively by rephrasing it to ¿Seguro que quieres eliminar esta carpeta?, which can be translated as ‘Is it certain that you want to delete this folder?’. This construction avoids addressing the user directly with an adjective.

However, the second phrase is more challenging to rework without addressing the user directly, so here we need to include both the masculine listo and the feminine lista to avoid excluding female users.

When writing strings, it’s good practice to avoid addressing the user directly with adjectives if you can help it. While a good translator will always find a solution, sometimes it might not be as neat as in English, and it could use more characters and subsequently take up more space in the UI.

Text expansion and contraction

As we’ve seen above, translation can drastically alter the length of software strings. Some languages require more words or characters to express the same meaning as in English, whereas others may require fewer. Averages published by IBM show the number of characters in a string may increase by up to 200%, and that this is most likely to happen in the shortest strings, typically those below 10 characters. French, Italian and Spanish are all languages that see character expansions in this range. For the Nordic languages, your strings may actually contract in certain contexts as well. For example:

Language	String	Character count	Expansion
English	3 photos were deleted from the album “New York”.	48	–
French	3 photos ont été supprimées de l’album « New York ».	52	+8%
Spanish	Se eliminaron 3 fotos del álbum “Nueva York”.	45	-6%
Danish	3 fotos blev slettet fra albummet “New York”.	45	-6%
Finnish	Albumista ”New York” poistettiin 3 valokuvaa.	45	-6%
Icelandic	3 myndum var eytt úr safninu „New York“.	40	-17%
Norwegian	3 bilder ble slettet fra albumet «New York».	44	-8%
Swedish	3 bilder har tagits bort från albumet ”New York”.	49	+2%

Another thing to note from the example phrases here is how the word order can vary from language to language. Notice how in Spanish, the verb comes at the start of the sentence, and our photo count is pushed further down. In Finnish, the album name is pushed up to the top of the sentence, directly following albumista ‘from the album’.

Also, note how the punctuation varies from language to language. Each has slightly different conventions for things like speech marks. English uses “ ”, whereas Icelandic uses „ “ and French uses guillemets « » (with a space on either side of the enclosed word).

For this reason, we should avoid syntax like this:

var string = photoCount + ' '
             + photosWereDeletedFromAlbumString
             + '“' + albumName + '”.';

The preferred syntax would contain placeholders that the linguist is free to move at will, which you can then replace with variables at runtime:

// English
'{photoCount} photos were deleted from the album “{albumName}”.'
// Finnish
'Albumista ”{albumName}” poistettiin {photoCount} valokuvaa.'

Note that the above examples don’t account for singular–plural distinctions; further logic is required to accommodate those.
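
Replacing the placeholders at runtime could be sketched like this. The formatMessage helper and the messages catalogue are hypothetical names used for illustration; in practice, a localisation library will usually provide this for you.

```javascript
// Hypothetical helper: replace each {name} placeholder with values[name].
function formatMessage(template, values) {
  return template.replace(/\{(\w+)\}/g, function (match, name) {
    // Leave unknown placeholders untouched so missing values are easy to spot.
    return Object.prototype.hasOwnProperty.call(values, name)
      ? String(values[name])
      : match;
  });
}

var messages = {
  en: '{photoCount} photos were deleted from the album “{albumName}”.',
  fi: 'Albumista ”{albumName}” poistettiin {photoCount} valokuvaa.'
};

var values = { photoCount: 3, albumName: 'New York' };

formatMessage(messages.en, values);
// '3 photos were deleted from the album “New York”.'
formatMessage(messages.fi, values);
// 'Albumista ”New York” poistettiin 3 valokuvaa.'
```

Because the placeholders are named rather than positional, the same values work regardless of where the translator has placed them in the sentence.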

Context is key for software localisation

The thing that perhaps best equips a linguist to be able to translate your strings successfully is adequate context. Knowing when and where a string appears enables the translator to make a whole range of linguistic decisions and ultimately provide a correct, high-quality and consistent localisation of your software.

We recommend sticking to these guiding principles:

1. Get your product into the hands of your translators

It’s crucial to loop translators into your development process early. Even if you’ve not yet delivered your first public release, it’s vital that linguists understand your app’s purpose and how your UI is laid out. Giving them access to pre-release versions means you save yourself from future headaches and endless rounds of feedback and feedback implementation.

2. Provide local context

Software strings can be as short as one word. They might consist of a single verb: ‘delete’, for example. But is this verb functioning as an imperative (giving a command) or just as an infinitive (the dictionary form of the verb)? In English, they look the same, but that’s not necessarily the case in other languages. To enable the translator to make the right choice, give them access to view surrounding strings even if they’ve already been translated, or even better, provide screenshots. Some tools can automate this process for you.

3. Give your translators access to other translations

If you’ve already localised into several languages or variants, giving translators access to those can make a world of difference, especially for closely related languages. For example, if you’ve already localised into Swedish and are now adding Danish and Norwegian, giving your translators access to the Swedish strings in a translation memory will help answer a lot of questions they’ll have and may even allow them to recycle some existing translation solutions.

4. Keep an open line of communication

Translators are used to surmising the meaning of a text from the context they have available, but sometimes they just don’t have the key information to hand that would allow them to choose the right translation. Be receptive to translator queries and respond with as much information and context as you can.

5. Be open to adapting your product

It’s impossible for any one developer to account for all of the nuances of every language variant they might want to localise into. Leverage the linguistic expertise of your translators to improve how you write, segment and concatenate your strings. For example, you might need to account for a different word order than you anticipated, or you might need to adapt your logic to account for different word endings. Linguists can advise you on what works and what doesn’t for their language.

We’ve covered a lot of ground in this post, but there’s always more that could be said. The main thing to take away is to approach software localisation with an open mind. Be prepared to give and receive feedback, adapt and iterate as you go, and take advantage of your translators’ linguistic expertise to deliver the best UX in your target locale.

Many developers are rightly wary about the software localisation process. After all, you’re essentially entrusting somebody else to deliver your core user experience in a specific market. You want to make sure that you deliver on tone of voice, brand values and naturalness, not just having a grammatically correct translation. The key to this is a collaborative partnership and close, regular communication.

If you and your translators are all aligned around the same end goal of delivering a fantastic experience, and they’re armed with the tools to make that happen, you’ll reap the many benefits that software localisation has to offer.

This article was initially published in 2020 by Max Naylor, a former Sandberg team member, and has since been revised with updated data.
