@@ -113,6 +113,11 @@ User-Agent: YourApp/1.2.3
]
```

## Language codes

Language codes in the `lang` field follow [BCP 47](https://www.rfc-editor.org/rfc/rfc5646). The base language
subtag is always present; script, region, and variant subtags are included where needed to distinguish variants. See [Language codes follow BCP 47](/docs/resources/language-release-process#language-codes-follow-bcp-47) for details.

## Product features

Each language object includes a `features` array indicating which optional capabilities are supported for that language
@@ -222,7 +227,7 @@ The v3 language endpoints are designed to be forward-compatible:
- Existing fields will not be removed or changed in backwards-incompatible ways

<Info>
Build your integration to gracefully handle new values in the `features` array.
Build your integration to gracefully handle new BCP 47 `lang` codes and new values in the `features` array. Do not hardcode assumptions about the format of language codes -- see [Language codes follow BCP 47](/docs/resources/language-release-process#language-codes-follow-bcp-47) for details.
</Info>

## Best practices
@@ -231,6 +236,6 @@ The v3 language endpoints are designed to be forward-compatible:

2. **Check features**: Always check the `features` array on language objects rather than assuming support (e.g. for formality, glossary use, or writing style).

3. **Handle forward compatibility**: Ignore unknown values in the `features` array to remain compatible as new capabilities are added.
3. **Handle forward compatibility**: New languages and features may be added at any time. Build your integration to dynamically accept new `lang` codes and `features` values instead of maintaining a hardcoded allowlist.

4. **Use specific variants**: For target languages, prefer specific regional variants (e.g., `"en-US"`, `"en-GB"`) when the distinction matters to your users.
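
As a rough illustration of practices 2 and 3, the sketch below checks the `features` array before relying on a capability and sets aside values it does not recognize instead of failing. The feature strings, the sample language object, and the helper names `supports` and `split_features` are hypothetical; the actual feature identifiers are whatever the API returns.

```python
# Hypothetical feature identifiers; the real strings come from the API response.
HANDLED_FEATURES = {"formality", "glossary"}


def supports(language, feature):
    """Return True only if the language object lists the given feature."""
    return feature in language.get("features", [])


def split_features(language):
    """Separate features this client handles from ones it does not know yet."""
    reported = set(language.get("features", []))
    return reported & HANDLED_FEATURES, reported - HANDLED_FEATURES


# Hypothetical language object containing one value this client has never seen.
language = {"lang": "de", "features": ["formality", "some_future_feature"]}

handled, unknown = split_features(language)
if supports(language, "formality"):
    print("formality is available for", language["lang"])
print("ignored unknown features:", sorted(unknown))
```

Unknown values are kept in a separate set rather than raising an error, so a newly released capability does not break the client.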
27 changes: 21 additions & 6 deletions docs/resources/language-release-process.mdx
@@ -5,19 +5,34 @@ description: "Here's what API users can expect when DeepL adds translation suppo

On a regular basis, DeepL adds translation support for new languages or language variants. In this article, we describe the process we'll follow with a new language or variant release.

* We will add the language code for the newly supported language or variant to the “Source languages” and “Target languages” lists on the [Supported languages](/docs/getting-started/supported-languages) page in the API documentation. We’ll include a note on that page if the language or variant does *not* support both text and document translation.
* If a newly added language or variant supports both text and document translation, we will add the language or variant to the `/languages` endpoint response. Note that for language variants, we do not use a single, consistent format for the variant code:
* In some cases, a variant is primarily used in a specific region, and so a 2-letter region code is the best way to identify the variant. For English and Portuguese variants, we use a 2-letter region code (e.g. `EN-US`, `PT-BR`).
* In other cases, a variant is used widely across multiple regions, and so a single 2-letter region code isn’t the best way to identify the variant. For simplified and traditional Chinese variants, we use a variant code that is not region-specific (e.g. `ZH-HANS`, `ZH-HANT`). The decision of what variant code to use will depend on the characteristics of the variant itself, and variant codes will be selected by DeepL on a case-by-case basis.
## Language codes follow BCP 47

DeepL language codes follow [BCP 47](https://www.rfc-editor.org/rfc/rfc5646). A language code always includes a base language subtag (e.g. `en`, `zh`), and may include additional subtags for script, region, or variant where needed to distinguish variants. For example:

* `EN-US`, `PT-BR` -- region subtag to distinguish regional variants.
* `ZH-HANS`, `ZH-HANT` -- script subtag to distinguish writing systems.

BCP 47 is an expansive standard, and language codes can vary significantly in structure and length. As DeepL adds support for more languages and variants, new codes may use any combination of subtags permitted by the spec. For example, codes like `sr-Cyrl-RS` or `sr-Latn-RS` (Serbian in Cyrillic vs. Latin script, as used in Serbia) are valid BCP 47 codes -- while DeepL does not support these today, your integration should be able to handle codes of this form if they are added in the future.

<Warning>
**Do not hardcode assumptions about the format of language codes.** For example, do not assume that language codes will always be exactly two letters, or that a hyphenated code will always be in the format `xx-YY`. Instead, always treat the `lang` codes returned by the [/languages endpoint](/api-reference/languages) as opaque identifiers. If you need to parse language codes, use a BCP 47-compliant library rather than writing custom parsing logic -- the full spec includes subtags for script, region, variant, extensions, and private use, and partial implementations are a common source of bugs.
</Warning>
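
A minimal sketch of the opaque-identifier approach, assuming the language list has already been fetched: codes are looked up as whole strings in the returned list, not matched against a pattern. The sample payload, its `name` values, and the helper names `index_by_code` and `find_language` are illustrative assumptions, not part of the DeepL API. The comparison ignores case because BCP 47 tags are case-insensitive.

```python
def index_by_code(languages):
    """Map each lowercased `lang` code to its language object."""
    return {entry["lang"].lower(): entry for entry in languages}


def find_language(code, languages):
    """Return the language object whose `lang` matches `code`, or None.

    The code is compared as a whole string; nothing is assumed about its
    length or its number of subtags.
    """
    return index_by_code(languages).get(code.lower())


# Hypothetical sample data shaped like a languages response (assumption).
# `sr-Cyrl-RS` is not supported today; it is here only to show a
# three-subtag code passing through unchanged.
SAMPLE_LANGUAGES = [
    {"lang": "en-US", "name": "English (American)", "features": []},
    {"lang": "zh-Hans", "name": "Chinese (simplified)", "features": []},
    {"lang": "sr-Cyrl-RS", "name": "Serbian (Cyrillic, Serbia)", "features": []},
]

match = find_language("SR-CYRL-RS", SAMPLE_LANGUAGES)
print(match["lang"] if match else "unsupported")
```

Because the lookup never splits the code on hyphens, a code with more or fewer subtags needs no changes to this logic.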

## What happens when a new language is released

* We will add the language code for the newly supported language or variant to the "Source languages" and "Target languages" lists on the [Supported languages](/docs/getting-started/supported-languages) page in the API documentation. We'll include a note on that page if the language or variant does *not* support both text and document translation.
* If a newly added language or variant supports both text and document translation, we will add the language or variant to the `/languages` endpoint response. The variant code used depends on the characteristics of the variant:
* In some cases, a variant is primarily used in a specific region, and so a region subtag is the best way to identify it (e.g. `EN-US`, `PT-BR`).
* In other cases, a variant is used widely across multiple regions, and so a script subtag is more appropriate (e.g. `ZH-HANS`, `ZH-HANT`). The subtag structure will be selected by DeepL on a case-by-case basis following BCP 47 conventions.
* In cases where a new language code with a variant duplicates the behavior of an existing language code without a variant (e.g. `ZH-HANS` was recently added as a language code for translating into simplified Chinese, along with `ZH`):
* In the `/languages` endpoint response, we will continue to return both language codes in two separate dicts with the same value in the `name` field.
* In the `/languages` endpoint response, we will continue to return both language codes in two separate dicts with the same value in the `"name"` field.
* For backwards compatibility, we will continue to support the original language code (in this example, `ZH`) for text and document translation.
* We will add the language code for the newly supported language or variant to our [OpenAPI spec](https://github.com/DeepLcom/openapi/).

<Info>
**Note about the `/languages` endpoint:** In the future, we plan to extend the language information returned by the API.

This will allow us to specify whether a language supports both text and document translation, whether a language code is considered deprecated because its been duplicated by a variant language code, and so on.
This will allow us to specify whether a language supports both text and document translation, whether a language code is considered deprecated because it's been duplicated by a variant language code, and so on.

The additional metadata would also allow us, for example, to add languages like `AR` and `ZH-HANT` to the languages endpoint even before document translation is supported.
</Info>