At Weglot, one of our main features is providing customers with a first layer of automatic translation whenever they have new content to translate. It’s a great way for them to avoid starting their translation work from scratch and to save a lot of time. To provide those translations, we rely on several translation providers, and we send them tens of thousands of API requests every day. So API performance is an everyday priority for us.
As the quality of our service relies on those third-party applications for machine translation, it’s quite important for us to know the strengths and weaknesses of each of them.
Today, you can already find various articles and benchmarks online about the translation quality of those providers but not so much about performance, so let’s crunch numbers!
To ensure accurate results and fairness between the different third parties, we wrote a script that measures the average response time of each translation provider’s service under the same conditions, meaning the same language pair (obviously) and the same dictionary. The average response time is computed over 100 requests.
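The measurement loop can be sketched roughly as follows. This is a minimal illustration and not our actual script: `average_response_time_ms`, `send_request` and `fake_request` are hypothetical names, and the stand-in request simply sleeps instead of calling a real provider.

```python
import time
from statistics import mean
from typing import Callable


def average_response_time_ms(send_request: Callable[[], object], runs: int = 100) -> float:
    """Call `send_request` `runs` times and return the mean wall-clock latency in ms."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        send_request()  # hypothetical: one translation API call
        timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)


def fake_request() -> None:
    """Stand-in for a provider call: sleeps about 2 ms instead of using the network."""
    time.sleep(0.002)


if __name__ == "__main__":
    print(f"{average_response_time_ms(fake_request, runs=100):.1f} ms")
```

In a real run, `send_request` would wrap one provider call with a fixed language pair and payload, so that every provider is timed under the same conditions.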
What translation providers do we use
Weglot uses four different translation providers to cover all the language pairs that we support. Here are the services we will compare today.
Bing Microsoft Translator
The first version of the Bing Microsoft Translator software was released between 1999 and 2000. In November 2016, Microsoft introduced translation using deep neural networks for nine of its highest-traffic languages. The service now supports more than 60 languages.
Google Translate
The Google Translate service was launched in April 2006, and their neural translation service in November 2016. It supports more than 100 languages.
Yandex Translate
Yandex Translate, developed by the Russian search engine of the same name, was launched in 2011 with only 3 languages available. It now supports more than 90 languages.
DeepL
DeepL is the most recent provider, launched in 2017 by Linguee, and it supports 9 languages. It uses convolutional neural networks built on the Linguee database.
Our Translation providers Performance Benchmark
Now that you know everything about the translation providers we use at Weglot, let’s dive into our benchmark!
Benchmark #1: The French to English pair
For our first run, we wanted to keep it simple. The results would serve as a baseline for the later benchmarks.
We ran a batch of requests for the French to English pair with a payload size of 100 characters:
- Bing is clearly the winner here, with an average response time of 193 ms.
- Yandex and Google rank 2nd and 3rd, with response times of 355 ms and 418 ms respectively.
- DeepL is last, almost 3 times slower than Bing at 577 ms.
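To put these figures side by side, here is a small sketch that expresses each average as a multiple of the fastest one. The numbers are the ones quoted above; the helper name `slowdown_vs_fastest` is ours and purely illustrative.

```python
from typing import Dict

# Average response times in ms from the French to English run above.
results_ms: Dict[str, float] = {"Bing": 193, "Yandex": 355, "Google": 418, "DeepL": 577}


def slowdown_vs_fastest(times_ms: Dict[str, float]) -> Dict[str, float]:
    """Return each provider's response time as a multiple of the fastest one."""
    fastest = min(times_ms.values())
    return {name: round(t / fastest, 2) for name, t in times_ms.items()}


if __name__ == "__main__":
    ranked = sorted(slowdown_vs_fastest(results_ms).items(), key=lambda kv: kv[1])
    for name, ratio in ranked:
        print(f"{name}: {ratio:.2f}x")
```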
Benchmark #2: The most spoken language pairs
Moving on, we realized that not everybody speaks French (shocking news, I know), so what about the most spoken languages in the world?
This time, we ran a benchmark with English as the source language and Spanish, Chinese (Mandarin), Arabic, Russian, Hindi, Portuguese, French and German (not technically one of the most spoken languages, but widely used in Europe) as the target languages.
Results here do not include DeepL, which did not support Chinese or Arabic at the time this article was written.
- Bing is still the fastest performer and is only beaten once by Google for English to German translations.
- The Bing and Google curves look alike, except for the English to Hindi pair, where Google’s response time peaks.
This peak suggests the use of a pivot language: sometimes a provider cannot translate directly from one language to another, so it first translates into another language, from which it can then translate into the requested target language.
As expected, this two-step process takes much more time.
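To see why a pivot language would cost time, consider the sketch below. It is purely illustrative: we do not know how any provider routes requests internally, and `translate_with_pivot`, the `translate` callable and the set of direct pairs are all hypothetical.

```python
from typing import Callable, Set, Tuple

# Hypothetical signature: (text, source_lang, target_lang) -> translated text
TranslateFn = Callable[[str, str, str], str]


def translate_with_pivot(
    text: str,
    source: str,
    target: str,
    translate: TranslateFn,
    direct_pairs: Set[Tuple[str, str]],
    pivot: str,
) -> Tuple[str, int]:
    """Translate directly when the pair is supported, otherwise go through `pivot`.

    Returns the translated text and the number of translation passes performed.
    """
    if (source, target) in direct_pairs:
        return translate(text, source, target), 1
    # Two passes instead of one: source -> pivot, then pivot -> target.
    intermediate = translate(text, source, pivot)
    return translate(intermediate, pivot, target), 2
```

With two passes instead of one, a response time roughly twice as long as for a direct pair would not be surprising.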
It should be noted that Yandex is very consistent and offers almost identical response times for any language pair, even for English to Chinese, where Bing and Google take significantly longer.
Benchmark #3: But what about more diverse languages?
So far, the tests are relevant if you want to translate your content to popular languages.
With over 60 languages supported each (DeepL excepted), Bing, Yandex and Google allow you to reach a surprisingly large audience.
This time, we focus on less popular languages and see how providers perform for the English to Greek, Swedish, Czech, Serbian, Dutch, Romanian and Swahili pairs.
- First thing to notice: translation takes significantly longer for every provider except Yandex, which, again, is very consistent no matter the language pair.
- With an average of 127 ms (against 104 ms for the most spoken languages), Bing takes 22% more time to translate into less spoken languages.
- Google takes 27% more time (235 ms against 185 ms), while Yandex only needs 2.41% more time.
Overall, there is not much difference here: the translators rank in the same order as for the most spoken languages.
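For reference, the percentages above are plain relative increases over the "most spoken languages" average. A minimal sketch, using only the averages quoted in this section (Yandex is left out because its raw averages are not listed here) and a helper name of our own choosing:

```python
def percent_increase(baseline_ms: float, new_ms: float) -> float:
    """Relative increase of `new_ms` over `baseline_ms`, in percent."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    return (new_ms - baseline_ms) / baseline_ms * 100.0


# (most spoken languages, less spoken languages) averages in ms, as quoted above.
averages_ms = {"Bing": (104, 127), "Google": (185, 235)}

if __name__ == "__main__":
    for provider, (common, rare) in averages_ms.items():
        print(f"{provider}: +{percent_increase(common, rare):.1f}%")
```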
Benchmark #4: Performance depending on the size of the translation request
So, we’ve analyzed the providers’ performance depending on different language pairs. Another significant factor when it comes to translator performance is the size of the payload.
Some of you may have noticed that, according to the first chart, Yandex is faster than Google, whereas the following benchmarks always show Yandex to be slower.
This is due to the payload size: the first benchmark used a payload of 100 characters, against 20 for the following ones.
Let’s see how providers behave when we increase the payload from 20 to 100, 500 and finally 2500 characters for the English to Spanish pair.
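One simple way to produce payloads of an exact size is to repeat a word list and cut the result at the desired length. The sketch below is an assumption about how such payloads could be built, not a description of our dictionary; `build_payload` and the sample words are hypothetical.

```python
from itertools import cycle
from typing import Sequence

PAYLOAD_SIZES = (20, 100, 500, 2500)  # character counts used in this benchmark


def build_payload(words: Sequence[str], size: int) -> str:
    """Join words (cycling through the list) and cut the text to exactly `size` characters."""
    if not words or size <= 0:
        raise ValueError("need at least one word and a positive size")
    parts = []
    length = 0
    for word in cycle(words):
        parts.append(word)
        length += len(word) + 1  # +1 for the separating space
        if length > size:  # joined text is now at least `size` characters long
            break
    return " ".join(parts)[:size]


if __name__ == "__main__":
    for size in PAYLOAD_SIZES:
        print(size, repr(build_payload(["hello", "world"], size)[:30]))
```

Note that cutting at an exact character count can split the last word, which is acceptable when the goal is only to control the request size.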
As the results for 20 to 100 characters are not really readable on the chart, I’ll also dump the raw figures to help us figure out what is happening.
As you can see, Google more than doubles its response time when we jump from 20 to 100 characters (+107%), while Yandex’s response time increases by 45%.
This explains why Google often seems faster than Yandex: it benefits from the small payload used in most of the benchmark parameters.
Other than that, this graph shows interesting behaviors in translation providers:
Although DeepL is the slowest translator overall, its response time caps at 1.75 s for 500 characters. Above that, the response time stays pretty much the same (up to 2500 characters at least).
In contrast, Yandex is almost stable up to 100 characters; after that, its response time rises to almost 3.5 s!
Google and Bing, on the other hand, are the most regular providers, with Google being the most linear.
As you can see, there is no clear winner in this benchmark: each provider has its strengths and weaknesses.
While being the fastest, Bing has no official SDK, and its documentation could be friendlier for non-technical users.
Google, on the other hand, may be a bit slower but has great documentation and a great dashboard to manage your account.
If you need to translate many different languages, including some unusual ones, Yandex may be a good fit, as it will provide you with consistent response times no matter the source and target languages.
If performance is not an issue (say you’re not serving translated content to your users in real time), DeepL is worth considering: it’s arguably a challenger when it comes to quality, and it can handle large payloads with no extra processing time, which makes it a good fit for translating large files or any queued process.
Disclaimer: Tests were performed from an AWS instance in the eu-central-1 region (Frankfurt).