Massimo Ghislandi, EVP of Translation Productivity
Ciao. I'm Massimo from SDL, and today I'm here to talk about translation memory.
So, what is a translation memory?
It's a language pair database that stores segments of text, which have been previously translated so they can be recalled for later use. I don't know how much sense this made to you, so we're going to try to explain all of this.
A few keywords here. Database: hopefully you know what that is, but it's a place where things get stored. We are going to talk about segments, obviously of text, and recalling, so bringing that back. And I'm explaining this because a lot of the people in this industry are not native English speakers, so I just want to make sure that our terminology is as clear as possible.
So, let's start. What is a segment?
So this is a question that comes up, and especially if you are new to the industry, new to translation, you might actually not know what a segment is. You know, you might figure it out. So, in a way, a segment can be different things, and this is what gets stored inside the translation memory. It could be a sentence, and this is probably what you relate to the most. It could be a phrase, and actually there's not much difference between a sentence and a phrase. I think it's a bit of semantics. It's kind of terminology, whatever word works best for you.
It could be a paragraph, and actually a paragraph is a little bit different from a sentence. Typically a paragraph is a combination of sentences. And traditionally in a translation memory you store these two, sentences and phrases. You can also choose to store entire paragraphs, but obviously the more you store, the harder it is to match in the future. And we're going to look a little bit more at what that means, but remember that thought of storing sentences. So, you split after a full stop or after a semicolon, for example. But perhaps if you have a very long paragraph with three phrases or sentences inside, you can store those, but you might decide not to do that.
And headings, of course: a title, the beginning of a document, that is also what gets stored. So in essence, a segment is a finite set of words that gets stored in a translation memory.
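The idea of cutting text into sentence-level segments after a full stop or a semicolon can be sketched in a few lines. This is only a minimal illustration, assuming a simple punctuation rule; the function name `split_into_segments` is hypothetical, and real CAT tools use much richer segmentation rules (abbreviations, numbers, headings and so on).

```python
import re
from typing import List


def split_into_segments(text: str) -> List[str]:
    """Split text into sentence-level segments.

    Assumption for this sketch: a segment ends at '.', '!', '?' or ';'
    followed by whitespace. The punctuation stays with its segment.
    """
    parts = re.split(r"(?<=[.!?;])\s+", text.strip())
    return [part for part in parts if part]


paragraph = "Press the power button. Wait for the light; then release. Done!"
segments = split_into_segments(paragraph)
for segment in segments:
    print(segment)
```

Storing each of these short pieces separately, rather than the whole paragraph, is what gives each one a chance to match on its own later.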
The second piece is, what is a translation unit?
So, a translation memory is made up of translation units, and sometimes people call this a TU, a translation unit. So what's a translation unit? In essence, it is the building block of a translation memory. The translation memory stores the source text, so what you are going to translate, and the translation of that sentence. So, typically if you look at a translation memory, you'll find many translation units, and each translation unit is going to be something like, "Hello. I'm Massimo." and "Ciao. Sono Massimo."
So you have the English and its translation. That's what a translation unit is. And what is then going to happen is that the translation memory can search all these translation units, and as you translate, it will find similar translations that you've made before for a specific sentence, and suggest them to you as translations.
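As a rough picture of the data structure, a translation unit is a source segment paired with its translation, and the translation memory is a collection of those pairs for one language pair. The sketch below assumes an in-memory dictionary; the class and method names (`TranslationUnit`, `TranslationMemory`, `add`, `lookup`) are made up for illustration and are not the API of any particular CAT tool.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TranslationUnit:
    """One source segment together with its translation."""
    source: str
    target: str


class TranslationMemory:
    """A language-pair store of translation units (illustrative only)."""

    def __init__(self, source_lang: str, target_lang: str) -> None:
        self.source_lang = source_lang
        self.target_lang = target_lang
        self._units: Dict[str, TranslationUnit] = {}

    def add(self, source: str, target: str) -> None:
        # Storing happens as a side effect of translating a segment.
        self._units[source] = TranslationUnit(source, target)

    def lookup(self, source: str) -> Optional[TranslationUnit]:
        # Exact recall only; similarity matching is a separate concern.
        return self._units.get(source)


tm = TranslationMemory("en", "it")
tm.add("Hello. I'm Massimo.", "Ciao. Sono Massimo.")
unit = tm.lookup("Hello. I'm Massimo.")
print(unit.target if unit else "no suggestion")
```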
So, you might wonder how does this help.
So, it helps in many different ways. The classic is that you never have to translate the same sentence again, and there are different benefits to that. You might see immediately what the benefit is of not having to translate the sentence again. First of all, retranslating the same things is boring, so you don't have to do it again. Sometimes you have to watch it: sometimes you'll have translated something and you actually do need a different translation because of a different context. So you always need to check those translations, but in essence, when you have translated something, you don't want to translate it again. It does save you time, but it's also about quality. You might have spent a lot of time translating a sentence and getting it just right, not just right for you, but you might also have a client, there might be somebody who reviewed that translation, and at the end of the process, everybody was 100% happy with that translation.
You want to make sure that that translation is the one that gets used again. So, it's about quality and consistency and making sure that the best translation is reused again and again.
Obviously, all of these translations get stored for future use, so you're building an asset in essence. You're building this database made out of translation units, which contain all the previous sentences that you translated, and it's something you build for future use. There are people who have been building up millions of words. It's not a joke. There are translators who have a million translation units. Can you imagine? And they can reuse them, so it can save them a lot of time and also ensure consistency and quality.
One of the nice things about a translation memory is that the more you add, the more you can recall. And at the end of the day, the interesting piece is that you're actually not doing more work. You're translating as you were translating before. The difference is that you're storing that translation for the future, and it just kind of happens automatically. It's really quite easy to add to and build a translation memory. It can get big pretty quickly. If you think about it, a typical translator translates 2000 words a day; that's what the industry sort of agrees on for translating new words. Of course it can be a lot less than that. If you have a marketing slogan of ten words to translate, it might actually take you half a day to translate that slogan. But, you know, for typical content, commercial content in instruction manuals, technical documentation, the average is 2000 words a day. Those are 2000 words a day that you can store in the database for future use.
So, it does help you translate faster, because you have stored all of this, but I really want to stress that the speed it offers you is also very much about giving you more time for when you are translating new content, so that you can create the best possible translation for it. And of course, it takes away the monotony of translating the same sentence again and again. For one marketing slogan, there could be one document where the same sentence is repeated hundreds of times, and that's where a translation memory can really make a difference.
So, lots of information so far. Translation, segments, units and so on, but I think hopefully you got the concept of having your translations stored in translation units. But how does it all work?
So, once you get your new text to translate, a CAT tool, and we have a video on CAT tools, is going to look at your entire database, look at all those translation units, and try to find something that matches. And it happens instantly. It's very, very quick, so you're not sitting there waiting for the translation memory to find a translation. It's pretty much instant. So, I just used the word "match", and this is a bit of the essence of a translation memory. What it's trying to do is match the new sentence you have to translate with the sentences you've translated before. And different things can happen.
So, there are four types of matches, so let's look at them. The first one is the context match. This is the highest quality match. This is where not only have you translated that sentence before, but the translation memory can recognize that you translated that sentence and the sentences around it. So in that case, the system can say, "Hmm, not only did I translate this before, but what's around it is also the same." So, that's a context match, and the chances are that it is going to be a really good match.
The second match is the 100% match. This is something that you might hear quite a lot in this industry. So this is the more traditional match. It's something you translated before, in the same format as well. You know, if something was bold, it also needs to be bold, or you might have to fix the formatting, for example. And so, that's the 100% match. In this case the context is not taken into consideration, so it's just looking at that sentence. And even if what's around it is not the same, the system will just say, "This is a 100% match." Typically, you need to check a 100% match, because there might be surprises. If any of you speak different languages, you know there's masculine and feminine, or maybe something was a heading and something was in the middle of a sentence, so you might actually need to use different translations. That's why, although the 100% match is going to be pretty accurate, you do need to cast an eye over a 100% match.
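The difference between a context match and a 100% match can be sketched as follows. This is a simplified illustration, assuming that "context" means only the segment that came immediately before; the memory layout and the function name `classify_match` are hypothetical, and real tools may consider more of the surrounding text as well as formatting.

```python
from typing import Dict, Optional, Tuple

# Assumed layout for this sketch:
# source segment -> (translation, source segment that preceded it when stored)
Memory = Dict[str, Tuple[str, Optional[str]]]


def classify_match(memory: Memory, previous_source: Optional[str], source: str) -> str:
    """Return 'context match', '100% match' or 'no match' for a new segment."""
    entry = memory.get(source)
    if entry is None:
        return "no match"
    _target, stored_previous = entry
    if stored_previous == previous_source:
        # Same sentence AND the same preceding sentence as last time.
        return "context match"
    # Same sentence, different surroundings: still worth checking by eye.
    return "100% match"


memory: Memory = {"Press Start.": ("Premere Avvio.", "Close the cover.")}

print(classify_match(memory, "Close the cover.", "Press Start."))
print(classify_match(memory, "Open the cover.", "Press Start."))
print(classify_match(memory, "Close the cover.", "Press Stop."))
```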
Then we go down a level to what we call a fuzzy match, and in fact, the industry calls it a fuzzy match. So that's something below a 100% match. Typically you might hear about a 70%, 80%, 90%, 95% or 99% match. So that's clearly a sentence which is similar, but not identical. And the difference could be a different word, it could be a formatting difference, so various little things, which means that the translation memory will suggest that translation, and then you can fix it: add a missing word, replace a word, restructure that sentence a little bit to make it work.
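A fuzzy match can be approximated with a similarity score between the new sentence and each stored source sentence. The sketch below uses Python's standard `difflib.SequenceMatcher` on lowercased words, which is an assumption made for illustration: commercial CAT tools use their own scoring, so the percentages here will not equal theirs. The 0.7 threshold and the function names are likewise hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


def similarity(a: str, b: str) -> float:
    """Word-level similarity between 0.0 and 1.0 (illustrative scoring only)."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def best_fuzzy_match(
    memory: Dict[str, str], new_source: str, threshold: float = 0.7
) -> Optional[Tuple[str, str, float]]:
    """Return (stored source, stored target, score) for the best match, if any."""
    best: Optional[Tuple[str, str, float]] = None
    for source, target in memory.items():
        score = similarity(source, new_source)
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best


memory = {
    "Press the green button to start.": "Premere il pulsante verde per iniziare.",
    "Close the cover.": "Chiudere il coperchio.",
}

match = best_fuzzy_match(memory, "Press the red button to start.")
if match is not None:
    source, target, score = match
    print(f"{score:.0%} match: {target}")
```

Here one word out of six differs, so the stored translation is suggested and the translator only has to replace the colour.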
One of the newest matches is the fragment match. So, you know we talked about paragraphs and sentences and so on. So within a sentence, there might be fragments, so you might have a sentence with ten words, but there might be three or four words, and those are the ones that repeat themselves. So you might be translating an entire sentence, and you might have that feeling that you've translated those three or four words. A fragment match will look for those pieces and say, "You know, you actually did translate those four words, and here's how you did translate those four words." And then you can decide whether to use them or not.
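Finding the repeated run of words inside a longer sentence can be sketched like this. It is a minimal illustration, assuming a fragment is the longest run of consecutive words shared with a stored sentence; the function name `find_fragments` and the three-word minimum are hypothetical. Note that this sketch only points at the stored unit that contains the fragment; recalling the translation of just those words, as described above, needs word-level alignment, which is not shown.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def find_fragments(
    memory: Dict[str, str], new_source: str, min_words: int = 3
) -> List[Tuple[str, str, str]]:
    """Return (shared fragment, stored source, stored target) for each hit."""
    new_words = new_source.lower().split()
    hits: List[Tuple[str, str, str]] = []
    for source, target in memory.items():
        stored_words = source.lower().split()
        matcher = SequenceMatcher(None, new_words, stored_words)
        block = matcher.find_longest_match(0, len(new_words), 0, len(stored_words))
        if block.size >= min_words:
            fragment = " ".join(new_words[block.a : block.a + block.size])
            hits.append((fragment, source, target))
    return hits


memory = {
    "To shut down, press the power button twice.":
        "Per spegnere, premere due volte il pulsante di accensione.",
    "Close the cover.": "Chiudere il coperchio.",
}

hits = find_fragments(memory, "Before you begin, press the power button and wait.")
for fragment, source, target in hits:
    print(f"'{fragment}' was translated before in: {source}")
```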
You know what I said earlier about the length of the sentences and the paragraphs. When you look at this, obviously the bigger the sentence, if you have a sentence that long with long, long wording, getting a match might be a little bit harder. If you are in a situation where you have short sentences, it's easier to get a match. The statistics are kind of in your favor. So that's why, typically, what you store is sentences. Even if there's an entire paragraph with three sentences, you store at the individual phrase or sentence level, rather than an entire paragraph. So you give yourself a better chance to get a match.
So, one question that gets asked often, is, "Can I work with more than one translation memory at the same time?"
So why do people ask that? Over the years I've seen all sorts of different approaches. People might store everything they translate, no matter what client, whether it's marketing or legal, and everything goes into one translation memory and you work from that. But other people create different translation memories for different clients, different types of content, but you might still want to be able to use them all, because you never know where a match might come from. So of course, you can create as many translation memories as you like, and use them all at the same time, and in fact, a CAT tool can be quite smart and prioritize one translation memory over another.
So, if you've got a marketing translation memory and you're translating a marketing document, but you might also have the instruction manual from the same client, you might actually turn them both on, because you might actually find some matches from both, but the marketing translation memory's the one that has got top priority over, perhaps, the technical documentation.
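Working with several translation memories at once, with one taking priority, can be pictured as a lookup that walks the memories in order. This is a simplified sketch assuming exact lookups and a plain ordered list; the function name `lookup_with_priority` and the memory names are hypothetical, and a real CAT tool may rank or penalize results in more subtle ways.

```python
from typing import Dict, List, Optional, Tuple


def lookup_with_priority(
    memories: List[Tuple[str, Dict[str, str]]], source: str
) -> Optional[Tuple[str, str]]:
    """Search memories from highest to lowest priority.

    Returns (memory name, translation) from the first memory that has the segment.
    """
    for name, units in memories:
        if source in units:
            return name, units[source]
    return None


marketing_tm = {"Get started today.": "Inizia oggi stesso."}
manual_tm = {
    "Get started today.": "Iniziare oggi.",
    "Close the cover.": "Chiudere il coperchio.",
}

# Both memories are switched on; the marketing one is listed first, so it wins.
memories = [("marketing", marketing_tm), ("manual", manual_tm)]

print(lookup_with_priority(memories, "Get started today."))
print(lookup_with_priority(memories, "Close the cover."))
```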
When you're starting with translation memory, I told you that you can build it pretty fast, you know? You start translating, you do 2000 new words a day, and a translation memory will build up. But you might have been a translator for quite some time, and you might have a bunch of documents that you want to import into the translation memory. So, can you actually make a translation memory from your previous work?
And the answer is that you can. So, there is a tool called Alignment. It's a tool, a feature, a functionality that actually allows you to create a translation memory from your existing translations. So you might have a document in English and a document in Italian, and you translated them manually. You can take those documents and import them into a translation memory. And what you're doing is aligning them. You're trying to align the English sentences with the Italian sentences.
So, you're going to get a side-by-side view where you see all these sentences, and you know, languages are tricky, so maybe three English sentences turn into one long Italian sentence. Coming from Italy, I know. I used to have to chop my sentences more. When I moved to the UK, my sentences were that long. I had to really force myself to be a bit shorter. So with alignment, you can actually manipulate and try to align all the English with the Italian, in my case, or whatever language pair. One good point of a translation memory is that it can store any language. There is no constraint. Any language can be imported and worked with in the translation memory. So, no matter what language combination you are translating, everything can be imported into a translation memory.
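The manual part of alignment, where several English sentences are joined so that they line up with one long Italian sentence, can be sketched as below. This is only an illustration of the pairing step, assuming the person aligning supplies how many source sentences belong to each target sentence; the function name `align` and the `group_sizes` parameter are hypothetical, and real alignment tools also propose the pairing automatically.

```python
from typing import List, Tuple


def align(
    source_sentences: List[str],
    target_sentences: List[str],
    group_sizes: List[int],
) -> List[Tuple[str, str]]:
    """Pair each target sentence with a run of consecutive source sentences.

    group_sizes[i] says how many source sentences map to target_sentences[i].
    """
    if len(group_sizes) != len(target_sentences):
        raise ValueError("need one group size per target sentence")
    if sum(group_sizes) != len(source_sentences):
        raise ValueError("group sizes must cover every source sentence")

    units: List[Tuple[str, str]] = []
    start = 0
    for size, target in zip(group_sizes, target_sentences):
        merged_source = " ".join(source_sentences[start : start + size])
        units.append((merged_source, target))
        start += size
    return units


english = ["Open the cover.", "Remove the cartridge.", "Close the cover.", "Press Start."]
italian = [
    "Aprire il coperchio, rimuovere la cartuccia e chiudere il coperchio.",
    "Premere Avvio.",
]

# Three English sentences became one Italian sentence; the last one maps 1:1.
units = align(english, italian, group_sizes=[3, 1])
for source, target in units:
    print(source, "=>", target)
```

Each resulting pair is a translation unit that could then be imported into the translation memory.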
So, hopefully you got a bit of a sense of what a translation memory is and what's inside. Hopefully you now know what a segment is, and how all of that works together to give you suggestions as you translate. And one of the nice things about a translation memory is that even if you're starting from scratch, either you build it pretty quickly, as soon as you start doing your first translation, or you could even import the work that you've done before, so you create this asset that is going to last you for the future.