Almost everybody thinks: let's translate, and then we can operate in another country. Many even believe that letting a machine do it is good enough. Among them are high officials in the European Union, who argue this more to avoid a problem than to address it. In fact, recent events show that we are not very successful at extracting information from unstructured data, not even monolingually.
Structuring Unstructured Information
A text, such as this one, is an unstructured information resource. To be useful, however, the text must first be found, and it is more valuable when combined and understood together with other such texts. Then conclusions can be drawn and actions taken based on the information inside the texts. This has been done for centuries, for example by lawyers and courts in countries that work with case-based law. For some time now, the multilingual angle has been growing in importance due to international trade, globalization, social media, and, in Europe, the central organization of the European Commission governing so many diverse countries.
Everyone knows from painful searches that a smart organization of filed documents is key to finding anything again. The best strategy for retrieving information was the library hierarchy and the terms it used to organize the resources. This proven approach has been neglected in the digital age, because users were told that search, folders, and titles would do the job. Today, most organizations barely know or care whether terms are used correctly in authoring or in translation. As a result, their texts are no longer useful when queried to support decision making. This is even more evident across borders, where imperfect translation, especially automated translation, not only loses terms but also introduces errors.
Interoperability by Machine Learning or Linked Open Data?
Eager researchers and software companies will say: never mind, Big Data methods using Machine Learning or Linked Open Data will come to the rescue. Indeed, both are great for gisting. But they fail to function reliably beyond that point.
Imagine asking them what 1 + 1 is. Machine Learning will tell you the answer is in the range of 1.5 to 2.5. Linked Open Data will tell you that the answer might be in the following 42 documents.
Ensuring Cross-border Interoperability
We seem to have forgotten that the tools of library science, such as classification systems, taxonomy hierarchies, and thesauri, are the core of reusing textual data. When these knowledge resources are multilingual, they become a Multilingual Knowledge System (MKS). An MKS can extract insights from texts in and across multiple languages.
I am not saying terminology is the answer: terms are mostly flat, unrelated, and compiled mainly for translation support. Instead, we need a structure that gives us context and lets us drill down through a concept map to find relationships. Term resources are rather an asset that can be elevated to become a knowledge structure.
Multilingual Knowledge Systems make cross-border data processing possible, a capability often called semantic or information interoperability. In fact, MKSs are the only possible path to cross-border interoperability. They retrieve the needle in the haystack, in all languages. They make it possible to pinpoint the units sought while linking to all information related to each unit. And if an MKS supports Big Data and Linked Open Data, these technologies will also efficiently support cross-border interoperability!
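To make the idea concrete, here is a minimal sketch in Python of how a concept structure with labels in several languages could retrieve documents across languages. It is an illustration only, not a description of any actual MKS product: the `Concept` class, the toy concepts, the sample documents, and the `search` function are all hypothetical, and matching is done on exact words with no handling of inflection or compounds.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A language-independent concept with terms per language (hypothetical)."""
    concept_id: str
    labels: dict[str, list[str]]  # language code -> lowercase terms
    broader: list[str] = field(default_factory=list)  # ids of parent concepts


# Toy data invented for this illustration.
CONCEPTS = {
    "c1": Concept("c1", {"en": ["vehicle"], "de": ["fahrzeug"], "sv": ["fordon"]}),
    "c2": Concept("c2", {"en": ["car", "automobile"], "de": ["auto", "pkw"],
                         "sv": ["bil"]}, broader=["c1"]),
    "c3": Concept("c3", {"en": ["truck", "lorry"], "de": ["lastwagen"],
                         "sv": ["lastbil"]}, broader=["c1"]),
}

DOCUMENTS = {
    "doc_en": "The lorry was delayed at the border.",
    "doc_de": "Der Lastwagen wurde an der Grenze kontrolliert.",
    "doc_sv": "En bil passerade tullen.",
}


def find_concept(term):
    """Return the concept that carries this term in any language, or None."""
    needle = term.lower()
    for concept in CONCEPTS.values():
        for terms in concept.labels.values():
            if needle in terms:
                return concept
    return None


def with_narrower(concept_id):
    """Return the concept id plus the ids of all concepts below it."""
    result = [concept_id]
    for concept in CONCEPTS.values():
        if concept_id in concept.broader:
            result.extend(with_narrower(concept.concept_id))
    return result


def search(term):
    """Find documents in any language that mention the concept or a narrower one."""
    concept = find_concept(term)
    if concept is None:
        return []
    # Collect every term, in every language, for the concept and its children.
    terms = {
        t
        for cid in with_narrower(concept.concept_id)
        for language_terms in CONCEPTS[cid].labels.values()
        for t in language_terms
    }
    hits = []
    for doc_id, text in DOCUMENTS.items():
        words = set(re.findall(r"\w+", text.lower()))
        if words & terms:  # at least one term occurs as a whole word
            hits.append(doc_id)
    return sorted(hits)
```

With this toy data, searching for "truck" finds both the English document (via the synonym "lorry") and the German one (via "Lastwagen"), while searching for the broader German term "Fahrzeug" drills down the hierarchy and finds all three documents. A flat term list without the broader/narrower links could not do the second search.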
Author Gudrun Magnusdottir @GMagnusdottir