EU Projects

Below are selected EU-funded projects relevant to the language technology (LT) industry. If you know of a project, tool or result that should be listed here, contact us.

Overview of all current projects in the area of language technologies.

 

 

 

 

META-NET, a Network of Excellence consisting of 60 research centres from 34 countries, is dedicated to building the technological foundations of a multilingual European information society.
META-NET is forging META, the Multilingual Europe Technology Alliance. Its Strategic Research Agenda (SRA) is now available online.
 

The CASMACAT project will provide advanced Computer Aided Translation tools, research results on adaptive Statistical Machine Translation (SMT) and methodology for better integration of SMT in professional translation workflows. These results will be highly relevant for the translation industry, in particular professional translation companies.

 

MateCat integrates Statistical Machine Translation and Collaborative Translation Memories within the Human Translation workflow. MateCat increases the productivity of professional translators and enhances their work experience with MT.

 

The ACCURAT project provided a whole processing pipeline for exploiting comparable corpora in MT. The open-source toolkit containing the necessary well-documented tools for comparable corpus acquisition and parallel sentence/phrase/term alignment and extraction is available from the project website.

 

Build your own machine translation system! With LetsMT! you can easily build and run your own custom machine translation systems. Simply upload your own corpora and/or choose to use any of the publicly available corpora. Train your systems and use them for all your translation needs.

 

The PANACEA project has produced a functional beta version of a service platform that allows users to chain and combine language processing modules and tools to build "processing pipelines" for various value-adding tasks in multilingual content processing (e.g. annotation of language resources for later use in machine translation or text analytics systems).

 

ITRANSLATE4.EU is a project gathering nine European SMEs active in the field of automatic translation to provide a free web-based automatic translation service, counterbalancing the predominant US offer in this domain.

 

EuroSentiment aims at creating a shared pool of language resources for fostering sentiment analysis, accessible by means of well-defined models and frameworks that leverage the promotion of SMEs in the emerging market of Sentiment Analysis products and services. The data pool will cover 6 languages: English, Catalan, German, Italian, Portuguese and Spanish, and will be validated through opinion mining demonstrators in two different domains. The targeted users are B2B, including service developers, content providers and language resource owners. EuroSentiment will innovate by providing a domain-oriented shared language resource based on WordNetDomains and aligned with WordNet Affect.

MosesCore draws together academic and commercial partners sharing a common interest in open source machine translation. MosesCore will organise showcase events on MT all over the world - participation is free but spaces are limited.

 



EU-BRIDGE is an Integrated Project that will provide cloud-based language tools including automatic transcription, translation and interpretation. The project will provide streaming technology that can convert speech from lectures, meetings and telephone conversations into text in another language.

 

The goal of Parlance is to develop personalised, mobile, interactive, hyper-local search. Recent trends in Information Retrieval are towards incremental, interactive search, and spoken dialogue systems can provide a truly natural medium for this type of interactive search, in particular for people on the move. Statistical machine learning technology developed in Parlance allows for greater robustness to recognition errors, automatic strategy optimisation and the ability to continuously learn and adapt during the interaction.

 

The ORGANIC project ("Self-Organized Recurrent Neural Learning for Language Processing") transfers fundamental principles of biological brains to artificial cognitive architectures, with applications targeted in speech and handwriting recognition. In both domains, the project has generated solutions which reach or exceed the state of the art.

 

transLectures will develop innovative, cost-effective solutions to produce accurate transcriptions and translations in the well-known VideoLectures.NET repository, with generality across other repositories based on the widely used Opencast Matterhorn platform.

 

Simple4All is creating speech synthesis technology that learns from data with little or no expert supervision and can continually improve itself through being used. This technology will make it possible to build synthetic voices that are more adaptable to specific domains or languages and that can portray a wider range of expressions, including in cases where traditional speech synthesis technologies cannot be used due to the lack of language resources and/or expertise.

 


The Multilingual Web project organised four workshops that aimed to survey and share information about currently available best practices and standards that can help content creators and localizers address the needs of the multilingual Web, including the Semantic Web. The workshops also provided an important opportunity to identify gaps that need to be addressed.

 

EASTIN-CL creates an interface to databases containing information on assistive products. People can search in combined national databases (www.eastin.eu). Search is language-transparent: queries can be in natural language and multilingual; document hits are re-translated into the query language. Spoken interaction is included as well. The core of the system is a database containing the terminology of the domain, following the ISO 9999 classification, and containing 12,000 terms in each of 7 languages.

 

EXCITEMENT has two goals. The first is to set up a generic architecture and a comprehensive implementation for a multilingual textual inference platform and to make it available to the scientific and technological communities. The second goal is to develop a new generation of inference-based industrial text exploration applications for customer interactions, which will enable businesses to better analyze and make sense of their diverse and often unpredicted client content. Three customer interaction channels are addressed: speech (transcriptions), email and social media in EN, DE and IT.

 

The PROMISLingua Pilot Project aims at translating, localising and rolling out the existing PROMIS® online service (at present available in English, German and Italian) in six additional languages (Spanish, French, Portuguese, Greek, Romanian and Hungarian), in order to deliver a cost-efficient and easy-to-use Internet-based service enabling SMEs to comply with Safety, Health, Environment, Quality and other regulations at European and international level.

 

The main aims of the LISE (Legal Language Interoperability Services) project are to help terminology managers in public institutions as well as private service providers and companies improve the quality of their terminological resources in legal and administrative domains, and to provide a web-based terminology service platform for collaborative inter-institutional work. The web-based terminology service is workflow-oriented and provides input and feedback from best practices in the field of legal and administrative terminology management.

 

Human language technologies crucially depend on language resources and tools that are usable, useful and available. The CESAR project (where 9 partners from 6 countries are involved), in close harmony with the META-NET alliance (a Network of Excellence consisting of 54 research centres from 33 countries, which is dedicated to building the technological foundations of a multilingual European information society), intends to address this issue by enhancing, upgrading, standardizing and cross-linking a wide variety of language resources and tools and making them available, thus contributing to an open linguistic infrastructure.

 

The use of machine translation (MT) is becoming much more pervasive. At the same time, Web 2.0 paradigms are democratising content creation, stressing the value of communities of users creating content for each other. However, right now these two trends are fairly incompatible. MT engines, even statistical engines, cannot produce acceptable results for community content due to the extreme variability within the content. The ACCEPT project will address this issue by developing new technologies designed specifically to help MT work better in this environment. The approach consists of three main avenues of research and development: 1) new paradigms for “minimally intrusive” pre-editing of content; 2) the development of strategies for post-editing content which, rather than fully relying on trained translators, will also leverage the monolingual skills of volunteer domain experts; 3) the use of insights gained in the editing process, together with innovative text analytics, to improve the statistical MT engines themselves.

 

The PRESEMT project constitutes a novel approach to Machine Translation, characterised by the use of (a) cross-disciplinary techniques, mainly borrowed from the machine learning and computational intelligence domains, and (b) relatively inexpensive language resources. The aim is to develop a language-independent methodology for the creation of a flexible and adaptable MT system, the features of which ensure easy portability to new language pairs or adaptability to particular user requirements. PRESEMT falls within the Corpus-based MT (CBMT) paradigm. The resources employed, a small bilingual corpus and a large target language (TL) monolingual one, are collected as far as possible over the web, to simplify the development of resources for new language pairs.

 

The exploration of how we as human beings react to the world and interact with it and each other remains one of the greatest scientific challenges. Perceiving, learning, and adapting to the world around us are commonly labelled as intelligent behaviour. But what does it mean to be intelligent? Is IQ a good measure of human intelligence and the best predictor of somebody's success in life? There is now a growing body of research in the cognitive sciences which argues that our common view of intelligence is too narrow, ignoring a crucial range of abilities that matter immensely for how we do in life. This range of abilities is called social intelligence and includes the ability to express and recognise social signals like agreement, politeness, and empathy, coupled with the ability to manage them in order to get along well with others while winning their cooperation. For this reason, the goal of the Social Signal Processing Network (SSPNet) is to establish a virtual distributed institute for research on Social Signal Processing (SSP), i.e. on conceptual modelling, detection, interpretation, and synthesis of social signals.

 

The FLAVIUS project aims to provide website publishers and webmasters with a comprehensive and user-friendly tool allowing them to generate multilingual versions of their website(s) by combining spell-checking, customized machine translation, translation memory and post-editing capabilities. The FLAVIUS platform is currently in beta stage, but already fully functional. Give it a try at flavius.reverso.net.

 

The goal of TrendMiner is to deliver innovative, portable, open-source real-time methods for cross-lingual mining and summarisation of large-scale stream media. TrendMiner will achieve this through an inter-disciplinary approach, combining deep linguistic methods from text processing, knowledge-based reasoning from web science, machine learning, economics, and political science. No expensive human-annotated data will be required due to our use of time-series data (e.g. financial markets, political polls) as a proxy. A key novelty will be weakly supervised machine learning algorithms for automatic discovery of new trends and correlations. Scalability and affordability will be addressed through a cloud-based infrastructure for real-time text mining from stream media.

 

SUMAT is developing an online subtitle translation service addressing nine EU languages combined into 14 pairs, with the aim of semi-automating the current translation processes of the subtitling industry in order to optimize efficiency and productivity and help the industry meet increasingly demanding market needs. SMT systems will be trained on large amounts of professionally produced parallel and monolingual subtitle data provided by the subtitling companies of the consortium. In order to build optimal SMT systems for subtitling, linguistic information will be exploited to augment language models, enhance translation models, deal with unknown words and make use of syntactic dependencies. The best systems per language pair will then be integrated in the online service and their suitability will be tested for subtitle translation followed by post-editing.

 

The FAUST project will develop machine translation (MT) systems which respond rapidly and intelligently to user feedback. Experiment with our translation systems at http://labs.reverso.net.

 

GetHomeSafe is a European Commission Collaborative Project (STREP) aiming to develop a system for safe information access (search, navigation, points of interest) and communication (texting) while driving, as well as basic functionalities such as music management, address book, phone management, search facilities and even social media updates. The project will develop solutions involving multimodal dialogue, where speech, a key modality, will be handled by a hybrid (local and remote) speech recognition module.

 

The TTC project (Terminology Extraction, Translation Tools and Comparable Corpora) has contributed to leveraging computer-assisted translation (CAT) tools, machine translation (MT) systems, and multilingual content (corpora and terminology) management tools by generating bilingual terminologies automatically from comparable corpora. It covers five EU languages belonging to three language families: Germanic (English and German), Romance (French and Spanish), and Baltic (Latvian), as well as two languages from outside the European Union: Russian (Slavonic) and Chinese (Sino-Tibetan).

 

The commercially oriented Annomarket project aims to revolutionise the text annotation market by delivering an affordable, open marketplace for pay-as-you-go, cloud-based extraction resources and services, in multiple languages. The main beneficiaries will be the SME providers of text analysis resources and services, who will be able to deploy their custom components/applications and receive revenue via the Annomarket marketplace. Pricing will be transparent (based on data volumes and API calls) and the business model self-sustainable.

 

DICTA-SIGN has researched ways to enable communication between Deaf individuals through the development of human-computer interfaces (HCI) for Deaf users, by means of Sign Language. The project dealt with four Sign Languages: British Sign Language (BSL), German Sign Language (DGS), Greek Sign Language (GSL) and French Sign Language (LSF) and involved research from several scientific domains in order to develop technologies for sign recognition and generation, exploiting linguistic knowledge and resources created and annotated in the framework of the project. To serve its goals, DICTA-SIGN combined linguistic knowledge with computer vision for image and video analysis that serves to achieve continuous sign recognition as presented in sign language videos, and with computer graphics for realistic signing animation by means of a virtual signer (avatar). The project’s outcomes are best represented by the Sign-Wiki project demonstrator.

PortDial (Language Resources for Portable Multilingual Spoken Dialogue Systems) addresses a major roadblock in spoken dialogue system (SDS) design: the lack of linguistic resources that would enable the rapid porting of SDS to new domains and languages. PortDial aims to: 1) devise machine-aided methods for creating, cleaning up and publishing multilingual domain ontologies and grammars for SDS prototyping, 2) create a commercial platform for quick prototyping of SDS for new domains and languages, and 3) put together a data exchange marketplace for speech service developers. The main technical innovation behind the PortDial project is combining knowledge-based and data-driven approaches for ontology and grammar induction from web-harvested data. The PortDial platform will enable rapid and cost-effective porting of speech services to new domains and languages, serving in particular SMEs in the mobile application development industry, but also the research community.

The main goal of the MORMED project is to establish and offer a multilingual thematic community platform for rare diseases as a service to groups of interested users. This service will be offered by LTC, as the main service provider and translation services facilitator, even after the end of the project's funding period.