3910 Keswick Rd., Suite N-2600
Baltimore, MD 21211
Phone: 443-997-9009 / Fax: 443-997-1006

October 9, 2017
CONTACT: Arthur Hirsch
Office: 443-997-9909 / Cell: 443-462-8702
[email protected]  @JHUmediareps

Johns Hopkins Scientists Win Grant for Machine Translation of Rarer Languages

Newswise — A team of computer scientists at the Johns Hopkins University has won a $10.7 million grant from the Office of the Director of National Intelligence to create an information retrieval and translation system for languages that are not widely used.

Philipp Koehn, a computer science professor in Johns Hopkins’ Whiting School of Engineering, is leading a group of 20 professors, research scientists, post-doctoral fellows and doctoral students in an effort to build a system that can respond to inquiries typed in English based on documents written in so-called “low-resource” languages. The term refers to languages for which relatively little written material is available.

“The biggest challenge we’re going to have with this setup is there’s not much data,” said Koehn, who has been researching machine language translation for nearly 20 years and wrote the textbook Statistical Machine Translation. He is affiliated with the Whiting School's Center for Language and Speech Processing.

Koehn said he expected that in a few weeks the DNI would send his group information on a specific language they can use to test the technology they’ve built for the task. He said that ultimately the intelligence agency is likely to choose languages for the project that may be spoken by millions of people but are not rich in written material, such as Kurdish, Serbo-Croatian, Khmer, Hmong and Somali.

The project starts with data. The scientists will compile online samples of the target language that have already been translated into English – about enough text to fill 10 books of 350 pages each – and begin machine analysis of language patterns. That would include sentence structure and the positions of verbs, adjectives and other components.

Using that analysis, rather than the work of a human translator, the scientists will develop algorithms that automatically translate the target language.

The system will be designed to respond to queries that include a word and a topic area or “domain,” such as “Zika” in the topic of “government,” or in the topic of “health,” as it’s described on the DNI website. The responses produced by the translation system should tell the user how the material is relevant to the query.

The intelligence agency is launching the effort to explore how such a system might work, as intelligence gathering and analysis have come to encompass ever more languages. For most languages, the agency site says, “there are very few or no automated tools available for information retrieval or machine translation.”

The project is meant to sharply cut the time and the amount of information needed to put a translation system into use for intelligence purposes, the agency says.

At this stage, the program is exploring how these systems can work, and will be set up as a competition among several research institutions: Johns Hopkins, the University of Southern California, Columbia University and technology research and development company Raytheon BBN Technologies.

Koehn said the agency is likely to turn the results of this research over to a private company to build a system that would be used by the government. 

The project starts this month and will run in three phases over the course of four years. Koehn said the government can discontinue the work at the end of either of the first two phases, which run 19 months and 17 months. The final phase is scheduled to run 14 months.