Language Weaver's translation tools move from spy vs. spy to Web content

October 9, 2008: Ian Lamont, Managing Editor for The Industry Standard

Automated, effective machine translation has been a holy grail of computer scientists for decades. It's not just a practical challenge -- it's a technology that has the potential to revolutionize communication and lead to new classes of software applications.

The arrival of a global communications medium -- the Internet -- has increased the need for high-quality translation software. The Industry Standard spoke with Mark Tapling, the president and CEO of Language Weaver, about his company's approach to machine translation. The company's products are built around a statistics-based translation engine that has been used for years by U.S. intelligence agencies. It supports 50 language pairs. Language Weaver has also launched a high-volume Internet-based application that can be optimized for internal company documents or websites.

One of the company's major selling points is cost: Tapling says human translation averages 21 cents per word, which is too expensive for most types of online content. In the interview (see below) he stated that Language Weaver is a far cheaper alternative.

The company has already attracted several customers who want to produce international versions of their websites. One of the largest is TripAdvisor, which uses the software to translate user-generated reviews of hotels and other places into various European languages and Japanese. However, price is not the only consideration in choosing Language Weaver -- output quality matters greatly to readers, and the software needs to be "trained" to improve the quality of translated Web content.

Our interview with Tapling follows:

The Industry Standard: How does Language Weaver compare with human translators and other machine translation services in terms of cost?

Mark Tapling: Language Weaver translations are at least 2,000 times more cost-effective than human translators. Our products are priced by application versus tools. In this case, we are looking to establish leadership as both the quality and price performer. Pricing for alternate solutions has a wide variety of entry points, but we know when considering volume, speed, and accuracy that our products are viewed as very compelling on the pricing front.
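
To put the two figures quoted in this article side by side, here is a rough back-of-the-envelope sketch. It assumes that "2,000 times more cost effective" can be read as a per-word cost ratio, which Tapling did not spell out, and the 100,000-word site size is a hypothetical example rather than a customer figure.

```python
# Figures quoted in the article; how they combine is an assumption.
HUMAN_COST_PER_WORD = 0.21   # dollars per word, per Tapling
COST_RATIO = 2000            # "at least 2,000 times more cost effective"

def estimate_costs(word_count):
    """Return (human cost, implied upper bound on machine cost) in dollars."""
    human = word_count * HUMAN_COST_PER_WORD
    # "At least 2,000 times" means the machine cost is at most human / 2000.
    machine_ceiling = human / COST_RATIO
    return human, machine_ceiling

# Hypothetical site with 100,000 words of content.
human, machine = estimate_costs(100_000)
print(f"Human: ${human:,.2f}; machine (implied upper bound): ${machine:,.2f}")
```

Under that reading, a 100,000-word site that would cost about $21,000 to translate by hand would cost no more than roughly $10.50 per language by machine. Actual Language Weaver pricing was not disclosed in the interview.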

TIS: What is the breakdown of clients who use the hosted service, compared to the standalone CD-based product?

Tapling: We just launched our SaaS offering 10 days ago. We have two clients up in production, and likely another three coming in the first two weeks, so we are encouraged. The historical business of the company has been geared toward government intelligence applications, and virtually all of those clients are on premise-based systems.

TIS: Which are the top three language pairs among your customers?

Tapling: Arabic and Spanish are at the top, and then it becomes a race between German and French.

TIS: Who are your competitors?

Tapling: Our biggest competitor is "No Decision, Inc." This is the scenario where clients choose not to pursue translating because the human cost is viewed as too high. If the customer advances past that obstacle, we see Systran, and regional players with limited language coverage.

TIS: Can you describe what "domain training" involves?

Tapling: Our prospective customers deliver to us a combination of monolingual destination content and bilingual source/target content. In some cases, we capture this data for them. We then have a series of tools that clean and align the data and run it against a backbone network containing billions of words. The process updates the statistical algorithms, so that future translation requests understand the lexicon and syntax of a customer's business.
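
For readers unfamiliar with statistical translation, the general idea behind training on aligned bilingual text can be sketched in a few lines. This is a toy illustration only, not Language Weaver's method: it estimates word translation probabilities from simple co-occurrence counts, whereas production systems use far more sophisticated alignment and language models. The function names and the English/Spanish sample data are hypothetical.

```python
from collections import Counter, defaultdict

PUNCT = ".,!?;:\"'"

def clean(sentence):
    """Lowercase and strip punctuation: a stand-in for a data-cleaning step."""
    words = (w.strip(PUNCT).lower() for w in sentence.split())
    return [w for w in words if w]

def train(aligned_pairs):
    """Estimate P(target word | source word) from co-occurrence counts
    in sentence-aligned source/target pairs (relative frequencies only)."""
    counts = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        for s in clean(src):
            for t in clean(tgt):
                counts[s][t] += 1
    return {
        s: {t: c / sum(tc.values()) for t, c in tc.items()}
        for s, tc in counts.items()
    }

# Hypothetical customer data: sentence-aligned English/Spanish pairs.
pairs = [
    ("The hotel.", "El hotel."),
    ("The room.", "El cuarto."),
    ("Clean room!", "Cuarto limpio!"),
]
table = train(pairs)
best = max(table["room"], key=table["room"].get)
print(best, table["room"][best])
```

With this sample, "room" co-occurs with "cuarto" more often than with any other word, so "cuarto" becomes its most probable translation. Adding more customer-specific sentence pairs shifts the probabilities toward that customer's vocabulary, which is the intuition behind domain training.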

TIS: I observed in several tests of website content (English to simplified Chinese and vice versa, and French to English) that the syntax was often corrupted in the target language. What are the factors that impact translation quality?

Tapling: The key factor is understanding the customer's communication style, and desired outputs. When we can match pairs of input and desired outputs, the translators respond very favorably.

TIS: User-generated content would seem especially difficult to train and parse, considering the variation in writing styles and unconventional word usage. How does the tool accommodate such content? Is there a drop-off in quality?

Tapling: Excellent point. We are also advancing our translators past phrase-based capabilities to syntax-based systems with a confidence-indicator algorithm that enables us to accurately predict end-user response. We also have the capability of filtering out the "keyboard noise" of UGC to achieve high levels of quality.

 


