Machine Translation Engine Training Project
Machine translation is no longer new to the translation industry. The current focus of technology development is how to customize machine translation engines to meet the demands of different translation domains.
For our machine translation engine training project, my teammates and I chose subtitle translation as our domain, because there is a large market for auto-translated subtitle services on major social media and video platforms such as YouTube and TikTok. We believe that a properly trained subtitle machine translation engine can greatly improve viewers’ experience of watching videos that are not produced in their native languages. Because subtitle translation was still too broad a topic for us, we chose Chinese cooking videos on YouTube as the focus of our pilot project and tried to train a machine translation engine that can generate English subtitles for international viewers.
In the project proposal below, we present the data structure, human evaluation methods, training timeline, and budget for this pilot project. The whole pilot project lasted one month, during which we tried two machine translation engine training platforms (Microsoft Custom Translator and SYSTRAN) and modified our training strategies by adding training datasets and dictionaries to improve the BLEU score. The post-mortem of the pilot project can be found at the end of the updated project proposal below.
Translation Management System Studies
In April 2022, I was very lucky to attend the GALA 2022 Conference held in San Diego, California. At the conference, I learned about some of the latest technologies in the translation and localization industries and how they can be applied to translation management systems (TMS) to better serve translation buyers with different needs.
During the conference, I found that there is a major debate in the industry about what a future TMS should look like. Some believe that a TMS should include as many features as possible and eventually become a single tool that supports all translation technologies, such as CAT tools and machine translation engines. Others hold the view that a TMS should stay simple instead of becoming an integration of “everything”. Personally, I think a TMS should remain a management tool that can be connected to other tools, so that it can be customized to better serve translation buyers in different domains. For instance, for technical content translation, the TMS can be integrated with a neural machine translation engine and even natural language processing tools to reduce the work of human review. For marketing material translation, the TMS can be linked to marketing automation tools to keep content transfer easy and consistent.
To summarize what I learned at the GALA conference, my friend Sam Jamieson, who attended the conference with me, and I made the following video to discuss what a TMS should look like in the near future.
