Resource Creation for English-Maithili Machine Translation (EMMT) A Divergence Perspective

dc.contributor.author: Nidhi, R.
dc.contributor.author: Singh, T.
dc.date.accessioned: 2019-07-11T04:47:45Z
dc.date.available: 2019-07-11T04:47:45Z
dc.date.issued: 2018
dc.description.abstract: Maithili is one of the 22 scheduled languages of India, yet it has almost no language technology resources. The absence of basic tools for the language has hampered resource creation. Since English is the dominant language, translating from it can help create the corpora required for tool development in Maithili. The present work discusses efforts toward Language Technology Resource (LTR) creation and a divergence study for an EMMT system, which is a Statistical Machine Translation (SMT) system. Creating any SMT system requires sizeable parallel, aligned corpora for training and testing. Creating general-purpose source corpora for English and producing translation equivalents with possible alignments require time and effort. The paper focuses on the data collection methods, cleaning, the size and structure of the text corpora, alignment and parallelization strategies, training, testing, and a study of divergence between the language pair. [en_US]
dc.identifier.isbn: 9789550481194
dc.identifier.uri: http://erepo.lib.uwu.ac.lk/bitstream/handle/123456789/1433/108-2018-Resource%20Creation%20for%20English-Maithili%20Machine%20Translation%20%28EMMT%29.pdf?sequence=1&isAllowed=y
dc.language.iso: en [en_US]
dc.publisher: Uva Wellassa University of Sri Lanka [en_US]
dc.subject: Computer Science [en_US]
dc.subject: Information Science [en_US]
dc.subject: Computing and Information Science [en_US]
dc.title: Resource Creation for English-Maithili Machine Translation (EMMT) A Divergence Perspective [en_US]
dc.title.alternative: International Research Conference 2018 [en_US]
dc.type: Other [en_US]
Files
Original bundle
Name: 108-2018-Resource Creation for English-Maithili Machine Translation (EMMT).pdf
Size: 107.9 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission