
Recent Advances in Computer Science and Communications


ISSN (Print): 2666-2558
ISSN (Online): 2666-2566

Research Article

Statistical Analysis of Machine Translation Evaluation Systems for English-Hindi Language Pair

Author(s): Pooja Malik*, Y. Mrudula and Anurag S. Baghel

Volume 13, Issue 5, 2020

Page: [864 - 870] Pages: 7

DOI: 10.2174/2213275912666190716100145



Background: Automatic Machine Translation (AMT) evaluation metrics have become popular in the machine translation community in recent years, driven by the growing popularity of machine translation engines and of machine translation as a field. A translator is an important tool for breaking barriers between communities, especially in countries like India, where people speak 22 different languages and many variations of them. With the rise of machine translation engines, there is a need for a system that evaluates how well they perform. This is the role of machine translation evaluation.

Objective: This paper discusses the importance of automatic machine translation evaluation and compares several machine translation evaluation metrics by performing statistical analysis on metric scores and human evaluations, in order to find out which metric has the highest correlation with human scores.

Methods: The correlation between the automatic and human evaluation scores, and the correlation among the five automatic evaluation scores, are examined at the sentence level. In addition, a hypothesis is set up and p-values are calculated to determine how significant these correlations are.
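
The kind of sentence-level analysis described above can be illustrated with a minimal sketch. It assumes the Pearson coefficient and a permutation test for the null hypothesis of no correlation; the abstract does not say which coefficient or significance test the authors used, so both are illustrative choices. The names metric_a and metric_b and all scores below are hypothetical toy values, not data from the paper.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, trials=2000, seed=0):
    """Two-sided p-value for H0 (no correlation) via a permutation test.

    Assumption: a permutation test stands in for whatever test the
    paper actually applied.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        # Count shuffles whose correlation is at least as extreme.
        if abs(pearson(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)


# Hypothetical toy scores for six sentences (NOT data from the paper).
human = [0.20, 0.35, 0.50, 0.60, 0.80, 0.90]
metric_scores = {
    "metric_a": [0.25, 0.30, 0.55, 0.58, 0.75, 0.95],
    "metric_b": [0.60, 0.20, 0.70, 0.30, 0.50, 0.40],
}

for name, scores in metric_scores.items():
    r = pearson(scores, human)
    p = permutation_p_value(scores, human)
    print(f"{name}: r = {r:.3f}, p = {p:.4f}")
```

A small p-value suggests that a correlation as strong as the observed one would rarely arise by chance if metric and human scores were unrelated. The same pearson function can be applied to pairs of automatic metrics to examine how they correlate with each other.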

Results: The results of the statistical analysis are presented as graphs that show the trend in the correlation between the scores of the automatic machine translation evaluation metrics and the human scores.

Conclusion: Of the five metrics considered in the study, METEOR shows the highest correlation with human scores.

Keywords: Machine Translation (MT), Machine Translation Evaluation (MTE), Automatic Machine Translation Evaluation Metrics, human evaluation scores, METEOR, statistical analysis.


© 2022 Bentham Science Publishers