CS159 MT Lab

For this lab, we're going to examine some output from current statistical machine translation systems. We will evaluate these systems manually, both to get a feeling for where the state of the art is right now and to see how well our judgments correlate with automatic evaluation measures.

Download the two Excel spreadsheets below. Each contains 10 test sentences, and for each sentence: 1) the original sentence (in either Hindi or French), 2) a human reference translation, and 3) multiple system translations. Score each system translation on a scale of 1-5 (5 being best) on each of the following:

Fluency: Regardless of the content, how good is the English?
Adequacy: How well is the content preserved? Is all the information conveyed correctly? Ideally, you'd judge this with knowledge of the foreign language, but for this lab you'll likely have to rely on how closely the translation matches the human reference.
Overall: Give it an overall score. I'll leave this up to you.

As you read through the sentences, also jot down any observations you have about the systems (e.g., System X seems to consistently have grammatical issues).

When you're all done, calculate the average fluency, adequacy, and overall scores for each system on each language pair. Then go to this Google spreadsheet and enter your results in one of the columns. We'll look at the aggregate results as a class at the end.
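If you'd rather not average by hand, here is a minimal Python sketch of that step. It assumes you have copied your scores from one sheet into a list of (system, fluency, adequacy, overall) rows; the system names and scores below are made-up placeholders, not real lab data, and `average_scores` is a hypothetical helper. Run it once per language pair.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical layout: one row per (sentence, system) judgment.
# System names and scores are illustrative placeholders only.
judgments = [
    # (system, fluency, adequacy, overall)
    ("System A", 4, 3, 3),
    ("System A", 5, 4, 4),
    ("System B", 2, 3, 2),
    ("System B", 3, 3, 3),
]

def average_scores(rows):
    """Return {system: (avg_fluency, avg_adequacy, avg_overall)}."""
    by_system = defaultdict(list)
    for system, fluency, adequacy, overall in rows:
        by_system[system].append((fluency, adequacy, overall))
    # zip(*scores) regroups each system's rows into fluency, adequacy,
    # and overall columns, which are then averaged separately.
    return {
        system: tuple(mean(col) for col in zip(*scores))
        for system, scores in by_system.items()
    }

for system, (flu, ade, ovr) in sorted(average_scores(judgments).items()):
    print(f"{system}: fluency={flu:.2f} adequacy={ade:.2f} overall={ovr:.2f}")
```

Grouping by system name first means the sentences can appear in any order in your list, as long as each row carries the system that produced it.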

Hindi-English
French-English

The data was obtained from Evaluation Matrix.