While preparing an assignment in GitHub Classroom for my students, I ran into an issue with character encoding. Our programs use Portuguese in their input and output text.
As an example (see this PR), here is a simple assignment with a Haskell program that prints hello world (Olá, mundo! in Portuguese).
module Main (main) where
main :: IO ()
main = putStrLn "Olá, mundo!"
The tests used for automatic grading look for the word Olá in the output of the program.
{
  "tests": [
    {
      "name": "Say hello world (with runhaskell)",
      "setup": "",
      "run": "runhaskell hello.hs",
      "input": "",
      "output": "Olá",
      "comparison": "included",
      "timeout": 10,
      "points": 1
    },
    {
      "name": "Say hello world (with ghc)",
      "setup": "ghc hello.hs",
      "run": "./hello",
      "input": "",
      "output": "Olá",
      "comparison": "included",
      "timeout": 10,
      "points": 1
    }
  ]
}
Here is the grading result.
The test with runhaskell fails because the text Olá is not found in the output of the program. Clearly there is a difference in encoding:
::error::The output for test Say hello world (with runhaskell) did not match%0AExpected:%0AOlá%0AActual:%0AOl?, mundo!
The test with ghc fails with a runtime error. It seems that the compiled program is not able to write its output using the UTF-8 encoding:
hello: <stdout>: commitBuffer: invalid argument (cannot encode character '\225')
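Both failures would be consistent with the grading environment running under a non-UTF-8 locale (for example C or POSIX), since GHC picks the default encoding of stdout from the locale. That is an assumption, not something confirmed by the logs above. Under that assumption, a minimal sketch of a possible workaround is to set the encoding of stdout explicitly with hSetEncoding from System.IO before printing. The greeting binding is only introduced here for illustration:

```haskell
module Main (main) where

import System.IO (hSetEncoding, stdout, utf8)

-- The text the tests look for (name introduced for this sketch only).
greeting :: String
greeting = "Olá, mundo!"

main :: IO ()
main = do
  -- Force UTF-8 on stdout instead of relying on the locale's encoding
  -- (assumes the runner's locale is not UTF-8).
  hSetEncoding stdout utf8
  putStrLn greeting
```

This keeps the fix inside the program, so it does not depend on how the tests are run. Whether it is acceptable to require this line in every student submission is a separate question.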
Any clues on how to deal with this situation are very welcome.