Encoding issues in GitHub Classroom autograding Haskell assignments

romildo · November 7, 2023, 12:25pm

When preparing an assignment in GitHub Classroom for my students, I am facing an issue related to character encoding. We use Portuguese in the input and output texts of the program.

As an example, here (see this PR) is a simple assignment with a Haskell program that says hello world (Olá, mundo! in Portuguese).

module Main (main) where

main :: IO ()
main = putStrLn "Olá, mundo!"

The tests used for automatic correction look for the word Olá in the output of the program.

{
  "tests": [
    {
      "name": "Say hello world (with runhaskell)",
      "setup": "",
      "run": "runhaskell hello.hs",
      "input": "",
      "output": "Olá",
      "comparison": "included",
      "timeout": 10,
      "points": 1
    },
    {
      "name": "Say hello world (with ghc)",
      "setup": "ghc hello.hs",
      "run": "./hello",
      "input": "",
      "output": "Olá",
      "comparison": "included",
      "timeout": 10,
      "points": 1
    }
  ]
}

Here is the correction.

The test with runhaskell fails because the text Olá is not found in the output of the program. Clearly there is a difference in encoding:

::error::The output for test Say hello world (with runhaskell) did not match%0AExpected:%0AOlá%0AActual:%0AOl?, mundo!

The test with ghc fails with a runtime error. It seems that the generated executable is not able to use the UTF-8 encoding for its output:

hello: <stdout>: commitBuffer: invalid argument (cannot encode character '\225')

Any clues on how to deal with this situation are very welcome.

tomjaguarpaw · November 7, 2023, 12:32pm

Hmm, perhaps you need to set UTF-8 somewhere. Perhaps export LANG=C.UTF-8 somewhere in your workflow script will help?
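One way to check whether the locale is the culprit is to have the runner print what the GHC runtime actually picked up. The following is only a sketch using functions from base; the helper name showVar is made up for illustration, and the exact encoding names printed depend on the GHC version and the CI image.

```haskell
module Main (main) where

import Data.Maybe (fromMaybe)
import GHC.IO.Encoding (getLocaleEncoding)
import System.Environment (lookupEnv)
import System.IO (hGetEncoding, stdout)

-- Hypothetical helper: render an environment variable for the report.
showVar :: String -> Maybe String -> String
showVar name value = name ++ "=" ++ fromMaybe "<unset>" value

main :: IO ()
main = do
  lang <- lookupEnv "LANG"
  lcAll <- lookupEnv "LC_ALL"
  -- Encoding the runtime derived from the locale settings.
  localeEnc <- getLocaleEncoding
  -- Encoding attached to stdout (Nothing means binary mode).
  stdoutEnc <- hGetEncoding stdout
  -- The labels are plain ASCII, so the report should print even under an ASCII locale.
  putStrLn (showVar "LANG" lang)
  putStrLn (showVar "LC_ALL" lcAll)
  putStrLn ("locale encoding: " ++ show localeEnc)
  putStrLn ("stdout encoding: " ++ maybe "<binary>" show stdoutEnc)
```

If the reported encodings are not UTF-8, that would be consistent with the "cannot encode character '\225'" error above, since á is outside ASCII.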

jackdk · November 7, 2023, 12:48pm

If you can wrap the main provided by the students, would the with-utf8 package help here?

sgraf · November 7, 2023, 1:26pm

As I recently learned, setLocaleEncoding utf8 (which is what with-utf8 seems to use) works as long as you don’t spawn child processes. If your Haskell program wants to spawn child processes, LANG=C.UTF-8 runhaskell hello.hs seems to be the only solution that has worked for me so far. It doesn’t work on Windows, but that probably isn’t a problem for CI.
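For the single-process case, here is a minimal sketch of the in-program approach using only base rather than the with-utf8 package. It assumes the student's main can be edited or wrapped; the name greeting is introduced just for illustration. Setting the encoding on stdout and stderr explicitly is a precaution in addition to setLocaleEncoding.

```haskell
module Main (main) where

import GHC.IO.Encoding (setLocaleEncoding)
import System.IO (hSetEncoding, stderr, stdout, utf8)

-- Hypothetical name for the text the autograder looks for.
greeting :: String
greeting = "Olá, mundo!"

main :: IO ()
main = do
  -- Default encoding for handles opened from now on, independent of LANG.
  setLocaleEncoding utf8
  -- Set the standard handles explicitly as well.
  hSetEncoding stdout utf8
  hSetEncoding stderr utf8
  putStrLn greeting
```

This only affects the current process. As noted above, child processes still read their encoding from the environment, so setting LANG=C.UTF-8 in the run command remains the simpler option when the program spawns other programs.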

romildo · November 7, 2023, 3:20pm

LANG=C.UTF-8 runhaskell hello.hs works for me. Thanks.

I have also tried setting the locale with sudo localectl set-locale pt_BR.UTF-8, but it didn’t help.
