Different behavior when showing an ASCII and a Unicode character

The following program

main :: IO ()
main = do
  putStrLn $ show '\65'
  putStrLn $ show 'A'
  putStrLn $ show '\8704'
  putStrLn $ show '∀'

Output

'A'
'A'
'\8704'
'\8704'

Both ‘A’ and ‘∀’ are Unicode characters, so why does ‘A’ show as ‘A’ while ‘∀’ shows as ‘\8704’?

Is this behavior expected? I have not seen any reference mention it.

You can test it on the Haskell Playground.

Thanks.

1 Like

This is a known issue:

2 Likes

The Show class should be treated as a debugging aid. It’s a lawless type class and as such, it’s better not to rely on it working a certain way.
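
For example, nothing stops an instance from producing output that isn’t valid Haskell at all. A made-up type, purely for illustration:

data Temperature = Celsius Double

instance Show Temperature where
  -- Perfectly legal, but not something read could ever parse back:
  show (Celsius d) = show d ++ "°C"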

4 Likes

On the other hand, this program:

main :: IO ()
main = do
  putChar '\65'
  putChar 'A'
  putChar '\8704'
  putChar '∀'

gives this output:

$ runhaskell hep.hs 
AA∀∀$ 
3 Likes

Sure, and the same for putStrLn:

main :: IO ()
main = do
  putStrLn "\65"
  putStrLn "A"
  putStrLn "\8704"
  putStrLn "∀"
A
A
∀
∀

OK, so I think there are two sources of confusion here.

The first one is: why does show print both '\65' and 'A' as 'A'? (And the same for '\8704' vs. '∀'?)

This is because '\65' and 'A' represent the exact same value, and show has no idea which syntax you used to define it. In fact, once your code has gone through the first stages of compilation, the difference is gone - both literals will compile to the exact same code, and even if show wanted to, it couldn’t possibly tell the difference.
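
(A quick GHCi check confirms this - the literals compare equal, so show receives the same value either way:)

ghci> '\65' == 'A'
True
ghci> '\8704' == '∀'
True
ghci> show '\65' == show 'A'
True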

The second one is “why is ‘A’ printed as the actual character, but ‘∀’ is printed as an escape?”

This has to do with what show is supposed to do: it gives you something similar to how a programmer would write the value being shown in code.

And because non-ASCII Unicode in source files can be problematic, a lot of programmers adopt the convention that ASCII characters can be written as-is, but non-ASCII characters in string and character literals should be spelled out as escapes. This is useful for several reasons:

  • Those non-ASCII characters might not even exist on someone’s keyboard, so working with code that uses them can be very awkward - try grepping for “γ” on a computer that doesn’t have Greek letters on its keyboard.
  • The client software used to view the code might not render all non-ASCII characters properly; ASCII characters, however, will work reliably across any client that uses any superset of ASCII, including UTF-8, and most 8-bit codepages.
  • ASCII also tends to survive much more mangling, e.g. when transmitting code through legacy communication systems (old email systems, IRC, issue trackers running on a non-UTF8 database, etc.)
  • ASCII characters will be rendered unambiguously in most clients and with most fonts, especially those used for programming; but within Unicode, there are many ways of spelling a string that will look identical, and this can actually be positively devious - try this, for example:
main = do
    print 'A'   -- Latin capital A (U+0041): prints 'A'
    print 'Α'   -- Greek capital Alpha (U+0391): prints '\913'

So, in short, show follows a simple rule: if the character is printable ASCII, print it as-is; otherwise, print a suitable escape sequence - just as a programmer following the above convention would.
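
A rough sketch of that rule, assuming only Data.Char (the real logic lives in GHC.Show’s showLitChar, which also handles quotes, backslashes, and named escapes - this is an approximation, not the actual implementation):

import Data.Char (isAscii, isPrint, ord)

-- Approximate the ASCII-vs-escape decision show makes for a Char:
showCharSketch :: Char -> String
showCharSketch c
  | isAscii c && isPrint c = ['\'', c, '\'']              -- printable ASCII: as-is
  | otherwise              = "'\\" ++ show (ord c) ++ "'" -- anything else: decimal escape

With this, showCharSketch 'A' yields 'A' and showCharSketch '∀' yields '\8704', matching the outputs above.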

9 Likes

The show implementations for Char and String try to stick to ASCII so as to make the weakest possible assumption about the terminal.

You wouldn’t believe it, but in this, the 21st Century, the Year 2024 of Our Lord, the Windows terminal still does not default to any UTF-* code page.

How would you feel about dropping Windows support altogether? >:)
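
(For what it’s worth, a program that wants to emit non-ASCII text can at least request UTF-8 on its own output handle - a small sketch using System.IO, which still assumes the terminal can actually render what it receives:)

import System.IO (hSetEncoding, stdout, utf8)

main :: IO ()
main = do
  -- Encode stdout as UTF-8 regardless of the platform's default code page:
  hSetEncoding stdout utf8
  putStrLn "∀"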

If ‘Windows’ is part of the reasoning, then the reasoning may need to be revisited, given the capabilities of Windows Terminal - available for Windows 10 and the default terminal on Windows 11.

On the “bright” side, this kind of change to the standard library can take forever to get through the Libraries Committee!

(Also takes forever until someone is interested enough to write a proposal in the first place!)

Given a proper proposal, the CLC process is fairly swift and straightforward.

2 Likes