Different behavior when showing an ASCII and a Unicode character

The following program

main :: IO ()
main = do
  putStrLn $ show '\65'
  putStrLn $ show 'A'
  putStrLn $ show '\8704'
  putStrLn $ show '∀'

Output

'A'
'A'
'\8704'
'\8704'

Both ‘A’ and ‘∀’ are Unicode characters, so why does ‘A’ show as ‘A’ while ‘∀’ shows as ‘\8704’?

Is this behavior expected? I have not seen any reference mention it.

You can test it on the Haskell Playground.

Thanks.

1 Like

This is a known issue:

2 Likes

The Show class should be treated as a debugging aid. It’s a lawless type class and as such, it’s better not to rely on it working a certain way.
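
For example, nothing stops an instance from producing output that isn’t valid Haskell at all. A made-up type, purely for illustration:

data Temperature = Celsius Double

instance Show Temperature where
  -- Perfectly legal, but not something read could ever parse back:
  show (Celsius d) = show d ++ "°C"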

4 Likes

On the other hand, this program:

main :: IO ()
main = do
  putChar '\65'
  putChar 'A'
  putChar '\8704'
  putChar '∀'

gives this output:

$ runhaskell hep.hs 
AA∀∀$ 
3 Likes

Sure, and the same for putStrLn:

main :: IO ()
main = do
  putStrLn "\65"
  putStrLn "A"
  putStrLn "\8704"
  putStrLn "∀"
A
A
∀
∀

OK, so I think there are two sources of confusion here.

The first one is: why does show print both '\65' and 'A' as 'A'? (And the same for '\8704' vs. '∀'?)

This is because '\65' and 'A' represent the exact same value, and show has no idea which syntax you used to define it. In fact, once your code has gone through the first stages of compilation, the difference is gone - both literals will compile to the exact same code, and even if show wanted to, it couldn’t possibly tell the difference.
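
(A quick GHCi check confirms this - the literals compare equal, so show receives the same value either way:)

ghci> '\65' == 'A'
True
ghci> '\8704' == '∀'
True
ghci> show '\65' == show 'A'
True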

The second one is “why is ‘A’ printed as the actual character, but ‘∀’ is printed as an escape?”

This has to do with what show is supposed to do: it gives you something similar to how a programmer would write the value being shown in code.

And because non-ASCII Unicode in source files can be problematic, a lot of programmers adopt the convention that ASCII characters can be written as-is, but non-ASCII characters in string and character literals should be spelled out as escapes. This is useful for several reasons:

  • Those non-ASCII characters might not even exist on someone’s keyboard, so working with code that uses them can be very awkward - try grepping for “γ” on a computer that doesn’t have Greek letters on its keyboard.
  • The client software used to view the code might not render all non-ASCII characters properly; ASCII characters, however, will work reliably across any client that uses any superset of ASCII, including UTF-8, and most 8-bit codepages.
  • ASCII also tends to survive much more mangling, e.g. when transmitting code through legacy communication systems (old email systems, IRC, issue trackers running on a non-UTF8 database, etc.)
  • ASCII characters will be rendered unambiguously in most clients and with most fonts, especially those used for programming; but within Unicode, there are many ways of spelling a string that will look identical, and this can actually be positively devious - try this, for example:
main = do
    print 'A'   -- Latin capital A (U+0041): prints 'A'
    print 'Α'   -- Greek capital Alpha (U+0391): prints '\913'

So, in short, show follows a simple rule: if the character is printable ASCII, print it as-is; otherwise, print a suitable escape sequence - just as a programmer following the above convention would.
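
A rough sketch of that rule, assuming only Data.Char (the real logic lives in GHC.Show’s showLitChar, which also handles quotes, backslashes, and named escapes - this is an approximation, not the actual implementation):

import Data.Char (isAscii, isPrint, ord)

-- Approximate the ASCII-vs-escape decision show makes for a Char:
showCharSketch :: Char -> String
showCharSketch c
  | isAscii c && isPrint c = ['\'', c, '\'']              -- printable ASCII: as-is
  | otherwise              = "'\\" ++ show (ord c) ++ "'" -- anything else: decimal escape

With this, showCharSketch 'A' yields 'A' and showCharSketch '∀' yields '\8704', matching the outputs above.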

9 Likes

The show implementations for Char and String try to stick to ASCII so as to make the weakest possible assumption about the terminal.

You wouldn’t believe it, but in this, the 21st Century, the Year 2024 of Our Lord, the Windows terminal still does not default to any UTF-* code page.

How would you feel about dropping Windows support altogether? >:)
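
(For what it’s worth, a program that wants to emit non-ASCII text can at least request UTF-8 on its own output handle - a small sketch using System.IO, which still assumes the terminal can actually render what it receives:)

import System.IO (hSetEncoding, stdout, utf8)

main :: IO ()
main = do
  -- Encode stdout as UTF-8 regardless of the platform's default code page:
  hSetEncoding stdout utf8
  putStrLn "∀"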

If ‘Windows’ is part of the reasoning, then the reasoning may need to be revisited, given the capabilities of Windows Terminal - available for Windows 10 and the default terminal on Windows 11.

On the “bright” side, this kind of change to the standard library can take forever to get through the Libraries Committee!

(Also takes forever until someone is interested enough to write a proposal in the first place!)

Given a proper proposal, the CLC process is fairly swift and straightforward.

2 Likes