GHC ignores lines beginning `#!`

mpilgrem · October 15, 2022, 12:28am

It seems that GHC ignores many lines beginning #!. For example, Main.hs:

#!This is a test
main = do
  putStrLn "OK!"
#!This is a test
  putStrLn "OK!"
#!This is a test

compiles fine with ghc Main.hs (where ghc is GHC 9.0.2). Is that documented anywhere? I could not identify a reference to it in the Haskell Language Report or the GHC User’s Guide. I assume it is to accommodate ‘shebang lines’ on Unix-like operating systems (but I thought the #! could only be the first two characters in the file).

I say ‘many’ because Main.hs with:

main = do
#!This is a test
  putStrLn "OK!"

results in:

[1 of 1] Compiling Main             ( Main.hs, Main.o )

Main.hs:2:2: error: lexical error in pragma at character '!'
  |
2 | #!This is a test
  |  ^

I do not follow the error message. The GHC User’s Guide says that all pragmas start with {-#, so I would have expected the complaint to be about the initial #, not the !.

EDIT: I’ve answered my own question below, concluded that it is not documented as well as it could be, and raised GHC issue #22300.

FrancisKing · October 15, 2022, 7:08am

On my Windows edition of GHC and GHCi, 8.10.1, your example mostly works:

#!This is a test
main = do
  putStrLn "OK!"
#!This is a test
  putStrLn "OK!"
#!This is a test    -- parse error on this line

On Unix and Linux systems, #! is called a hash-bang (or shebang) and is the standard way of telling the system which program should execute a script. So I guess that GHC and GHCi at some point started ignoring lines that begin with that sequence.

mpilgrem · October 15, 2022, 1:40pm

I’m also a Windows user. I think the behaviour is the same on GHC 8.10.1, 8.10.7 and 9.0.2. What causes the difference in the parsing of my first example that you identify is whether or not the source file ends with a final newline. If it does not, GHC complains with:

[1 of 1] Compiling Main             ( Main.hs, Main.o )

Main.hs:6:1: error: parse error on input ‘#!’
  |
6 | #!This is a test
  | ^^

Which is also odd. I understand that the POSIX standard defines a line as ending with a newline character, but I thought GHC was indifferent as to whether source files ended with a newline or not.

mpilgrem · October 15, 2022, 2:08pm

On ‘is it documented?’, I found this in GHC’s compiler/GHC/Parser/Lexer.x (an Alex lexical specification):

-- 'bol' state: beginning of a line.  Slurp up all the whitespace (including
-- blank lines) until we find a non-whitespace character, then do layout
-- processing.
--
-- One slight wibble here: what if the line begins with {-#? In
-- theory, we have to lex the pragma to see if it's one we recognise,
-- and if it is, then we backtrack and do_bol, otherwise we treat it
-- as a nested comment.  We don't bother with this: if the line begins
-- with {-#, then we'll assume it's a pragma we know about and go for do_bol.
<bol> {
  \n                                    ;
  ^\# line                              { begin line_prag1 }
  ^\# / { followedByDigit }             { begin line_prag1 }
  ^\# pragma .* \n                      ; -- GCC 3.3 CPP generated, apparently
  ^\# \! .* \n                          ; -- #!, for scripts  -- gcc
  ^\  \# \! .* \n                       ; --  #!, for scripts -- clang; See #6132
  ()                                    { do_bol }
}

So, I suppose a final #!... or <space>#!... line that does not end with \n fails to match either of the two #! rules in the <bol> state (both require a trailing \n), and so it is not skipped.
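
To make the role of the trailing \n concrete, here is a minimal sketch in Haskell that models only the two #! rules quoted above. It is not GHC's lexer: the names linesKeepNL, isSkippedShebang and survivingLines are hypothetical, and the model treats every line as if the lexer were in the <bol> state, so it says nothing about the 'lexical error in pragma' case directly after do.

```haskell
import Data.List (isPrefixOf)

-- Split the input into lines, keeping the terminating '\n' on each line.
-- The last line has no '\n' if the input does not end with one.
linesKeepNL :: String -> [String]
linesKeepNL "" = []
linesKeepNL s = case break (== '\n') s of
  (l, '\n' : rest) -> (l ++ "\n") : linesKeepNL rest
  (l, _)           -> [l]

-- Hypothetical model of the rules  ^\# \! .* \n  and  ^\  \# \! .* \n :
-- a line is skipped only if it starts with "#!" or " #!" AND ends with '\n'.
isSkippedShebang :: String -> Bool
isSkippedShebang l =
  ("#!" `isPrefixOf` l || " #!" `isPrefixOf` l) && endsWithNL l
  where
    endsWithNL xs = not (null xs) && last xs == '\n'

-- The lines that this simplified model would pass on to the rest of the lexer.
survivingLines :: String -> [String]
survivingLines = filter (not . isSkippedShebang) . linesKeepNL
```

Loaded into GHCi, survivingLines drops a final #! line when the input ends with a newline, and keeps it when the input does not. That matches the parse error seen on the last line of a file with no final newline.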
