GHC ignores lines beginning `#!`

It seems that GHC ignores many lines beginning #!. For example, Main.hs:

#!This is a test
main = do
  putStrLn "OK!"
#!This is a test
  putStrLn "OK!"
#!This is a test

works fine with ghc Main.hs (which is ghc-9.0.2 Main.hs). Is that documented anywhere? I could not identify a reference to it in the Haskell Language Report or the GHC User’s Guide. I assume it is to accommodate ‘shebang lines’ on Unix-like operating systems (but I thought the #! could only be the first two characters in the file).

I say ‘many’ because Main.hs with:

main = do
#!This is a test
  putStrLn "OK!"

results in:

[1 of 1] Compiling Main             ( Main.hs, Main.o )

Main.hs:2:2: error: lexical error in pragma at character '!'
  |
2 | #!This is a test
  |  ^

I do not follow the error message. The GHC User’s Guide says that all pragmas start with {-#, so I would have expected the complaint to be about the initial #, not the !.

EDIT: I’ve answered my own question below, concluded that the behaviour is not documented as well as it could be, and raised GHC issue #22300.

On my Windows edition of GHC and GHCi, 8.10.1, your example mostly works:

#!This is a test
main = do
  putStrLn "OK!"
#!This is a test
  putStrLn "OK!"
#!This is a test    -- parse error on this line

On Unix and Linux systems, #! is called a hash-bang (or shebang). Placed at the start of a script, it is the standard way of naming the program that should execute the script. So I guess that GHC and GHCi at some point started ignoring the sequence.

I’m also a Windows user. I think the behaviour is the same on GHC 8.10.1, 8.10.7 and 9.0.2. What causes the difference in the parsing of my first example that you identify is whether or not the source file ends with a final newline. If it does not, GHC complains with:

[1 of 1] Compiling Main             ( Main.hs, Main.o )

Main.hs:6:1: error: parse error on input ‘#!’
  |
6 | #!This is a test
  | ^^

Which is also odd. I understand that the POSIX standard defines a line as ending with a newline character, but I thought GHC was indifferent as to whether source files ended with a newline or not.

On ‘is it documented?’, I found this in GHC’s compiler/GHC/Parser/Lexer.x (an Alex lexical specification):

-- 'bol' state: beginning of a line.  Slurp up all the whitespace (including
-- blank lines) until we find a non-whitespace character, then do layout
-- processing.
--
-- One slight wibble here: what if the line begins with {-#? In
-- theory, we have to lex the pragma to see if it's one we recognise,
-- and if it is, then we backtrack and do_bol, otherwise we treat it
-- as a nested comment.  We don't bother with this: if the line begins
-- with {-#, then we'll assume it's a pragma we know about and go for do_bol.
<bol> {
  \n                                    ;
  ^\# line                              { begin line_prag1 }
  ^\# / { followedByDigit }             { begin line_prag1 }
  ^\# pragma .* \n                      ; -- GCC 3.3 CPP generated, apparently
  ^\# \! .* \n                          ; -- #!, for scripts  -- gcc
  ^\  \# \! .* \n                       ; --  #!, for scripts -- clang; See #6132
  ()                                    { do_bol }
}

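So, I suppose a final #!... or <space>#!... line that does not end with \n matches neither of the two #! rules in the <bol> state, because both patterns require a trailing \n. It then falls through to the last rule, do_bol, and the #! is handed on to be lexed as ordinary source, which would explain the parse error.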
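
To make that reading concrete, here is a minimal sketch in Haskell of what the two #! rules appear to require. It is only a model of the patterns quoted above, not GHC's lexer: the names linesWithTerminator and skippedAsScriptLine are made up for illustration, and the sketch ignores which lexer state is active, so it says nothing about the #! line after `main = do` in my second example.

```haskell
import Data.List (isPrefixOf)

-- Split a source text into lines, remembering whether each line was
-- terminated by '\n'. (Hypothetical helper, not part of GHC.)
linesWithTerminator :: String -> [(String, Bool)]
linesWithTerminator "" = []
linesWithTerminator s =
  case break (== '\n') s of
    (l, '\n' : rest) -> (l, True) : linesWithTerminator rest
    (l, _)           -> [(l, False)]

-- Assumed reading of the rules `^\# \! .* \n` and `^\  \# \! .* \n`:
-- the line starts with "#!" or " #!" AND ends with a newline.
skippedAsScriptLine :: (String, Bool) -> Bool
skippedAsScriptLine (l, terminated) =
  terminated && ("#!" `isPrefixOf` l || " #!" `isPrefixOf` l)

main :: IO ()
main = do
  let withNewline    = "#!a\nmain = pure ()\n#!b\n"
      withoutNewline = "#!a\nmain = pure ()\n#!b"
  print (map skippedAsScriptLine (linesWithTerminator withNewline))
  print (map skippedAsScriptLine (linesWithTerminator withoutNewline))
```

Under this model the last #! line is skipped only when the file ends with a newline, which matches what we both observed with my first example.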