Error on seemingly fine alex token

I’ve got a simple Alex lexer file, and for some reason I’m getting a “parse error” at 23:88 with no explanation whatsoever.

For context, here’s the full file: Lexer.x on GitHub

$hexdigit = [a-fA-F$digit]
@string_char = \\[nrt\\]|\\u[$hexdigit]{4}|[^\\]

tokens :-
  -- ...
  "\"" @string_char* "\"" { String $ pack . read $ unpack . replace "\\u" "\\" . pack }

I’ve tried figuring this out for a whole day now, but I can’t seem to find anything, and it doesn’t help that I have nothing to go off of from the error message.

Through process of elimination, I’ve found that the problem is with the two string literals (delete "\\u" and "\\" and Alex runs fine), though I don’t know what’s wrong with them…

Also, I think the function should be String . pack . read . unpack . (replace "\\u" "\\") . pack. Your current one doesn’t typecheck in GHCi, though I don’t think that’s the cause of the error.

ghci> :set -XOverloadedStrings
ghci> import Data.Text
ghci> data Token = String Text
ghci> :t (String . pack . read . unpack . (replace "\\u" "\\") . pack)
(String . pack . read . unpack . (replace "\\u" "\\") . pack)
  :: String -> Token

Well I guess what I would do is just write it as a separate function:

@string_char = \\[nrt\\]|\\u[$hexdigit]{4}|[^\\]

tokens :-
  -- ...
  "\"" @string_char* "\"" { tokString }

{
...
tokString = String . pack . read . unpack . (replace "\\u" "\\") . pack
}

Slight sidenote: you should probably import Data.Text qualified, because otherwise you get name collisions with Prelude functions that the Alex-generated code uses.
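Sketched out, the qualified version might look like this (the Token type and the \u replacement are taken from the snippet above; a sketch, not the full Lexer.x):

```haskell
{-# LANGUAGE OverloadedStrings #-}

-- Qualifying Data.Text so 'replace', 'pack', etc. no longer collide
-- with Prelude names in scope in the Alex-generated module.
import qualified Data.Text as T

data Token = String T.Text

-- Same pipeline as before, just with qualified names.
tokString :: String -> Token
tokString = String . T.pack . read . T.unpack . T.replace "\\u" "\\" . T.pack

main :: IO ()
main = putStrLn (T.unpack (T.replace "\\u" "\\" "a\\u0041b"))
```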


This worked, thank you so much!


The action on this lexer rule seems wrong. Instead of replacing \u with \, I think you wanted \x: read understands \xABCD as a hexadecimal escape, whereas \0041 would be parsed as a decimal escape. This will still potentially be wrong if more hex digits happen to follow the 4 matched in the regex, since read will keep consuming them.

To be safe you’ll need to replace \uABCD with \xABCD\& if you’re going to stick with using read; the empty escape \& marks the end of the numeric escape, so a digit that follows isn’t absorbed into it.
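A small sketch of why the \& terminator matters when string escapes are fed through read (standalone demo, not the original lexer code):

```haskell
-- Demonstrates '\&' (the empty escape) terminating a hex escape
-- when 'read' parses Haskell string syntax.
main :: IO ()
main = do
  -- With '\&': the escape stops at \x41, so this is two characters.
  print (read "\"\\x41\\&5\"" :: String)
  -- Without '\&': 'read' consumes the 5 too, yielding a single
  -- character with code point 0x415.
  print (read "\"\\x415\"" :: String)
```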