Error on seemingly fine alex token

I’ve got a simple Alex lexer file, and for some reason I’m getting a “parse error” at 23:88 with no explanation whatsoever.

For context, here’s the full file: Lexer.x on GitHub

$hexdigit = [a-fA-F$digit]
@string_char = \\[nrt\\]|\\u[$hexdigit]{4}|[^\\]

tokens :-
  -- ...
  "\"" @string_char* "\"" { String $ pack . read $ unpack . replace "\\u" "\\" . pack }

I’ve tried figuring this out for a whole day now, but I can’t seem to find anything, and it doesn’t help that I have nothing to go off of from the error message.

Through process of elimination, I’ve found that the problem is with the two string literals (delete "\\u" and "\\" and Alex runs fine), though I don’t know what’s wrong with them…

Also, I think the function should be String . pack . read . unpack . (replace "\\u" "\\") . pack. Your current one doesn’t typecheck in GHCi, though I don’t think that’s the cause of the error.

ghci> :set -XOverloadedStrings
ghci> import Data.Text
ghci> data Token = String Text
ghci> :t (String . pack . read . unpack . (replace "\\u" "\\") . pack)
(String . pack . read . unpack . (replace "\\u" "\\") . pack)
  :: String -> Token

Well I guess what I would do is just write it as a separate function:

@string_char = \\[nrt\\]|\\u[$hexdigit]{4}|[^\\]

tokens :-
  -- ...
  "\"" @string_char* "\"" { tokString }

{
...
tokString = String . pack . read . unpack . (replace "\\u" "\\") . pack
}

Slight sidenote: you should probably import Data.Text qualified, because otherwise you get name collisions with Prelude functions that the Alex-generated code uses.
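Sketched out, the qualified version might look like this (the Token type and the \u replacement are taken from the snippet above; a sketch, not the full Lexer.x):

```haskell
{-# LANGUAGE OverloadedStrings #-}

-- Qualifying Data.Text so 'replace', 'pack', etc. no longer collide
-- with Prelude names in scope in the Alex-generated module.
import qualified Data.Text as T

data Token = String T.Text

-- Same pipeline as before, just with qualified names.
tokString :: String -> Token
tokString = String . T.pack . read . T.unpack . T.replace "\\u" "\\" . T.pack

main :: IO ()
main = putStrLn (T.unpack (T.replace "\\u" "\\" "a\\u0041b"))
```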


This worked, thank you so much!


The action on this lexer rule seems wrong. Instead of replacing \u with \, I think you wanted \x: read understands \xABCD as a hexadecimal escape, whereas \0041 would be parsed as a decimal escape. This will still potentially be wrong if more hex digits happen to follow the 4 matched in the regex, since read will keep consuming them.

To be safe you’ll need to replace \uABCD with \xABCD\& if you’re going to stick with using read; the empty escape \& marks the end of the numeric escape, so a digit that follows isn’t absorbed into it.
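A small sketch of why the \& terminator matters when string escapes are fed through read (standalone demo, not the original lexer code):

```haskell
-- Demonstrates '\&' (the empty escape) terminating a hex escape
-- when 'read' parses Haskell string syntax.
main :: IO ()
main = do
  -- With '\&': the escape stops at \x41, so this is two characters.
  print (read "\"\\x41\\&5\"" :: String)
  -- Without '\&': 'read' consumes the 5 too, yielding a single
  -- character with code point 0x415.
  print (read "\"\\x415\"" :: String)
```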