RE2HS lexer generator

The regular expression compiler re2c now supports Haskell (as of release 4.0).

A short intro from the official website: re2c stands for Regular Expressions to Code. It is a free and open-source lexer generator that supports C, C++, D, Go, Haskell, Java, JavaScript, OCaml, Python, Rust, V, Zig, and can be extended to other languages by implementing a single syntax file. The primary focus of re2c is on generating fast code: it compiles regular expressions to deterministic finite automata and translates them into direct-coded lexers in the target language (such lexers are generally faster and easier to debug than their table-driven analogues). The secondary focus of re2c is on flexibility: it does not assume a fixed program template; instead, it allows the user to embed lexers anywhere in the source code and configure them to avoid unnecessary buffering and bounds checks. The internal algorithm used by re2c is based on a special kind of deterministic finite automata: lookahead TDFA. These automata are as fast as ordinary DFA, but they are also capable of performing submatch extraction with minimal overhead.
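To give a rough idea of what "direct-coded" means, here is a small hand-written sketch in Haskell. It is not re2c output and does not use re2c syntax; the `Token` type and the function names are made up for illustration. Each DFA state becomes its own function, and each transition becomes a branch on the next character, so no transition table is consulted at run time. The two rules it recognizes are `[0-9]+` and `[a-z][a-z0-9]*`.

```haskell
import Data.Char (isAsciiLower, isDigit)

-- Hypothetical token type for this sketch.
data Token = TNumber String | TIdent String
  deriving (Eq, Show)

-- Start state: dispatch on the first character.
-- Returns the token and the unconsumed input, or Nothing if no rule matches.
lexToken :: String -> Maybe (Token, String)
lexToken (c : cs)
  | isDigit c      = Just (stateNumber [c] cs)
  | isAsciiLower c = Just (stateIdent [c] cs)
lexToken _ = Nothing

-- Accepting state for [0-9]+: loop while digits follow.
-- The accumulator holds the matched characters in reverse order.
stateNumber :: String -> String -> (Token, String)
stateNumber acc (c : cs)
  | isDigit c = stateNumber (c : acc) cs
stateNumber acc rest = (TNumber (reverse acc), rest)

-- Accepting state for [a-z][a-z0-9]*: loop while letters or digits follow.
stateIdent :: String -> String -> (Token, String)
stateIdent acc (c : cs)
  | isAsciiLower c || isDigit c = stateIdent (c : acc) cs
stateIdent acc rest = (TIdent (reverse acc), rest)
```

Loaded in GHCi, `lexToken "42abc"` evaluates to `Just (TNumber "42","abc")`. A generated lexer would typically work on a buffer with an index rather than on `String`, which is where the configurable buffering and bounds checks mentioned above come in.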

There is a detailed user guide and an online playground with many examples.


Wow, that’s awesome! I love re2c, it’s just so… elegant, and I’m very happy to see a Haskell backend added!
