[ANN] dawgdic: yet another library for DAWG dictionaries

Instead of making FFI, I have ported a C++ library to Haskell entirely and it turned out it (partially) outperforms existing DAWG packages.

Feedback is appreciated :slight_smile:

10 Likes

In your blog post, the DAWG graph shows a loop from p to p in the encoding of “apple”, but I thought cycles were banned. I also don’t understand how a and an are represented in the graph. Is there a primer on this data structure that I could read?

1 Like

Yeah, I should have get a better example or explain in more details what cycles I was talking about. apple as a word is fine because there’s none-repeated transitions from character to character:

  • -a,
  • a-p,
  • p-p,
  • p-l,
  • l-e,
  • e-

Transition p-p has met only once. So it’s acceptable in such a particular case.

Thank you for pointing out the incomplete picture. I’ve updated it with a and an!

As of primer, you can start with Deterministic acyclic finite state automaton - Wikipedia . I will update my post with more links or maybe expand explanation on the data structure.

Also, I’ve figured too late that from chosen lexicon it’s hard to tell what is the benefit of DAWG over Trie. I should have pick better lexicon as baseline. I will think about updating blogpost with another example. Thank you!

4 Likes

Hey! Good stuff! Question about LLMs: did you ask it to generate code that you were able to use in the resulting Haskell package? I’d expect big models to do a decent job at translation, even if the languages are pretty distant. But maybe I’m wrong?

1 Like

Thank you! I have tried two things:

  • Just for fun asked to transform the given ~20-40 loc or small functions from C++ to Haskell and gave some context about it. Result was just horrible.
  • Asked two separate requests. Read/explain C++ code and write adjusted result as Haskell. It was a bit better than horrible but still require more of my time than I could afford.

When it concerns Haskell, my (limited) experience is that it produced syntactically correct nonsense. Things that were bothering me:

  • Do-notation abuse, even for pure functions.
  • Literal translation of C++ code which does not work well for (for, while) loops. Yes, it tries to introduce recursion, well, from the first look it is clear that it should be completely rewritten from scratch.
  • Making a lot of incorrect assumptions about the end goal (most likely, based on some frequently recognised patterns).

It did not work with my flow (I had just 15-30 minutes per day) at all. Each session consumed a day without going anywhere. After few attempts I got frustrated, gave up and focused on C++ code explanation instead. I was so frustrated that I purged those sessions during one of maintenance days.

2 Likes