In your blog post, the DAWG graph shows a loop from p to p in the encoding of “apple”, but I thought cycles were banned. I also don’t understand how a and an are represented in the graph. Is there a primer on this data structure that I could read?
Yeah, I should have get a better example or explain in more details what cycles I was talking about. apple as a word is fine because there’s none-repeated transitions from character to character:
-a,
a-p,
p-p,
p-l,
l-e,
e-
Transition p-p has met only once. So it’s acceptable in such a particular case.
Thank you for pointing out the incomplete picture. I’ve updated it with a and an!
Also, I’ve figured too late that from chosen lexicon it’s hard to tell what is the benefit of DAWG over Trie. I should have pick better lexicon as baseline. I will think about updating blogpost with another example. Thank you!
Hey! Good stuff! Question about LLMs: did you ask it to generate code that you were able to use in the resulting Haskell package? I’d expect big models to do a decent job at translation, even if the languages are pretty distant. But maybe I’m wrong?
Just for fun asked to transform the given ~20-40 loc or small functions from C++ to Haskell and gave some context about it. Result was just horrible.
Asked two separate requests. Read/explain C++ code and write adjusted result as Haskell. It was a bit better than horrible but still require more of my time than I could afford.
When it concerns Haskell, my (limited) experience is that it produced syntactically correct nonsense. Things that were bothering me:
Do-notation abuse, even for pure functions.
Literal translation of C++ code which does not work well for (for, while) loops. Yes, it tries to introduce recursion, well, from the first look it is clear that it should be completely rewritten from scratch.
Making a lot of incorrect assumptions about the end goal (most likely, based on some frequently recognised patterns).
It did not work with my flow (I had just 15-30 minutes per day) at all. Each session consumed a day without going anywhere. After few attempts I got frustrated, gave up and focused on C++ code explanation instead. I was so frustrated that I purged those sessions during one of maintenance days.