Hey!
My lab partner and I just finished a subtask of an assignment with the following requirement:
Given the input
corpus_l = ['D', 'e', ' ', 's', 'e', 'n', 'a', 's', 't', ...]
merge_bigrams(corpus_l, ('e', 'n'))
should return the corpus in which every sequence of 'e' followed by 'n' has been merged:
['D', 'e', ' ', 's', 'en', 'a', 's', 't', ...]
And reapplying merge_bigrams(corpus_l, ('s', 'en'))
to this corpus should return
['D', 'e', ' ', 'sen', 'a', 's', 't', ...]
You will apply a greedy algorithm. Given the pair ('a', 'a') and the list ['a', 'a', 'a'], the result will be ['aa', 'a'].
In the spirit of declarative programming, we first tried concatenating the input into a single string and then splitting on the requested sequence without removing it. However, this made it impossible to compose the function with itself: concatenating the tokens at the start of each call discards the token boundaries produced by the previous call.
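To make the problem concrete, here is a minimal sketch of how such a join-and-split approach might look. It is not the exact code we wrote; the name merge_bigrams_via_string and the use of re.split with a capturing group are assumptions made for illustration.

```python
import re

def merge_bigrams_via_string(corpus_l, pair):
    # Hypothetical reconstruction of the join-then-split idea.
    merged = pair[0] + pair[1]
    # Joining the tokens throws away every existing token boundary.
    text = "".join(corpus_l)
    # A capturing group makes re.split keep the matched separator.
    parts = re.split("(" + re.escape(merged) + ")", text)
    tokens = []
    for part in parts:
        if part == merged:
            tokens.append(part)
        else:
            # Everything else falls back to single characters.
            tokens.extend(part)
    return tokens

first = merge_bigrams_via_string(['e', 'n', ' ', 's', 'e', 'n'], ('e', 'n'))
# first is ['en', ' ', 's', 'en']
second = merge_bigrams_via_string(first, ('s', 'en'))
# second is ['e', 'n', ' ', 'sen']: the leading 'en' from the first call
# has been split back into 'e' and 'n'.
```

A single call gives the right answer, but the second call only sees the joined string, so any token merged earlier that is not part of the new pair is broken up again.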
Our imperative solution:
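def merge_bigrams(corpus_l, pair):
    # pair is (left token, right token); the merged token is their concatenation
    token_l, token_r = pair
    token_new = token_l + token_r
    new_corpus = []
    i = 0
    while i < len(corpus_l):
        # Greedy left-to-right scan: merge as soon as the pair matches at position i
        if i + 1 < len(corpus_l) and corpus_l[i] == token_l and corpus_l[i + 1] == token_r:
            new_corpus.append(token_new)
            i += 2  # skip both tokens that were merged
        else:
            new_corpus.append(corpus_l[i])
            i += 1
    return new_corpus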
How would you solve this in Haskell?