Mutable Data structures? Why so hard to find

unhammer · May 27, 2022, 11:17am

Haskell isn’t necessarily slower at tasks involving mutability, but for some tasks (such as word counts!) it’s worth adding a little mutability for the speedup; and mutable hash tables in Haskell could perhaps be better, in particular the hashtables package, though there is now vector-hashtables which seems like it could improve things.

Also, as the streamly example showed, it’s not hard to put mutable values in immutable HashMap. I tried it without streamly now, and it’s about as fast as with streamly as long as I use IORefs in the values instead of pure updates.

Abab9579 · May 28, 2022, 12:03am

Indeed. I said it in wrong way, but what I meant was that many mutable data structures in haskell are slow likely because of lack of love.
(Mutable vector is going to be an exception)

atravers · May 28, 2022, 2:32am

I was answering this question:

…I’ll let you provide Rosetta Code with a better-performing alternative.

…then start a new thread so it [any claims of “denialism” on this topic, along with supporting evidence] can be ~~disputed~~ discussed there.

Abab9579 · May 28, 2022, 2:38am

Well that was the entire point of this thread - mutable structures not getting love.
Please why did you omit the slow (at mutability-involving tasks) in front of that line, I was specifically talking about the cases which is optimized for mutability. Once you omitted that part, it became completely different sentence.

sgraf · May 29, 2022, 5:52pm

Why we don’t have an ST-based API for e.g. Map is indeed a good question. I think we need good transient data structures (those which admit mutable and persistent APIs with seemless switching between them) and our ecosystem is sorely lacking in that regard. Plain and simple.

These mutable data structures would then profit immensely from the mutable constructor fields proposal (on phone. Sorry, Google it), but no one has the time to push it over the finish line.

santiweight · May 29, 2022, 5:58pm

I like your sentiment. I think it’s true that mutability is a second class citizen is most programming, especially programming done in Haskell, but that doesn’t necessarily mean that we should neglect mutability complete. We need to have some better story here for Haskell.

jaror · May 29, 2022, 6:06pm

atravers · May 29, 2022, 9:11pm

This is a proposal to add mutable constructor fields to GHC.

Hmm…

[…]
This reminds me of a funny event at the Haskell workshop 2006. One participant stood up and sincerely proposed that Haskell’ standard would find a way to automatically derive a monadic version of a pure expression. Chung-chieh Shan recommended that person to take a look at OCaml…

– Oleg Kiselyov

Having quickly looked at it…according to the proposal, the following is “inefficient”:

data MutPair a b =
  MutPair {-# UNPACK #-} !(IORef a)
          {-# UNPACK #-} !(IORef b)

So why not just:

data MutPair a b =
  MutPair (IORef# a) (IORef# b)

where:

type IORef# a = MutVar# RealWorld a

jaror · May 29, 2022, 9:57pm

The {-# UNPACK #-} pragma actually converts the data to something with the same representation as your suggestion. The problem is that the MutVar# itself is a pointer to a piece of mutable memory (it’s an UnliftedType not an unboxed type). The proposal wants to make it possible to have mutable memory inside the constructors of data types itself without that extra pointer indirection.

atravers · May 29, 2022, 10:22pm

…as described by State in Haskell:

^{(page 4 of 51)}
[…] A reference can be thought of as the name of (or address of) a variable, an updatable location in the state capable of holding a value. […]
^{(page 5)}
[…] Notice that the reference itself does not change; it is the state which is modified. […]

If these requirements can now be relaxed, then why not just (re)define MutVar# as an unboxed type?

jaror · May 30, 2022, 8:07am

If these requirements can now be relaxed

The proposal is about relaxing these requirements.

why not just (re)define MutVar# as an unboxed type?

I think we’d want to use a different name for backward compatibility and because pointers to mutable data are also still useful in some cases. Other than that, I think that is pretty much what the proposal sets out to do.

unhammer · May 30, 2022, 8:31am

I already did added the non-streamly IORefs version. I still wish I knew what’s making it slower than awk though.

atravers · May 30, 2022, 8:43am

Isn’t that the role of lifted types like IORef and STRef (no trailing hash), in the same way that the lifted Int does for the unboxed Int#?

…through the selective use of reserved words in lieu of individual type annotations:

mutable - as a generic substitute for MutVar# s or MutVar# RealWorld.

I’m just wondering…could this also be used elsewhere in Haskell e.g:

floater - as a generic substitute for Float#, Double#, and presumably Quad# and more at some time in the future;
procedure - as a generic substitute for STRep# s, STRep# RealWorld (i.e. IO#) and presumably STM#.

Nice!

From Controlling Chaos: … by Stephan Herhut, Sven-Bodo and Scholz Clemens Grelck:

Single Assignment C, or SAC for short, is a strict, ﬁrst-order functional language geared at high-performance numerical computation.

Furthermore:

Despite its syntactical proximity to the imperative language C, several carefully chosen restrictions of the core of SAC inhibit any form of side-effect and, thus, render it purely functional.

Based on that, I cautiously suspect that the interaction between two fundamental features in Haskell are primarily responsible:

non-strict semantics;
higher-order functions.

For example, Standard ML has higher-order functions and is strict.

One of the reasons for Haskell’s existence was to make research into, and application of these features easier - to remove them would make Haskell virtually irrelevant. There are only three widely-used non-strict languages with higher-order semantics:

Miranda^(R),
Clean,
Haskell.

That means Haskell is very much “going first” in this field of research, so its implementation/s are going to be lacking in some ways, much in the same way early compilers for C or Pascal were slower than native assembly code - I for one am very glad that they kept on working on those old compilers! Such longevity for languages like Haskell (and their features) is what we’re interested in, because like assembly was all those decades ago, we want something better than what we have now.

jaror · May 30, 2022, 9:02am

It seems like you’re right that MutVar# will be unnecessary after this proposal. The proposal has a section about removing it. And there’s also some discussion about using a proper Mutable type instead of a mutable keyword.

MasseR · May 30, 2022, 11:21am

Just want to add one datapoint to this discussion. Using the fiwiki dump, it takes 8 seconds for awk to finish, 20 seconds for ordered-containers and 19 seconds for unordered containers. This with absolutely no effort spent on optimizing.

{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.Map.Strict as M
import qualified Data.Text.Lazy as Lazy
import qualified Data.Text.Lazy.IO as LazyIO
import qualified Data.Text as Strict
import Data.Map (Map)
import Data.Foldable (foldl', for_)
import qualified Data.Text.IO as StrictIO
import Data.List (sortOn)
import Data.Ord (Down(Down))
import Data.HashMap.Strict (HashMap)
import qualified Data.HashMap.Strict as HM


orderedHisto :: Lazy.Text -> [(Strict.Text, Int)]
orderedHisto = M.toList . histo
  where
    histo :: Lazy.Text -> Map Strict.Text Int
    histo = foldl' (\acc x -> M.insertWith (+) (Lazy.toStrict x) 1 acc) M.empty . Lazy.lines

unorderedHisto :: Lazy.Text -> [(Strict.Text, Int)]
unorderedHisto = HM.toList . histo
  where
    histo :: Lazy.Text -> HashMap Strict.Text Int
    histo = foldl' (\acc x -> HM.insertWith (+) (Lazy.toStrict x) 1 acc) HM.empty . Lazy.lines

main :: IO ()
main = do
  contents <- LazyIO.getContents
  for_ (take 10 $ sortOn (Down . snd) $ unorderedHisto contents) $ \(key, val) ->
    StrictIO.putStrLn $ key <> " " <> Strict.pack (show val)

$ bzcat ~/temp/realytemp/fiwiki-20220520-pages-articles-multistream.xml.bz2 | head -1000000| tr ' ' '\n' | time ./test > /dev/null
./test > /dev/null  17.39s user 1.18s system 93% cpu 19.908 total
$ bzcat ~/temp/realytemp/fiwiki-20220520-pages-articles-multistream.xml.bz2 | head -1000000| tr ' ' '\n' | time awk '{s[$0]++} END{ PROCINFO["sorted_in"]="@val_num_desc"; i=1;for(x in s){if (i++>10)break;print x,s[x]}}' > /dev/null
awk  > /dev/null  7.34s user 0.25s system 87% cpu 8.629 total

atravers · May 30, 2022, 12:43pm

Section 1.2.1 of @simonmar’s proposal:

module Data.Mutable.IO where
  type Mutable a = Ref# RealWorld a
  readRef :: Mutable a -> IO a
  writeRef :: Mutable a -> a -> IO ()

module Data.Mutable.ST where
  type Mutable s a = Ref# s a
  readRef :: Mutable s a -> ST s a
  writeRef :: Mutable s a -> a -> ST s ()

…has me wondering about Ref#'s state parameter s - maybe it should be for the monad providing the context, which presumably would have a MonadPrim instance:

data Ref# m a  -- unless it needs to be a primitive type
newRef# :: MonadPrim m => a -> m (Ref# m a)
readRef# :: MonadPrim m => Ref# m a -> m a
writeRef# :: MonadPrim m => Ref# m a -> a -> m ()

which then allows e.g:

type STRef# s a = Ref# (ST s) a
type IORef# a   = Ref# IO a
type TVar# a    = Ref# STM a

As for the original example…

data MutPair ... where
  MutPair :: Ref m1 a -> Ref m2 b -> MutPair (m1 a) (m2 b)

(…perhaps - completing that declaration is left as an exercise :-)

jaror · May 30, 2022, 12:56pm

I think the spirit of PrimMonad would be to use the PrimState type family, like so:

data Ref# s a  -- unless it needs to be a primitive type
newRef# :: PrimMonad m => a -> m (Ref# (PrimState m) a)
readRef# :: PrimMonad m => Ref# (PrimState m) a -> m a
writeRef# :: PrimMonad m => Ref# (PrimState m) a -> a -> m ()

type STRef# s a = Ref# s a
type IORef# a   = Ref# RealWorld a

I don’t think STM can be made to fit this Ref# type. STM is not an instance of PrimMonad either.

As for:

data MutPair ... where
  MutPair :: Ref m1 a -> Ref m2 b -> MutPair (m1 a) (m2 b)

Constructing the MutPair must be an effectful action inside some monad, so it must be of the form:

data MutPair ... where
  MutPair :: Ref m1 a -> Ref m2 b -> m' (MutPair (m1 a) (m2 b))

But what would m' be there?

atravers · May 30, 2022, 1:33pm

…I intended the use of the type class to restrict the use of Ref# variables to “appropriate” monadic types, so an alternative would be:

class Monad m => MonadCaptive m where ...

instance MonadCaptive (ST s) where ...
instance MonadCaptive IO where ...
instance MonadCaptive STM where ...

data Ref# m a  -- unless it needs to be a primitive type
newRef# :: MonadCaptive m => a -> m (Ref# m a)
readRef# :: MonadCaptive m => Ref# m a -> m a
writeRef# :: MonadCaptive m => Ref# m a -> a -> m ()

…how to prevent instances for e.g. Identity or Maybe from being defined is the obvious question there!

jaror:

Constructing the MutPair must be an effectful action inside some monad, so it must be of the form:
data MutPair ... where
  MutPair :: Ref m1 a -> Ref m2 b -> m' (MutPair (m1 a) (m2 b))
But what would m' be there?

(…hence the embedding of m1 and m2 within the two arguments of MutPair.)

As for m', if m1 == m2 then the declaration can be simpler:

data MutPair ... where
  MutPair :: Ref m' a -> Ref m' b -> m' (MutPair a b)

For m1 /= m2 , some imaginative thinking could be required…

hasufell · May 30, 2022, 1:46pm

Did you know about Announcing the "stm-containers" library – Functional programming debugs you

unhammer · May 30, 2022, 2:01pm

How are you running your script? Because afaict it expects one word per line. If I tr ' ' '\n' before inputting into your script, my computer runs out of memory

Topic		Replies	Views
Mutability / side effects Learn	6	1449	August 9, 2022
Search Index in 150 Lines of Haskell Links	1	219	April 9, 2025
[ANN] vector-hashtables-0.1.0.1 Announcements	4	535	September 10, 2021
[Design] Dataframes in Haskell Haskell Foundation	25	2902	January 6, 2025
Maintaining haskell programs	104	7954	December 6, 2023

Mutable Data structures? Why so hard to find

Related topics