Uploading first package to hackage: Endorsements for Infinitree

Dear Haskell Discourse Community,

I am seeking your endorsement to upload a package for the first time.

I invite you to take a look at code and/or documentation of my package Infinitree:

If you are convinced that it is worthy, I’d be very happy to receive your endorsement on hackage:

Thank you very much for your time

4 Likes

I wonder whether you could encode right and left folds with some kind of breadth first search. Additionally, the index of Representable already has an implicit ordering, which suggests that failing any other implementation, Foldable.toList tree = index tree <$> [0..], although this is obviously very inefficient.

I also wonder whether this structure could be useful if you could put cycles into it, which would likely require revealing the constructor; for example, going down the left branch at any stage brings you back to the root, but the right branch incrementally increases or something.

1 Like

This reminds me of an infinite tree of random numbers that Sam Staton et. al. use in LazyPPL. Since they don’t need it, their Tree has no type parameter, but it could easily be made to have one. The difference is that while an Infinitree has two branches at every level, the Tree has countably infinitely many at every level. This raises the cardinality of the tree to the cardinality of Natural -> Natural.

The cool thing is that once you have such a Tree, you can have a state-less random number generator, via an isomorphism split :: Tree -> (Tree,Tree). Can you do that with an Infinitree a? I suppose yes, since Natural has such an isomorphism.

2 Likes

I have tried different approaches but I could not find one which unifies the infinite layout of the tree and allows for easy ordered folds.
Problem being:

  • For the sake of efficiency, I don’t want duplicated node numbers/node subtrees
  • I need to be able to find any number from the root
    But optimally, numbers would be ascending from left to right, that would make for easy folding.

Currently, the tree looks like this.
Infinitree

Every step deeper doubles the step size between the nodes, making sure that all numbers in the layer above are skipped.

I remember trying to use this library: weave: Compositional breadth-first walks, but at that time I couldn’t understand how to do that correctly, maybe I’ll have another go at it.

That’s an interesting thought, I don’t have problems with exposed constructors.
Although cyclic references would kill the name ;]

Thank you for your comment, this is something I absolutely didn’t and couldn’t think of when Implementing Infinitree because I created it specifically for memoization in a Advent of Code problem from last year.

This was very inspiring, I have found it to be possible to imitate the rose tree from LazyPPL with the Infinitree. Didn’t think it could be used in such a way. On monday I have struggled with exactly such a problem (RNG for a simulation) which led me to write another monad, which was just State StdGen in disguise.

This, however, is very much cooler:

module Data.Infinitree.Random (build) where
import System.Random (RandomGen, Random)
import Data.Infinitree (Infinitree ())
import qualified System.Random as Random
import qualified Data.Infinitree as Infinitree

build :: (RandomGen g, Random a) => g -> Infinitree a
build generator = let
  (leaf, generator') = Random.random generator
  (generatorLeft, generatorRight) = Random.split generator'
  in Infinitree.Branch (build generatorLeft) leaf (build generatorRight)

I could now also build a probability monad, but I’m not sure whether it is a good fit for the Infinitree package.

Yeah, memoization of functions and probabilistic programming are quite different things, so you should not worry that your first project is no “one data structure fits all”. Although: The Tree is essentially a memoized version of all potential queries (splits) to a random number generator.

That’s right, I was rather surprised it could be used in such a different way.

After expanding my code from before into a fully-fledged Monad instance

Random Monad based on Infinitree
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE InstanceSigs #-}
module Data.Infinitree.Random (build, Random(..), sample) where
import Data.Infinitree (Infinitree (Branch))
import qualified System.Random (RandomGen, Random)
import qualified System.Random as Random hiding (Random)
import qualified Data.Infinitree as Infinitree

newtype Random random result = Random (Infinitree random -> result)
  deriving stock (Functor)

instance Applicative (Random random) where
  pure :: a -> Random random a
  pure a = Random $ const a
  (<*>) :: Random random (a -> b) -> Random random a -> Random random b
  (<*>) (Random makeF) (Random makeA) = Random $ \ (Branch left _ right) -> let
    f = makeF left
    a = makeA right
    in f a

instance Monad (Random random) where
  (>>=) :: Random random a -> (a -> Random random b) -> Random random b
  (>>=) (Random makeA) f = Random $ \ (Branch left _ right) -> let
    a = makeA left
    (Random makeB) = f a
    in makeB right

sample :: Random result result
sample = Random $ \ (Branch _ x _) -> x

-- | Build an infinitree of random values, this hides the used generator
--
-- Name conflicts force to write this out

build :: (System.Random.RandomGen g, System.Random.Random a) => g -> Infinitree a
build generator = let
  (leaf, generator') = Random.random generator
  (generatorLeft, generatorRight) = Random.split generator'
  in Infinitree.Branch (build generatorLeft) leaf (build generatorRight)

I think that it will be inconvenient unless the program only every needs to generate Random values of a single type.
The typeclass instances I wrote also discard a lot of computation, since the generators in the branches strictly depend on the value put in the leaf. The value must be computed but may never be used.

To me, a state monad over the Generator seems preferable, like this

State MonadRandom
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE InstanceSigs #-}
{-# LANGUAGE TupleSections #-}
module MonadRandom (MonadRandom, choose, run, useRandom, select, range) where
import System.Random (StdGen)
import qualified System.Random as Random
import Data.Bool (bool)
import Data.Vector.Strict (Vector)
import qualified Data.Vector.Strict as Vector

newtype MonadRandom a = MonadRandom (StdGen -> (a, StdGen))
  deriving Functor

instance Applicative MonadRandom where
  pure :: a -> MonadRandom a
  pure x = MonadRandom (x, )
  (<*>) :: MonadRandom (a -> b) -> MonadRandom a -> MonadRandom b
  (<*>) (MonadRandom makeF) (MonadRandom makeX) = MonadRandom $ \ generator -> let
    (f, gen') = makeF generator
    (x, gen'') = makeX gen'
    in (f x, gen'')

instance Monad MonadRandom where
  (>>=) :: MonadRandom a -> (a -> MonadRandom b) -> MonadRandom b
  (>>=) (MonadRandom makeX) f = MonadRandom $ \ generator -> let
    (x, gen') = makeX generator
    (MonadRandom makeB) = f x
    in makeB gen'

useRandom :: (StdGen -> (x, StdGen)) -> MonadRandom x
useRandom f = MonadRandom $ \ generator -> f generator

run :: StdGen -> MonadRandom a -> a
run gen (MonadRandom f) = fst (f gen)

This one also threads the generator strictly through all computations, but it doesn’t have to eagerly evaluate random values that will never be used.

One Advantage the Tree based approach might have is that it is easily parallelizable.

(Edit: remove functions not needed here from the copied MonadRandom implementation)

No you don’t want this. Here is why: Suppose you have a very complex dx :: MonadRandom x and you map it using a trivial function:
dx >>= const (return 4)
Mathematically, one would expect this to be equivalent to return 4 but due to the strictness of the internal state, it isn’t. Although you know 4 is what you get, there will still be potentially many expensive random generator splits before you have it.

In contrast, discarding parts of an infinite tree may be wasteful, but is computationally not expensive. Even better, the reader monad is one of the few known monads that has the desired property dx >>= const (return y) = return y.

Such “pruning” happens a lot in large random models, whence it makes sense to think about performance.

1 Like

Here’s an attempt:

data Stream a = a :< Stream a

toStream :: Infinitree a -> Stream a
toStream (Branch left a right) = a :< intersperseStreams (toStream left) (toStream right)
  where
  intersperseStreams :: Stream a -> Stream a -> Stream a
  intersperseStreams (l :< ls) (r :< rs) = l :< (r :< intersperseStreams ls rs)

streamToList (a :< as) = a : streamToList as

toList = streamToList . toStream

And then you can just define foldr or foldMap using toList. Stream is just because I didn’t want to deal with empty lists, so you could just go straight to list if you’d prefer - potentially you could even omit the list altogether and replace usages of :< with the f in foldr!

1 Like

This looks very much like a breadth-first traversal:
toStream (Branch left a right) = a :< _ touches the root label first, while in intersperseStreams (l :< ls) (r :< rs) = l :< (r :< _) the l and r are the root labels of the child trees. This is a similar strategy as in the enumeration of (Natural,Natural) where one enumerates the antidiagonals of pairs (i,j) where i+j=n for increasing n.

1 Like

This is surprisingly simple, I have added the corresponding instances to the library, thanks a lot for the input.

1 Like

I understand the problem and I think dx >>= const (uniform @Double) would have been a better example. Assuming this is its type:uniform :: Random a => MonadRandom a.
Because it actually forces the generator in linear time depending on the complex function dx, which return 4 would not.
Whereas in the Tree-based Monad: dx >>= const (uniform @Double) would only need logarithmic steps, depending on how deep it is into the call chain. Considering that this is meant to be the top level expression, it runs with only one split.

You seem to be of knowledge in the topic, the current tree-based implementation can only ever deliver random values of a single type.
Would it be feasible/Is it sometimes necessary to instead construct a tree of Generators, which could then be used to deliver values of any type implementing the correct typeclasses?

I’m not sure I understand. The tree of Double values is to be understood as a (uncountable) supply of samples from the uniform distribution on the real unit interval. To obtain random values of type a, all you have to do is to define a mapping from uniform samples in [0,1] (use as many as you like!) to type a. This mapping can be composed via many intermediate types that each draw on a number of samples.
For example, a Bernoulli distribution would use just one sample and map it via (<= 0.5) to the type Bool.

EDIT: The Arbitrary class of QuickCheck follows a similar mechanism, where the prevailing type of samples seems to be 64-bit words instead of real numbers. But the principle is the same: map a sufficient number of random 64-bit words to the desired type a.