I thought we were discussing matters pertaining to (ordinary) strictness, not hyper-strictness…
Ah, then I think this whole discussion has been one big misunderstanding. The answer to @sgraf’s question:
is simply that you aren’t allowed to nest the annotations inside a data type like that.
You could only use it on function arguments like this:
tuple :: !Int -> !Int -> (Int, Int)
tuple x y = (x, y)
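(For comparison, a rough equivalent of that hypothetical signature in today’s GHC puts the strictness in the patterns rather than in the type, assuming BangPatterns:)
{-# LANGUAGE BangPatterns #-}

-- both arguments are forced to WHNF before the body is evaluated,
-- approximating the hypothetical !Int -> !Int -> (Int, Int) signature
tuple :: Int -> Int -> (Int, Int)
tuple !x !y = (x, y)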
And perhaps to variables like:
x :: !Int
x = 1 + 2 + 3
But that is difficult if that is a top-level binding (I have worked on that problem during my internship with Well-Typed).
Lazier-ness in action:
ghci> map (const '\a') [let e = error e in e, let s = '\a':s in s]
"\a\a"
ghci>
So map is indifferent to whether its arguments are fully evaluated or not (or only partially so). That would be modulated/adjusted/“fine-tuned” by using strictness annotations (however they may be implemented) as needed to obtain the result desired by the programmer.
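For instance (a hedged sketch using ordinary seq rather than the hypothetical argument annotations above), a map that forces each element before consing no longer ignores them, and would error/loop on the lists from the GHCi session above instead of returning "\a\a":
mapStrict :: (a -> b) -> [a] -> [b]
mapStrict _ []     = []
-- force each element to WHNF before consing, so `const '\a'`
-- never gets the chance to ignore it
mapStrict f (x:xs) = x `seq` (f x : mapStrict f xs)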
And perhaps to variables like:
x :: !Int
x = 1 + 2 + 3
But that is difficult if that is a top-level binding […]
So wouldn’t that also be a problem for -XStrict?
You could only use it on function arguments like this
Regarding this idea specifically, I have an old article on a related issue.
State monad - memory exhausted
- Why did this happen? Because GHC uses a GC-collected heap, and GC can’t reclaim everything when needed.
- The simple solution? Dump the GC, and maybe even the heap as well:

…all going well: no need for strictness annotations, Unlifted, StrictT and the rest!
Dump the GC, and maybe even the heap as well:
Pardon my naivety, but the paper is from 2011 and the non-moving GC was merged in 2020; surely those 30 pages of lambda calculus don’t just solve one of the cornerstone problems of compiler design and no-one noticed it? I unfortunately don’t have a decade of the necessary background to address the paper directly (or even to read it properly).
I commented on that paper here: What about having StrictData and non-strict semantics on functions as the default programming style? - #13 by sgraf. TL;DR: I don’t think it is of implementational relevance.
So wouldn’t that also be a problem for -XStrict?
Yes, -XStrict does nothing with top-level bindings. E.g. this compiles and doesn’t cause infinite loops:
{-# LANGUAGE Strict #-}
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
main = print (fibs !! 10)
Ah, it’s also in the documentation:
Top level bindings are unaffected by Strict. […] Reason: there is no good moment to force them, until first use.
[…] surely those 30 pages of lambda calculus don’t just solve one of the cornerstone problems of compiler design and no-one noticed it?
I don’t think it is of implementational relevance.
Prior to the appearance of those 35 pages, the choice between the push/enter and eval/apply styles was largely arbitrary. But in GHC, the eval/apply implementation did avoid an irritating source of complexity:
(page 3)
push/enter requires a stack like no other: stack-walking is more difficult, and compiling to an intermediate language like C or C-- is awkward or impossible.
and primarily for this reason, the push/enter implementation was abandoned. Now just imagine what could also be abandoned if GHC (like proto-Rust eventually did) no longer had GC…
[…] -XStrict does nothing with top-level bindings.
Alright, back to this example of yours:
x :: !Int
x = 1 + 2 + 3
…if this was difficult, can I at least assume that something like
x :: Int#
x = ...
or:
x :: Unlifted @!%$&# ...
x = ...
would be similarly awkward?
Yes, the unlifted part is what I have worked on (!10841). The main issue is that unlifted variables must never be thunks, so they need to be fully evaluated before the program starts. And GHC has no way of evaluating code at compile-time in a reliable way (constant propagation and similar optimisations don’t give any guarantees).
So I started out with the idea of allowing only constants to be bound in this way: instead of 1 + 2 + 3 you would have to write I# 6# (or, more precisely, the unlifted equivalent of that). Even number literals are a problem, because they are desugared to fromInteger function calls.
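(To illustrate with the lifted version, which already compiles today: the binding is written as an already-evaluated constructor application rather than as a thunk. The unlifted counterpart would be analogous, but remains hypothetical until !10841 lands.)
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int(I#))

-- a constant written directly as an evaluated constructor application,
-- instead of as the thunk `1 + 2 + 3`
x :: Int
x = I# 6#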
At the moment it has stalled a bit on implementing the required semantics in the GHCi bytecode interpreter, which does not have the ability to allocate a Haskell value as static data: it currently just makes every top-level variable a thunk. The native back end already had that ability as an optimisation, so I initially thought it would be pretty easy.
So why are we here yet again, discussing the relative merits of unlifted/unboxed types vs “unlifted classes/families” vs strictness annotations, etc., etc.:
State monad - memory exhausted
- There is an infinite number of ways for GC-based Haskell programs to “leak space”.
- Haskell implementations like GHC are finite programs.
- Therefore it will never be possible for any Haskell implementation to “plug all of those leaks”.
So if you want a “leak-free” Haskell, then you want a Haskell without a garbage-collected heap - all non-trivial attempts to do otherwise will ultimately be futile.
For example, and as successful as it was, optimistic evaluation would still be prone to retaining too much space. If speculative evaluation is interrupted “too early”, then an overly-spacious thunk may not be replaced with its more-compact result.
In the context of I/O models:
(page 2 of 4)
While concurrency is an important tool in real-time programming, being forced to use it just to circumvent an inappropriate I/O model is not satisfactory.
Reactive Objects (2002)
Similarly, needing to use advanced evaluation techniques just to circumvent the glaring deficiencies of garbage-collected heaps is also unsatisfactory.
I don’t believe it is possible to have a practical Haskell without a garbage collected heap. At least not unless you want to fall back on manual memory management like Rust has.
The only evidence you’ve provided that it would be possible is a paper that describes a “storeless” and “heapless” semantics in a theoretical sense. As far as I understand, the semantics they describe transform the programs themselves, so instead of allocating to a heap they just make the program larger. And I see no mention of memory reclamation, so that would be equivalent to simply never running the GC in a Haskell program; then pretty much every program would be one big leak.
Take for example this trace from Section 2:
This is a storeless semantics and, as you can see, the program pretty much only ever grows. And notice the y that is shadowed on the last two lines, which could be reclaimed.
They unfortunately don’t show any traces of the actual storeless abstract machine, but I see no reason why it would behave differently in this respect.
Now read page 7 of 34…
Ah, they do mention reclamation:
This let expression is needed as long as z occurs free in its body; thereafter it can be elided with a garbage-collection rule [18].
So you’re proposing to replace a garbage collected heap by… a garbage collected program?
Unfortunately that cited paper doesn’t seem to be available even in the library of my university or the worldwide libraries I can search.
Unfortunately that cited paper doesn’t seem to be available even in my university library.
If CSx is working again:
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.51.5026&rep=rep1&type=pdf
…otherwise:
https://archive.org/download/citeseerx-csx_citegraph.2017-03-31/citeseerx_checksums.tsv.gz
…then search for "10.1.1.51.5026"
:-/
… a garbage collected program?
You’ll need to be more specific:
- a program which incrementally “tidies up” after itself as it runs, “incremental-GC” style;
- or a program which never allocates, instead reusing its own heap space:
You’ll need to be more specific:
I’m asking how you envision an alternative to garbage collection and how that would solve the problems of leaks. But here’s my thoughts on those two options:
a program which incrementally “tidies up” after itself as it runs, “incremental-GC” style;
I could imagine a Haskell with incremental GC, perhaps using the new reference-counting, functional-but-in-place (FBIP) approach. But I fail to see how that would prevent leaks. I believe incremental GC would only really be a solution to GC pauses; it’s not like Haskell programs leak because the GC doesn’t run often enough.
or a program which never allocates, instead reusing its own heap space:
I could also imagine a fully linear Haskell, but then that would basically mean manual memory management like Rust has. I wouldn’t want to manually manage my memory like that in all my programs.
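As a toy sketch of that “use exactly once” discipline (with hypothetical Buffer/free names standing in for a real resource API):
{-# LANGUAGE LinearTypes #-}

data Buffer = Buffer

-- `free` consumes the Buffer; the type checker then rejects any further use,
-- which is essentially type-enforced manual memory management
free :: Buffer %1 -> ()
free Buffer = ()

useOnce :: Buffer %1 -> ()
useOnce b = free b
-- useTwice b = free b `seq` free b   -- rejected: b would be consumed twice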
Since the SML source code for the prototype is no longer at the URL listed at the bottom of page 4:
http://www.zerny.dk/def-int-for-call-by-need.html
…I’m left with no other alternative but to build an all-new prototype - just not in SML, which is also GC-based. It’s another reason for my interest in Rust, and the appearance of its second compiler.
But I’ve been wrong in the past, so why should this time be any different? It could be that sometime in the middle of the year, if I have a working prototype in a non-GC language, it ends up leaking stack space instead of heap space - darn. Fortunately, there’s another option, courtesy of one Robert Ennals:
https://web.archive.org/web/20130610122750/https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.9.9901&rep=rep1&type=pdf
…and this was a working system! But for reasons which I’ve yet to see explained satisfactorily, it went nowhere (then Robert went elsewhere). If all the old experimental versions/branches are still available for GHC, I believe speceval2 contains the actual implementation.
GHC has no way of evaluating code at compile-time in a reliable way
On a tangent, are there any plans to address this? The inability to type-check and properly pack literals is a very weird pain point, and I don’t like the approach of incorrect instances plus RULES pragmas as a solution.
Personally, I think Template Haskell is the easiest way to address it. You’d write something like:
x = $$(evalTH [|| 1 + 2 + 3 ||])
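A minimal sketch of one way such a helper could look today (note: evalTH is hypothetical, and this variant takes the value directly rather than a typed quotation, since Template Haskell has no built-in interpreter for quoted code):
module EvalTH (evalTH) where

import Control.DeepSeq (NFData, force)
import Language.Haskell.TH (Q)
import Language.Haskell.TH.Syntax (Code, Lift, liftTyped)

-- force the argument to normal form while the splice is run
-- (i.e. at compile time), then embed the result back as an expression
evalTH :: (Lift a, NFData a) => a -> Code Q a
evalTH = liftTyped . force
With that variant the example would read x = $$(evalTH (1 + 2 + 3 :: Int)).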
But another option is to identify a subset of the language that can be evaluated at compile time, like constexpr in C++ or constant evaluation in Rust.
But perhaps it would also be interesting to explore the possibility of having actual guarantees about the optimisations that GHC performs, such that we could simply rely on the existing optimisations to do this evaluation for us.