What’s needed to bootstrap GHC with hugs?

Thanks linj, it’s pretty clear from the discussion above Hugs was never used to bootstrap GHC.

Also note that there is no need to have a Haskell compiler to run MicroHs. All you need is a C compiler, and MicroHs can bootstrap, given the included combinator file.

Hugs won’t know anything about those combinators. (Unless, I suppose, you create an interpreter for them.) And anyway we never got to the bottom of whether Hugs is genuinely bootstrappable from C++.

And most C compilers have been compiled from older versions of themselves.


@linj, my advice would be to find the latest version of GHC:

  • that can be compiled by MicroHs,
  • and can compile MicroHs.

Then you or anyone else can use the mitigation technique Yrjan Skrimstad describes in his thesis. The resulting version of GHC can then be used to start the (long) sequence of compilations needed to arrive at the current version of GHC.
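
Assuming the technique meant here is diverse double-compiling (DDC), the final step boils down to a bit-for-bit comparison of two compiler binaries. A minimal sketch of that comparison only; the function name and the paths passed to it are hypothetical, and it presumes the GHC build in question is deterministic:

```sh
#!/bin/sh
# Last step of a DDC-style check (sketch, not a full procedure).
#   stage1 = GHC source compiled by the independent compiler (e.g. MicroHs)
#   stage2 = the same GHC source compiled by stage1
# stage2 is then compared against the GHC binary that was built by GHC itself.
ddc_compare() {
    stage2=$1
    self_built=$2
    # cmp -s is silent and only reports through its exit status
    if cmp -s "$stage2" "$self_built"; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}

# Example (placeholder paths):
#   ddc_compare build-via-mhs/ghc build-via-ghc/ghc
```

A mismatch does not prove tampering by itself; non-reproducible build output (timestamps, paths) would also show up as "different".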

It is now! The README says

Bootstrapping with Hugs

It is also possible to bootstrap MicroHs using Hugs. That means that MicroHs can be built from scratch in the sense of bootstrappable.org. To compile with Hugs you need a slightly patched version of Hugs and also the hugs branch of MicroHs.

So now just build GHC with MicroHs and we are done :slight_smile:

…once you’ve bootstrapped:

  • your CPU’s microcode,
  • your computer’s firmware (or “BIOS”),
  • the firmware of all the devices in your computer;

and all other opaque binaries, wherever they’re lurking!

That’s an impressive (and scary [**]) set of ‘always enabled extensions’, especially considering Lennart’s complaints that too many packages rely on ghc-specific features. More than that lot?

[**] IncoherentInstances always on – are you sure? I suppose the same behaviour, so same bugs, as ghc for OverlappingInstances+FunctionalDependencies.

But not (presumably) Hugs as I know it with TRex.

Incoherent instances is on because it’s too tedious to check for the condition. I’m lazy.

The reason the extension list is long and growing is that I’d like to compile Hackage packages with MicroHs. And I’ll never convince package authors to remove extensions.

And the extensions are always on because it adds a lot of complexity to have them optional.

Thanks Lennart (x 2, yes your first was rather ‘open to interpretation’).

It’s your pet project, so of course be as non-tedious as you choose. To bootstrap ghc+supporting packages I’m sure you don’t need IncoherentInstances nor its scary friends. Once you have ghc, can’t you lean on that for the option-handling?

I’m worried the complexity you’re saving is not just option-handling but sanity-checking of instance coherence and overlap. Ghc is already bad enough; is MicroHs worse in allowing nonsense? (See trac #10675 and links.)

I’m not seeing the merit of a compiler/supported language that slavishly follows ghc and its warts and ugliness.

If you want to contribute instance checking, then I will consider adding it. I have no plans to do it. There are much more fun things to do. :slightly_smiling_face:

Research/experimentation on instance coherence seems to have just run out of steam around 2014, not having arrived at any principled design.

I’ve written up a principled approach https://gitlab.haskell.org/ghc/ghc/-/wikis/Functional-dependencies-in-GHC/AntC-proposal. And then it turned out to be not too hard to implement in Hugs. (See hugs-users mailing list August 2022.) Indeed Hugs was so friendly to work with, I went on to a prototype for my earlier idea of ‘Apartness guards’ (mentioned in that FunDeps proposal).

I was hoping MicroHs might be as amenable to language experimentation as Hugs.

(And apologies I’m not providing my usual plethora of links: I’m travelling, so pecking on a phone.)

P.S. does “fun things to do” include a stand-alone/anonymous records system? Hugs/TRex integrates so sweetly with what I’ve now achieved on FunDeps/overlaps, I’ve more or less abandoned ghc.

At the top of my list is implementing enough features to compile useful packages from Hackage. Some of that is even fun. :slightly_smiling_face:
I’m not sure the MicroHs type checker is great for experimentation. It needs a rewrite.
Anonymous records would be fun but it’s not on my list. Not that there really is a list.

Using the hugs → nhc98 approach, I got a bit further by using GCC 2.95.2 with nhc98-1.16 sources (from an archive.org mirror of york.ac.uk ftp) in a Debian Potato chroot:

derptop:~/nhc98-1.16# gcc --version
2.95.2

The compiled hmake-PRAGMA, MkConfig, and MkProg binaries run without segfaulting:

MkConfig

derptop:~/nhc98-1.16# ./lib/x86_64-Linux/MkConfig lib/x86_64-Linux/hmakerc list
Global config file is:
lib/x86_64-Linux/hmakerc
Known compilers:
/root/nhc98-1.16/script/nhc98       (v1.16)
Default compiler:
/root/nhc98-1.16/script/nhc98

MkProg

derptop:~/nhc98-1.16# ./lib/x86_64-Linux/MkProg -h
Usage: MkProg [-q] [-dobjdir] [-g] [-M] target ...
[must have at least one target]

I don’t know how to use MkProg, so I just ran it and watched it spit out errors:

derptop:~/nhc98-1.16# ./lib/x86_64-Linux/MkProg -Isrc/compiler98 src/compiler98/Main
Can't find module IO in user directories
.
src/compiler98
Or in installed libraries/packages at

Asked for by: src/compiler98/Main.hs
Fix using the -I, -P, or -package flags.

hmake-PRAGMA

No idea what to do here:

derptop:~/nhc98-1.16# ./lib/x86_64-Linux/hmake-PRAGMA

Trying to get tag from unevaluated node in TABLESWITCH at 08056288!
Node is:
f7e391e4 at f7e390d8

Other

  1. Files look like this:
derptop:~/nhc98-1.16# ls lib/x86_64-Linux/
MkConfig  Older      Runtime.a  hmake-PRAGMA  main.o     mutlib.o
MkProg    Prelude.a  config     hmakerc       mutator.o  nhc98heap
derptop:~/nhc98-1.16# file ./lib/x86_64-Linux/MkProg
./lib/x86_64-Linux/MkProg: ELF 32-bit LSB executable, Intel 80386, version 1, dynamically linked (uses shared libs), stripped

derptop:~/nhc98-1.16# file ./lib/x86_64-Linux/MkConfig
./lib/x86_64-Linux/MkConfig: ELF 32-bit LSB executable, Intel 80386, version 1, dynamically linked (uses shared libs), stripped

derptop:~/nhc98-1.16# file ./lib/x86_64-Linux/hmake-PRAGMA
./lib/x86_64-Linux/hmake-PRAGMA: ELF 32-bit LSB executable, Intel 80386, version 1, dynamically linked (uses shared libs), not stripped
  2. The x86_64-Linux directory name is just a remnant from an earlier build attempt where I managed to run ./configure. It is purely cosmetic: as the file output above shows, the binaries are 32-bit i386.

So where to go from here?

Anything worth pursuing here? I’m not sure I could reproduce everything I’ve done to get these bits working, but with non-segfaulting binaries, what would be next? I’m 100% new to the Haskell world.

Nice progress!

I would first try to clearly document this progress in a reproducible way, so that someone who picks this task up later can continue from where you stopped. A nix derivation pinning everything would be great, but a shell script plus the patches you use would already be useful.
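
As a rough idea of what such a script could record, here is a minimal sketch. The function name, the manifest layout, and the file names in the example are all made up for illustration; nothing here comes from an existing repo:

```sh
#!/bin/sh
# Write a small manifest of build inputs so a later run can be compared.
record_inputs() {
    manifest=$1
    shift
    {
        echo "# host: $(uname -sm)"
        for f in "$@"; do
            # cksum is POSIX, so it should be present even in an old chroot
            cksum "$f"
        done
    } > "$manifest"
}

# Example (placeholder file names):
#   record_inputs MANIFEST nhc98-sources.tar.gz fixes.patch
```

Checking the manifest into the same place as the patches keeps the inputs and the steps together.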

Then the next step is to understand how to use nhc98 to build small Haskell programs (which commands to invoke etc). And then try to apply this to some (old) version of GHC. Best of luck! :slight_smile:

Sorry to go silent here. Haven’t had much time with work to come back to this.

I’ve set up a repo to reproduce a build environment here: jamonation/nhc98-on-hugs on GitHub. It appears to build using Woody, which is a bit nicer to work with.

I also made a container image from the chroot to play around with - it has made iterating a bit easier. But obviously given the topic and audience here, just use it for experimenting!

docker run -it ghcr.io/jamonation/nhc98-and-hugs:1.22

Sources live in /usr/src and /README.md is also included.

Having a go at building compiler98 I first run into a missing hmakerc error:

derptop:/usr/src/nhc98-1.22/src/compiler98# NHC98COMP=${NHCDIR}/hugs-nhc make HC=${NHCDIR}/script/nhc98
mkdir -p /usr/src/nhc98-1.22/targets/x86_64-Linux/obj/compiler98
mkdir -p /usr/src/nhc98-1.22/targets/x86_64-Linux/obj/compiler98/Derive
mkdir -p /usr/src/nhc98-1.22/targets/x86_64-Linux/obj/compiler98/Parse
mkdir -p /usr/src/nhc98-1.22/targets/x86_64-Linux/obj/compiler98/Type
mkdir -p /usr/src/nhc98-1.22/targets/x86_64-Linux/obj/compiler98/Util
rm -f /usr/src/nhc98-1.22/targets/x86_64-Linux/obj/compiler98/nhc98 /usr/src/nhc98-1.22/targets/x86_64-Linux/obj/compiler98/hbc /usr/src/nhc98-1.22/targets/x86_64-Linux/obj/compiler98/ghc*
#make cleanC
touch "/usr/src/nhc98-1.22/targets/x86_64-Linux/obj/compiler98/gcc"
/usr/src/nhc98-1.22/script/hmake -hc=/usr/src/nhc98-1.22/script/nhc98 -H16M -K2M +CTS -H16M -CTS -package containers -package filepath -package packedstring -package base  -d /usr/src/nhc98-1.22/targets/x86_64-Linux/obj/compiler98 MainNhc98
Config file /usr/src/nhc98-1.22/lib/x86_64-Linux/hmakerc does not exist.
  Try running 'hmake-config new' first.
Stop - hmake dependency error.

So here’s one that I cobbled together after repeatedly trying MkConfig. I’ve no idea how much of it is required, but it silences the error, and seems to work in the next step:

derptop:/usr/src/nhc98-1.22/src/compiler98# cat ../../lib/x86_64-Linux/hmakerc
HmakeConfig
  { defaultCompiler = "/usr/src/nhc98-1.22/script/nhc98"
  , knownCompilers =
    [ CompilerConfig
      { compilerStyle = nhc98
      , compilerPath = "/usr/src/nhc98-1.22/script/nhc98"
      , compilerVersion = "/usr/src/nhc98-1.22/lib/x86_64/config:"
      , includePaths = ["/usr/src/nhc98-1.22/include"]
      , cppSymbols = ["__NHC__=981"]
      , extraCompilerFlags = []
      , isHaskell98 = True
      }
    , DynCompiler { compilerPath = "lib/x86_64-Linux/cpphs" }
    , DynCompiler { compilerPath = "lib/x86_64-Linux/runhs" }
    ]
  }

Try a build again, and we’re now missing some includes.

derptop:/usr/src/nhc98-1.22/src/compiler98# NHC98COMP=${NHCDIR}/hugs-nhc make HC=${NHCDIR}/script/nhc98

/usr/src/nhc98-1.22/script/hmake -hc=/usr/src/nhc98-1.22/script/nhc98 -H16M -K2M +CTS -H16M -CTS -package containers -package filepath -package packedstring -package base  -d /usr/src/nhc98-1.22/targets/x86_64-Linux/obj/compiler98 MainNhc98

Warning: package(s) base, packedstring, filepath, containers not available in /usr/src/nhc98-1.22/include
Can't find module Foreign in user directories
	.
  Or in installed libraries/packages at
	/usr/src/nhc98-1.22/include
  Asked for by: Error.hs
  Fix using the -I, -P, or -package flags.

Stop - hmake dependency error.

Edit the includePaths entry in hmakerc. Repeat until the errors about missing Data.PackedString and Data.Set go away:

HmakeConfig
  { defaultCompiler = "/usr/src/nhc98-1.22/script/nhc98"
  , knownCompilers =
    [ CompilerConfig
      { compilerStyle = nhc98
      , compilerPath = "/usr/src/nhc98-1.22/script/nhc98"
      , compilerVersion = "/usr/src/nhc98-1.22/lib/x86_64/config:"
      , includePaths = ["/usr/src/nhc98-1.22/include", "/usr/src/nhc98-1.22/include/packages/base", "/usr/src/nhc98-1.22/include/packages/packedstring", "/usr/src/nhc98-1.22/include/packages/containers"]
      , cppSymbols = ["__NHC__=981"]
      , extraCompilerFlags = []
      , isHaskell98 = True
      }
    , DynCompiler { compilerPath = "lib/x86_64-Linux/cpphs" }
    , DynCompiler { compilerPath = "lib/x86_64-Linux/runhs" }
    ]
  }
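
The include paths added above follow one pattern: the top-level include directory plus one directory per package. A minimal sketch that prints such a line instead of building it by hand, assuming every package sits in its own directory under include/packages (the helper name is hypothetical, and it lists every package directory it finds, not only the three used above):

```sh
#!/bin/sh
# Print an includePaths line for hmakerc from the package directories
# found under "$root/include/packages".
include_paths() {
    root=$1
    list="\"$root/include\""
    for dir in "$root"/include/packages/*/; do
        # Skip the unexpanded glob when there are no package directories
        [ -d "$dir" ] || continue
        list="$list, \"${dir%/}\""
    done
    printf '      , includePaths = [%s]\n' "$list"
}

# Example:
#   include_paths /usr/src/nhc98-1.22
```

The output is meant to be pasted into the CompilerConfig record in place of the existing includePaths line.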

Build again, and now we’re getting somewhere:

derptop:/usr/src/nhc98-1.22/src/compiler98# NHC98COMP=${NHCDIR}/hugs-nhc make HC=${NHCDIR}/script/nhc98
/usr/src/nhc98-1.22/script/hmake -hc=/usr/src/nhc98-1.22/script/nhc98 -H16M -K2M +CTS -H16M -CTS -package containers -package filepath -package packedstring -package base  -d /usr/src/nhc98-1.22/targets/x86_64-Linux/obj/compiler98 MainNhc98
The program ran out of heap memory.  (Current heapsize is 400000 bytes.)
You can set a bigger size with e.g. +RTS -H4M -RTS (4M = four megabytes).
GC stats:
    Only 32 words after gc, need 32 words.
  Used  2031223 words of heap.  Moved 33296188 words of heap in 303 gcs.
  32 words to next gc.  Max live after gc: 116297 words.
Insufficient heap memory.
Stop - hmake dependency error.
make: *** [/usr/src/nhc98-1.22/lib/x86_64-Linux/nhc98comp] Error 255

We’ve seen this before with the CTypes source file being too large.

I got a bit further with the compiler98 tree by skipping the Makefile and running things directly.

I also split TokenId.hs to get around the Control stack overflow errors it throws because it is so large, e.g.:

/usr/src/nhc98-1.22/script/nhc98   -package containers -package filepath -package packedstring -package base   -c  -o /usr/src/nhc98-1.22/build/x86_64-Linux/obj/compiler98/TokenId.o TokenId.hs
runhugs: Error occurred

ERROR - Control stack overflow

Breaking that up into TokenIdBuiltins1.hs and TokenIdBuiltins2.hs, and a final TokenIdType.hs that TokenId.hs then imports seems to work:

bash-2.05a# /usr/src/nhc98-1.22/script/nhc98 +RTS -H1024M -K1024M -RTS -v -c -o /usr/src/nhc98-1.22/build/x86_64-Linux/obj/compiler98/TokenIdBuiltins1.o TokenIdBuiltins1.hs
/usr/src/nhc98-1.22/hugs-nhc +RTS -H1024M -K1024M -RTS -P/usr/src/nhc98-1.22/include/packages/base -I. -P/usr/src/nhc98-1.22/include ./TokenIdBuiltins1.hs ./TokenIdBuiltins1.hs /tmp/TokenIdBuiltins1.1289604.hi /tmp/TokenIdBuiltins1.1289604.hc
rm /tmp/TokenIdBuiltins1.1289604.hi
gcc -D__NHC__=122 -x c -S -DLOW_BYTE_FIRST -I/usr/src/nhc98-1.22/include/ /tmp/TokenIdBuiltins1.1289604.hc -o /tmp/TokenIdBuiltins1.1289604.s
rm /tmp/TokenIdBuiltins1.1289604.hc
gcc -D__NHC__=122 -c -o /usr/src/nhc98-1.22/build/x86_64-Linux/obj/compiler98/TokenIdBuiltins1.o /tmp/TokenIdBuiltins1.1289604.s
rm /tmp/TokenIdBuiltins1.1289604.s

There’s a .o file now:

bash-2.05a# file /usr/src/nhc98-1.22/build/x86_64-Linux/obj/compiler98/TokenIdBuiltins1.o
/usr/src/nhc98-1.22/build/x86_64-Linux/obj/compiler98/TokenIdBuiltins1.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped

And the full set:

bash-2.05a# ls -alh /usr/src/nhc98-1.22/build/x86_64-Linux/obj/compiler98 |grep Token
-rw-r--r--    1 root     root         9.8k Jul  5 03:26 TokenId.o
-rw-r--r--    1 root     root          28k Jul  5 03:27 TokenIdBuiltins1.o
-rw-r--r--    1 root     root          34k Jul  5 03:09 TokenIdBuiltins2.o
-rw-r--r--    1 root     root         7.3k Jul  5 03:09 TokenIdType.o
-rw-r--r--    1 root     root          13k Jul  5 02:09 TokenInt.o

Now I’m stuck on Flags.hs:

bash-2.05a# /usr/src/nhc98-1.22/script/nhc98 +RTS -H2048M -K2048M -RTS -package filepath  -v -c -o /usr/src/nhc98-1.22/build/x86_64-Linux/obj/compiler98/Flags.o Flags.hs
/usr/src/nhc98-1.22/hugs-nhc +RTS -H2048M -K2048M -RTS -P/usr/src/nhc98-1.22/include/packages/base -P/usr/src/nhc98-1.22/include/packages/filepath -I. -P/usr/src/nhc98-1.22/include ./Flags.hs ./Flags.hs /tmp/Flags.1290774.hi /tmp/Flags.1290774.hc
runhugs: Error occurred

ERROR - Garbage collection fails to reclaim sufficient space

So it looks like it needs a refactor as well?
