Interesting that you think it will be easier to bootstrap via Hugs rather than MicroHs, since MicroHs implements many more of the extensions you need. Why do you think Hugs would be better?
I didn’t realize you could bootstrap MicroHs via Hugs. That’s likely the most productive route.
I’ve been wondering about something since I first heard the idea of bootstrapping GHC via Hugs:
I’ve heard Core described as a simplified Haskell, so I wonder if it would in principle be possible to translate Core back into (simplified) Haskell: not the original Haskell from which the Core was derived, but much simpler Haskell code (e.g., with typeclasses replaced by explicit dictionary passing). Perhaps this could be described as the simplest Haskell that would compile down to the same Core (modulo naming issues, etc.).
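To make the dictionary-passing idea concrete, here's a hand-written sketch (not GHC's actual output; the names `MyShow`, `MyShowDict`, and `display` are invented for illustration) of what a typeclass might look like after such a translation:

```haskell
-- Hypothetical sketch of desugaring a typeclass into explicit
-- dictionary passing, as a Core-to-Haskell translation might do.
--
-- Original:
--   class MyShow a where myShow :: a -> String
--   display :: MyShow a => a -> String
--   display x = "<" ++ myShow x ++ ">"

-- The class becomes a record of its methods:
data MyShowDict a = MyShowDict { myShow :: a -> String }

-- The constraint becomes an ordinary argument:
display :: MyShowDict a -> a -> String
display dict x = "<" ++ myShow dict x ++ ">"

-- An instance becomes a plain value:
myShowInt :: MyShowDict Int
myShowInt = MyShowDict { myShow = show }

main :: IO ()
main = putStrLn (display myShowInt (42 :: Int))  -- prints "<42>"
```

The result is plain Haskell 98: no class or instance declarations left for Hugs/MicroHs to handle.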
If so, then presumably this simplified code could be compiled with Hugs (setting aside for the moment potential issues with file size and the like) or MicroHs. That would provide a bootstrapping pathway whereby GHC is used to “transpile” itself through Core into simplified Haskell, which could be persisted and later compiled by Hugs/MicroHs into a working GHC for bootstrapping.
I know that’s easier said than done, but I wonder if it’s in principle possible? Are the semantics of Core incompatible with doing this? I know that Core is type-annotated, but I thought that Core could be re-type-checked (as Core Lint does), which would seem to imply that it carries sufficient typing information independent of the explicit annotations. I had thought at one point that I’d concluded there was some additional blocker, but now I can’t think of what it would be.
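To illustrate the kind of local checkability I mean (again a hand-made sketch, not real Core output): in Core every binder is annotated, so a checker can verify each node without running global type inference. Rendered back as Haskell, that might look like:

```haskell
-- Hand-written sketch: every binding carries its full type, so each
-- expression can be checked locally, the way Core Lint re-checks Core
-- without needing type inference.
double :: Int -> Int
double x = x + x

quadruple :: Int -> Int
quadruple x =
  let y :: Int
      y = double x
  in double y
```

A checker only needs to confirm, binding by binding, that each annotated type matches its use sites.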
That should be possible, but it doesn’t sound easier than just parsing Haskell to GHC’s AST and then programmatically refactoring away the parts that Hugs/MicroHs can’t handle. That said, I don’t actually know which parts those are, so if there are a lot of missing features then reusing GHC’s desugaring could save some effort.
There’s also the question of where we draw the line between readable source code and unreadable blobs of generated code. Otherwise we could just take the C code that GHC can already generate (that is how people port GHC to new platforms).
I would think it would be easier than programmatically refactoring from GHC’s AST, because doing that would require recreating the code to understand and desugar each language extension (and typecheck everything first), which is arguably almost everything GHC does to get to Core. Also, what I was suggesting would be future-proof, in that new language features/extensions wouldn’t require updating anything to keep it working. (It’s true that it would only need to handle the language extensions that GHC uses and Hugs/MicroHs don’t support, but I don’t know how much easier that makes it.) And ending up with something potentially even simpler than Haskell 98 seems appealing.
But as you said, compilation targeting C would do the job just as well. I had thought that feature no longer worked, but even if so then fixing it might be the most productive use of effort. I don’t know if it matters if it’s readable as long as it’s architecture/platform independent.
While that could also be an interesting exercise, would it get us any closer to bootstrapping GHC, i.e. to compile GHC without relying on having an already-compiled GHC from the beginning?
If I understand correctly, as current GHC source code uses as-of-now-GHC-only features, you’d need a recent-enough already-compiled GHC to generate the core code, that – in your suggested approach – would then be de-compiled in a further step. So the step of compiling to core (unless done by a bootstrapped compiler) cannot be part of a bootstrapping process.
One could argue that, as the de-compiled GHC source code would be (somewhat) human-readable, the compilation-to-core-and-decompiling steps are just preparation and you’re then bootstrapping GHC from the resulting source. But that wouldn’t be bootstrapping actual GHC, that’d be bootstrapping a variation of GHC that has the result of said de-compilation as its source code. This variation might or might not be functionally equivalent to actual GHC. In theory, the code of the variation (the result of the de-compilation) could be reviewed to make sure it faithfully corresponds to actual GHC, but in practice that’s most probably infeasible. (And if you want to automate it with automatic reasoning, the tools for that would in turn need to be properly bootstrapped to be allowable.)
Following that line of reasoning, even if we had an audited bootstrapped Haskell compiler, how feasible would it be to confirm the current GHC sources actually do what we hope they do? Automated theorem proving is completely out of the question, as formally verified compilers are still huge academic projects.
I hope that most GHC code is somewhat reviewed. For the compile-to-core-then-decompile approach to have any merit, we’d have to review the de-compilation result to at least the same level of scrutiny.
But yeah, if not even the source code can be trusted, the whole trusting-trust worry is kinda moot.
From my perspective, bootstrapping means getting a running GHC without already having a running GHC in hand, not getting a running GHC without a running GHC ever having existed. The point would be that you’d generate it today, when the ecosystem is alive, and save it for the future.
For instance, let’s say GHC fell into disuse for several years, such that existing binaries wouldn’t run on any current OS, but someone wanted to pick it back up. If they had (for instance) the compiled-to-C-source version, then all they’d need is a working C compiler. Even today, I could install GHC from sources alone (including this generated source) and a C compiler, without needing to download an already-compiled GHC. That could be advantageous in some situations.
Also, if we’d been generating and saving such source from the beginning, then ideally I could get a running GHC 4 (for example) today, which I think I can’t if it can’t be compiled with current GHC; I’d need something like GHC 3 binaries and, even if they could be found, they probably wouldn’t run on today’s machines.
> This variation might or might not be functionally equivalent to actual GHC.
This is standard bootstrapping: you first get this monstrosity working, call it GHC-1, run GHC-1 against the GHC sources to produce GHC-2 (that is, a running GHC produced by using a somewhat sketchy Haskell compiler to compile actual Haskell code), and run GHC-2 against the GHC sources to produce GHC-3 (a GHC compiled by a more legitimate GHC). This is just how you bootstrap compilers, as I understand it.
Again, you’d validate this today: you’d check that the result is the same as what you’d get via the standard route. You wouldn’t do this via source inspection; you’d do it via tests, or ideally you’d see that you get the exact same binary as via the normal route.
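The “exact same binary” check is just a byte-for-byte comparison, as in reproducible-build verification. A minimal sketch (the file names `ghc-2` and `ghc-3` are hypothetical stage outputs):

```haskell
import qualified Data.ByteString as BS  -- from the bytestring package
import System.Exit (exitFailure)

-- Compare two build outputs byte-for-byte, the same check that
-- `cmp ghc-2 ghc-3` would perform on the command line.
main :: IO ()
main = do
  a <- BS.readFile "ghc-2"
  b <- BS.readFile "ghc-3"
  if a == b
    then putStrLn "binaries identical: bootstrap validated"
    else putStrLn "binaries differ" >> exitFailure
```

In practice you'd only expect identity if the build is deterministic; otherwise a test suite is the fallback.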
To me, this is about preparing today for a circumstance in the future.
I just installed GHC 4.08.2 on my machine using Debian 3 and QEMU. It wasn’t that hard.
Some notes:
- https://downloads.haskell.org/~ghc/4.08.2/ghc-4.08.2-i386-unknown-linux.tar.bz2
- https://cdimage.debian.org/mirror/cdimage/archive/3.0_r6/i386/iso-dvd/debian-30r6-dvd-i386-binary-1.iso
- I followed this tutorial (mutatis mutandis): “Installing Ubuntu on QEMU: A Comprehensive Guide” on linuxvox.com
- I copied the GHC files onto the qcow2 image by mounting it using qemu-nbd
- Then just navigate to those files and run `./configure && make install` (as superuser)
Interesting! I just picked that version number arbitrarily, but it’s nice to know that it’s possible.
The conclusion way back up this thread is that Hugs was never used to build GHC. (So the topic title is by now misleading. Anybody could be forgiven for missing that conclusion.)
The purpose here was to get a ‘clean’ GHC without any risk of malware getting smuggled from (say) dodgy libraries. It’s at least possible to build Hugs from its C sources, using a known-clean C compiler/install.
I think your “sketchy” approach doesn’t provide strong enough sanity (in either sense of the word).