Aztecs v0.6: A modular game engine and ECS for Haskell

matthunz · February 20, 2025, 2:30am

Hey! I’m a long time lurker, first time poster, really excited to announce my open-source Haskell game engine and ECS: Aztecs.

import Aztecs
import qualified Aztecs.ECS.Access as A
import qualified Aztecs.ECS.Query as Q
import qualified Aztecs.ECS.System as S
import qualified Aztecs.SDL as SDL
import Control.Arrow ((>>>))
import Control.Monad (when)

setup :: System () ()
setup = S.queue . const . A.spawn_ $ bundle Window {windowTitle = "Aztecs"}

update :: Schedule IO () ()
update =
  reader (S.single (Q.fetch @_ @KeyboardInput))
    >>> task
      ( \kb -> do
          when (wasKeyPressed KeyW kb) $ print "Onwards!"
          when (wasKeyPressed KeyS kb) $ print "Retreat..."
          when (wasKeyReleased KeyW kb || wasKeyReleased KeyS kb) $ print "Halt!"
      )

main :: IO ()
main =
  runSchedule_ $
    SDL.setup
      >>> system setup
      >>> forever_ (SDL.update >>> update >>> SDL.draw)

romes · February 20, 2025, 8:48am

Wohoo!!! Welcome to the team of Haskell game engine developers!

I took a quick look at the paper and your library. Very cool! A few comments:

The library’s api looks good and the choice of using arrow is curious! Without having looked at the implementation in depth: why did you decide to use Arrows?
I would have guessed the game engine and ECS library would be separate, so other Haskell game engines can leverage your ECS system without depending on your engine whole. Is the way e.g. the game loop and ECS components are designed that intertwines them enough for separation not to be possible?
The benchmark looks promising! It says it only compares simple mutating queries, so it would be cool to see a more complete comparison e.g. by translating some of bevy’s more complex benchmarks into Aztecs.

Anyway those are just some thoughts off the top of my head. Great to see this project and even more people in the space

matthunz · February 20, 2025, 5:31pm

Thanks so much!! I really appreciate you taking a look

Arrows are awesome! I found them to be really powerful at expressing computations like queries and systems, where it’s much more efficient to have the actual operations known ahead of time. For example, an arrow-based query (rather than the earlier Monad interface) knows which ComponentIDs to search for ahead of time, and can therefore jump directly to groups of components containing them. I’m really excited to see how arrows can combine together to abstract away ECS concepts (such as this monadic interface for loading assets as a generic ArrowSystem )
Since Aztecs now uses pure queries and systems, my goal is to have game logic side-effect free so you can write your game as just input -> state -> state (heavily inspired by Yampa). So the provided components like Window and Camera only store parts of the game state, requiring systems in place to actually produce side-effects. Then something like aztecs-sdl can be used to actually render your game state (and hopefully the components are generic enough to support other backends like OpenGL).
I’m really excited about performance so far, but I’m not 100% convinced of the results just yet (so really sorry if I’m spreading fake news lol). What I’m hoping for long-term is that Aztecs won’t need expensive state management tools like observers and change detection (like Bevy and Flecs use), and games can instead rely on Haskell language features to keep things composable but fast.

Thanks for reading my ramblings I can’t wait to dive deeper into Haskell game dev

wiz · February 21, 2025, 1:55pm

Please add apecs to benchmarks. I tried comparing (in 0.4) and aztecs ended up way slower. Perhaps I was doing something wrong.

matthunz · February 23, 2025, 11:58pm

Thanks for giving it a go I definitely want to add more comparisons, especially to another Haskell ECS like Apecs, when this package gets a little more refined.

Ugh unfortunately I might have been advertising some false positives - I think without the latest use of NFData, the previous benchmarks were just accumulating thunks

The latest results are 2ms for 10,000 entities, which really isn’t ideal. I feel like this architecture in general should be faster for functional languages because you can directly map over groups of components, where as sparse-set approaches (like Apecs) requiring more filtering. I’d expect at least queries on archetypes to be faster than sparse sets, but with slower inserts/removals. I’m still hopeful that queries can be optimized here.

If anyone would be interested in helping track down performance hits, I’m seeing the biggest time sink with profiling is View.unview (which is pretty surprising to me TBH)

romes · February 24, 2025, 9:20am

Could you share your profile? Ideally you have compiled all dependencies with profiling enabled as well and ran with -fprof-late and then the executable with +RTS -pj to get a format viewable in speedscope.app

In cabal you can do this by adding to the cabal.project with:

package *
  profiling: True
  profiling-detail: late

and running the resulting executable with +RTS -pj

simon · February 24, 2025, 10:10am

If you want to reach competitive performance with other state of the art ECS libraries, I think you will need to rethink your whole storage architecture. Here are the things I would keep in mind when working on an ECS.

Data Locality
The speed of ECS is based on data locality. This means that data which is accessed in a query should be next to each other. Then, L1/L2 caches can be optimally used and memory access is very fast.
In Haskell terms, this means you may not store datatypes in a regular fashion, since they are allocated on the heap in a non-linear manner. Tree-based storage containers like IntMap accelerate this problem. I think it is likely that your current storage spreads the stored items over the whole heap.

The simplest way to fix this is to allocate a big memory buffer and store elements via a Storable instance next to each other. Of course, there are caveats: You need to handle low-level code, not everything can be made Storable, arrays do not have infinite size, …

Optimize the query loops

You need to make sure to optimize the looping over all elements in a query since they are the hot path. You must not allocate anything within the loop and all storage functions should be inlined.

Let’s see this in an example:

for [1..elementsAmount] $ \i -> do
  element <- getElement i
  element' <- doStuff element
  putElement i element'

In the above code, getElement and putElement are ideally inlined so that accessing the data is fast. Additionally, for [1..elementsAmount] will most likely result in a simple for loop without allocating any indices, though here some benchmarking might be needed to verify GHC code generation works as expected.

In contrast, your storage DynamicStorage is based on a record of functions. While this is very nice for composability, those functions are probably not inlined and will be slower as inlined versions as a result. Keep in mind, each call to a non-inlined function is an indirection. Additionally, entitiesDyn' might just allocate all indices, tanking the performance.

Performance and Composability are often opposite goals and you will have to find a path in-between.

wiz · February 24, 2025, 10:30am

I doubt that. IntMaps used by apecs are already pretty hard to beat and that’s without further caching/indexing.

matthunz · February 24, 2025, 10:14pm

For sure here’s the output I got:

gist.github.com

https://gist.github.com/matthunz/a187031286e66e3cbcc75f87c2f0d0b9

aztecs-bench.prof

{
"program": "aztecs-bench",
"arguments": ["/home/matt/Documents/aztecs/dist-newstyle/build/x86_64-linux/ghc-9.10.1/aztecs-0.7.0/b/aztecs-bench/build/aztecs-bench/aztecs-bench", "iter", "exes"],
"rts_arguments": ["-pj"],
"end_time": "Mon Feb 24 17:13 2025",
"initial_capabilities": 0,
"total_time":       10.34,
"total_ticks": 10339,
"tick_interval": 1000,
"total_alloc":50298465544,

This file has been truncated. show original

From the flamegraph it looks like reading and writing is taking roughly the same time
https://gist.githubusercontent.com/matthunz/14044fc8ffef6abb9ce0dbd1e22fe122/raw/899ef4c1c033711d28f05522c4b2042d0353c7c8/flamegraph.svg

matthunz · February 24, 2025, 10:23pm

Oh wow a low-level approach like that sounds awesome, and really ideal in a lot of cases I’d be very interested in a module like Aztecs.ECS.Storable with higher-performance, but lower-level components.

Just thinking about it now I guess most components could be Storable but I do think a higher-level API also makes sense when they can’t be. I tried to make query interfaces generic enough to support other storage structures (albeit there might be some re-arranging to be done to support Storable).

In general the current component storages definitely need an update. The current IntMap should really just be a sorted list, where the EntityIDs are stored in the actual Archetype (since they’re shared across all the components in this case). I still think a dynamic approach with archetypes is good to keep in, but I’d really like to export a lower-level Storable approach (and maybe a higher-level typed approach)

matthunz · February 27, 2025, 7:21pm

Just wanted to share the benchmark results are in a much better state after some inlining and simpler component storage, but there’s definitely more to optimize

benchmarking iter
time                 288.8 μs   (287.9 μs .. 289.4 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 286.6 μs   (285.9 μs .. 287.3 μs)
std dev              2.383 μs   (2.076 μs .. 2.828 μs)

benchmarking iter system
time                 289.6 μs   (289.0 μs .. 290.3 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 289.9 μs   (289.4 μs .. 290.7 μs)
std dev              1.972 μs   (1.558 μs .. 2.647 μs)

Thanks a ton for your insight @simon I’m really hoping to dig deeper into the lower-level ideas you brought up too

mpickering · February 28, 2025, 10:59am

You can load the profile into speedscope.app for a nicer browsing experience.

This is what we use when profiling issues in GHC.

matthunz · March 7, 2025, 7:48pm

ooo that looks super helpful thanks!

I’ve been messing around it and I definitely think I’ve tracked at least some things down, just still having some trouble with inlining. It looks like reads and writes take roughly similar times, which I guess makes sense since they’re both just iterating over a list of components I’m really curious if a stream processing library like Streamly might help here.

benchmarking iter
time                 81.70 μs   (81.67 μs .. 81.73 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 81.56 μs   (81.51 μs .. 81.60 μs)
std dev              150.9 ns   (117.5 ns .. 223.3 ns)

matthunz · March 7, 2025, 7:54pm

Also I’m psyched to share share v0.11 GitHub - aztecs-hs/aztecs: A modular game engine and ECS for Haskell

Queries now use an applicative DSL and systems are now monadic, enabling a much simpler API. This release replaces the Arrow and class-based interface with simple transformers that can be interpreted in a variety of ways. For example, a System can now be run in a single thread with Access.system, or concurrently with Access.concurrently.

import Aztecs.ECS
import Control.DeepSeq
import Control.Monad
import Control.Monad.IO.Class
import Data.Function
import GHC.Generics

newtype Position = Position Int deriving (Show, Generic, NFData)

instance Component Position

newtype Velocity = Velocity Int deriving (Show, Generic, NFData)

instance Component Velocity

move :: (Monad m) => QueryT m Position
move = fetch & zipFetchMap (\(Velocity v) (Position p) -> Position $ p + v)

run :: SystemT IO ()
run = do
  positions <- query move
  liftIO $ print positions

app :: AccessT IO ()
app = do
  _ <- spawn $ bundle (Position 0) <> bundle (Velocity 1)
  forever $ system run

main :: IO ()
main = runAccessT_ app

matthunz · March 11, 2025, 3:32am

Also wanted to share a new benchmark I just did against Apecs @wiz

Still way slower than Bevy, but I’m feeling a little better about the archetype approach vs sparse sets. One huge benefit I realized is copying components to concurrent tasks should be a lot faster with archetypes, since smaller groups of components have to be moved.

Aztecs:

q :: Query Position
q = fetch & zipFetchMap (\(Velocity v) (Position p) -> Position $ p + v)

Apecs:

q :: System PosVel ()
q = cmap $ \(ECSVel v, ECSPos p) -> ECSPos (p+v)

wiz · March 13, 2025, 4:05pm

Neat. Is there a source for the setup somewhere?

matthunz · March 16, 2025, 9:32pm

Hmm actually looking over the Apecs bench again I’m not sure I’m using this correctly The current bench may include inserting components as well, but I can’t figure out how to return the World from a system yet

gist.github.com

https://gist.github.com/matthunz/a662cb8367b178a718f46cc5950b1cc3

apecs.hs

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE Strict #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}

This file has been truncated. show original

For Aztecs it’s just the standard bench of querying two components, and updating one

github.com/aztecs-hs/aztecs

bench/Bench.hs

main

{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Aztecs.ECS
import qualified Aztecs.ECS.Query as Q
import Aztecs.ECS.World
import qualified Aztecs.ECS.World as W
import Control.DeepSeq
import Criterion.Main
import Data.Function
import Data.Functor.Identity
import GHC.Generics

newtype Position = Position Int deriving (Show, Generic, NFData)

instance Component Position

newtype Velocity = Velocity Int deriving (Show, Generic, NFData)

This file has been truncated. show original

wiz · March 16, 2025, 9:51pm

You don’t. But you don’t have to, because you pass the world to the runner for the systems to operate on - you already have it.

Init world
Run setup system (Insert stuff) using that world handle
Pass the populated world to the benchmark runner.

If you get the same timings, then the setup system were too lazy… But it should be working.

Anyway, I’m curious about the numbers from the Aztecs that actually include inserts too. Sometimes inserts can be bottlenecks too.

matthunz · March 29, 2025, 8:05pm

Oh… I think I really misunderstood their API That lower-level approach is really interesting… Honestly I think Aztecs could benefit a lot from more direct interfaces to allocation and etc like Apecs (but keeping the more dynamically-typed approach)

I think you make a really good point about inserts, those and probably lookups should really get a bench when I get the time.

I’d also like to throw in issues with ECS queries in Haskell in general. Something I’ve been thinking about a lot lately is how to express complex mutations, where a traditional imperative ECS would use raw pointers (or Rust references) to components.

An approach I’m really interested in here is something like SQL joins, where queries might be able to operate at a lower-level (either on pointers or unsafe Vector indices) underneath. That way updating an entitiy that’s related to another entity can still mutate its components directly Query `join` operation · Issue #41 · aztecs-hs/aztecs · GitHub

Topic		Replies	Views
Help optimizing Aztecs - an ECS now using sparse sets Learn	3	210	April 14, 2025
HaskellCard: Game Engine like HyperCard Show and Tell	0	291	April 10, 2025
Nyx - A Bullet Hell Game in Haskell Show and Tell	15	2394	May 11, 2019
Pixelpusher: An intuitive 2D multiplayer action game in Haskell Show and Tell	13	569	March 14, 2025
Turncoat: An elegant UCI chess engine Announcements	4	502	February 4, 2025

Aztecs v0.6: A modular game engine and ECS for Haskell

Related topics