Doubly indexed map

marcosh · February 9, 2024, 2:28pm

Suppose I have some tabular data

	John	James
January	3	5
February	2	8

where the rows have type Month and the columns have type Name.

I could group the data first by Month and then by Name and use type MonthlyData = Map Month (Map Name Int).
Or I could group them first by Name and then by Month and use something like type MonthlyData' = Map Name (Map Month Int).
Or I could group them by Name and Month and use type MonthlyData'' = Map (Month, Name) Int.
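Each of these groupings can be rebuilt from the others in O(n log n) time for n cells, so the choice mostly decides which lookups are cheap. Below is a minimal sketch using Data.Map.Strict from containers. The helper names flatten, nest and transpose are hypothetical, and the keys are left as type parameters r (standing in for Month) and c (standing in for Name):

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Hypothetical helpers converting between the three groupings.
-- r stands in for Month, c for Name, v for the cell type (Int in the post).

-- Map Month (Map Name v)  ->  Map (Month, Name) v
flatten :: (Ord r, Ord c) => Map r (Map c v) -> Map (r, c) v
flatten m =
  Map.fromList [ ((r, c), v) | (r, inner) <- Map.toList m, (c, v) <- Map.toList inner ]

-- Map (Month, Name) v  ->  Map Month (Map Name v)
nest :: (Ord r, Ord c) => Map (r, c) v -> Map r (Map c v)
nest = Map.foldrWithKey step Map.empty
  where
    -- add one cell to the inner map of its row, creating the row if needed
    step (r, c) v = Map.insertWith Map.union r (Map.singleton c v)

-- Map Month (Map Name v)  ->  Map Name (Map Month v)
transpose :: (Ord r, Ord c) => Map r (Map c v) -> Map c (Map r v)
transpose m =
  Map.fromListWith Map.union
    [ (c, Map.singleton r v) | (r, inner) <- Map.toList m, (c, v) <- Map.toList inner ]
```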

I would like to be able to access both rows, providing a Month, and columns, providing a Name.

Is there a data structure in the ecosystem optimised for that?
If not, would it make sense to have something like

data MonthlyData''' = MonthlyData'''
  { groupByName :: Map Name (Map Month Int)
  , groupByMonth :: Map Month (Map Name Int)
  }

 -- uses groupByName
lookupByName :: Name -> MonthlyData''' -> Maybe (Map Month Int)

 -- uses groupByMonth
lookupByMonth :: Month -> MonthlyData''' -> Maybe (Map Name Int)

insert :: Name -> Month -> MonthlyData''' -> MonthlyData'''

update :: Name -> Month -> Int -> MonthlyData''' -> MonthlyData'''

where the constructor MonthlyData''' is not exposed and insert and update modify both groupByMonth and groupByName?
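A minimal sketch of that design with Data.Map.Strict, assuming insert also receives the cell value (the signature above takes none) so that it doubles as update. Table, byRow, byCol and the function names are hypothetical; the row and column keys are type parameters r and c standing in for Month and Name. Only the keys need Ord, so the cell type v needs no instances at all:

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Hypothetical doubly indexed table. In a real module, export Table
-- abstractly (without its constructor) so only insert/delete touch the maps.
data Table r c v = Table
  { byRow :: Map r (Map c v)   -- e.g. groupByMonth
  , byCol :: Map c (Map r v)   -- e.g. groupByName
  }

empty :: Table r c v
empty = Table Map.empty Map.empty

-- All cells of one row, keyed by column (uses byRow only).
lookupRow :: Ord r => r -> Table r c v -> Maybe (Map c v)
lookupRow r = Map.lookup r . byRow

-- All cells of one column, keyed by row (uses byCol only).
lookupCol :: Ord c => c -> Table r c v -> Maybe (Map r v)
lookupCol c = Map.lookup c . byCol

lookupCell :: (Ord r, Ord c) => r -> c -> Table r c v -> Maybe v
lookupCell r c t = lookupRow r t >>= Map.lookup c

-- Insert or overwrite one cell in both maps; Map.union is left-biased,
-- so the new singleton wins over any existing value.
insert :: (Ord r, Ord c) => r -> c -> v -> Table r c v -> Table r c v
insert r c v (Table rows cols) = Table
  { byRow = Map.insertWith Map.union r (Map.singleton c v) rows
  , byCol = Map.insertWith Map.union c (Map.singleton r v) cols
  }

-- Remove one cell from both maps, dropping inner maps that become empty.
delete :: (Ord r, Ord c) => r -> c -> Table r c v -> Table r c v
delete r c (Table rows cols) = Table
  { byRow = Map.update (nonEmpty . Map.delete c) r rows
  , byCol = Map.update (nonEmpty . Map.delete r) c cols
  }
  where
    nonEmpty m = if Map.null m then Nothing else Just m
```

Each write touches both maps, so it costs roughly twice a single-map insert and every cell is stored twice; in exchange, a whole row or a whole column is a single outer-map lookup.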

adamgundry · February 9, 2024, 2:50pm

The ixset-typed package does essentially this. It supports multiple indices, and internally it constructs a Map from each key type to the values.

marcosh · February 9, 2024, 3:56pm

That looks really interesting, but in my case the Ord a instance required by Data.IxSet.Typed is quite problematic, since the values in my cells are not actually Int and are not easily ordered.
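A common workaround when a container insists on Ord for the stored element but the payload has no natural order is to order by a synthetic key instead. In the sketch below, Cell, cellId and cellValue are hypothetical names; it assumes every cell can be given a unique Int id, because cells sharing an id compare equal and a Set would keep only one of them:

```haskell
import qualified Data.Set as Set

-- Hypothetical wrapper: equality and ordering look only at cellId,
-- so the payload v needs no Eq or Ord instance (it can even be a function).
data Cell v = Cell
  { cellId    :: Int  -- assumed unique per cell
  , cellValue :: v
  }

instance Eq (Cell v) where
  a == b = cellId a == cellId b

instance Ord (Cell v) where
  compare a b = compare (cellId a) (cellId b)

-- Example payload without an Ord instance: functions.
cells :: Set.Set (Cell (Int -> Int))
cells = Set.fromList [Cell 2 (* 2), Cell 1 (+ 1)]
```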

adamgundry · February 9, 2024, 9:20pm

Thinking about it, it may indeed be easiest to roll your own data structure out of simpler components in containers, unordered-containers, array, etc., especially if you have only two dimensions. The best representation choices (e.g. Map vs IntMap vs HashMap vs Array vs Matrix) rather depend on the key/value types, data density, access pattern and security considerations.
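For instance, if both key sets are small, fixed and fully populated, a dense Data.Array (from the array package) indexed by the pair gives O(1) access to any cell and straightforward row and column slices. A sketch with hypothetical enumeration types Month and Name and hypothetical helpers mkTable, row and col:

```haskell
import Data.Array (Array, Ix, listArray, (!))

-- Hypothetical key types; Ix can be derived for plain enumerations.
data Month = January | February
  deriving (Eq, Ord, Show, Enum, Bounded, Ix)

data Name = John | James
  deriving (Eq, Ord, Show, Enum, Bounded, Ix)

type Table v = Array (Month, Name) v

-- Cells are listed row by row (all names for January, then February),
-- because the Ix instance for pairs varies the first component slowest.
mkTable :: [v] -> Table v
mkTable = listArray ((minBound, minBound), (maxBound, maxBound))

-- One row (fixed Month) or one column (fixed Name); each cell is O(1).
row :: Month -> Table v -> [(Name, v)]
row m t = [ (n, t ! (m, n)) | n <- [minBound .. maxBound] ]

col :: Name -> Table v -> [(Month, v)]
col n t = [ (m, t ! (m, n)) | m <- [minBound .. maxBound] ]
```

Updating an immutable Array copies it, so this layout suits read-heavy data better than frequent single-cell writes.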

AntC2 · February 10, 2024, 11:37am

I’m very puzzled by the way you’ve structured the question:

If your data has more months, that’s merely more rows in the table; but
might your data have more than two names? What would that mean/how would you represent it in the table? (Given there’s usually a limited width.)
What does the John = 3 value ‘mean’? Could there be a James = 3 value?

I’d expect:

A Month column, as you have.
A Name column, with possible values John, James, …
A payload column with an Int.
A lookup would need to index by both Month and Name, to return a cell.

Are you really asking for a way to index by Name (say) and return a pair or vector of cells (Ints)?
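To make the long-format layout concrete: with a single Map keyed by (Month, Name), a single cell or a whole row is cheap because keys sort by their first component, but a whole column needs a full scan, which is what the double index in the original question avoids. A sketch with hypothetical names Long, rowOf, colOf and cellAt, using Data.Map.Strict (takeWhileAntitone and dropWhileAntitone need containers 0.5.8 or later):

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Long format: one map keyed by (row, column), e.g. (Month, Name).
type Long r c v = Map (r, c) v

-- Keys sort by row first, so one row is a contiguous run of keys; the two
-- antitone splits find it in O(log n). Within that run snd is strictly
-- increasing, so mapKeysMonotonic is valid (O(k) for k cells in the row).
rowOf :: Ord r => r -> Long r c v -> Map c v
rowOf r =
  Map.mapKeysMonotonic snd
    . Map.takeWhileAntitone ((== r) . fst)
    . Map.dropWhileAntitone ((< r) . fst)

-- A column is scattered across all rows, so this is an O(n) scan.
colOf :: (Ord r, Eq c) => c -> Long r c v -> Map r v
colOf c t = Map.fromList [ (r, v) | ((r, c'), v) <- Map.toList t, c' == c ]

-- A single cell is an ordinary O(log n) lookup on the pair.
cellAt :: (Ord r, Ord c) => r -> c -> Long r c v -> Maybe v
cellAt r c = Map.lookup (r, c)
```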

silky · February 11, 2024, 9:00am

This is at least mildly related to an earlier question I asked @marcosh - Nice data-structure for grouping?

In that thread, I ended up following basically @tcard 's advice ( Nice data-structure for grouping? - #4 by tcard ) but perhaps @mixphix 's library is more to your interest - Nice data-structure for grouping? - #7 by mixphix

I also think it would be worth a glance at @LaurentRDC's fancy new javelin library: Data.Series. It might not work, since you need a dataframe-like thing rather than a series, but maybe, if we ask nicely, @LaurentRDC will add such a feature.

LaurentRDC · February 11, 2024, 1:37pm

Dataframes are definitely in the cards. The only problem is that I don’t have the time or need to work on this right now unfortunately.

In the original post:

I would like to be able to access both rows, providing a Month, and columns, providing a Name.

You can do so with a Series. Assuming you do not know all the names in advance, you need to store each row's column values in a Map:

> import Data.Series ( at )
> import qualified Data.Series as Series
> import qualified Data.Map.Strict as Map
>
> :{
let xs = Series.fromList
           [ ("January",  Map.fromList [ ("John", 3::Int), ("James", 5) ])
           , ("February", Map.fromList [ ("John", 2),      ("James", 8) ])
           ]
:}
>  xs
       index | values
       ----- | ------
   'January' | fromList [("John", 3), ("James", 5)]
  'February' | fromList [("John", 2), ("James", 8)]
>
> xs `at` "January" >>= Map.lookup "James"
Just 5

Just in terms of lookup, this is O(log n) + O(log m), where n is the length of the index, and m is the number of “columns”, so it’s pretty efficient.

However, inserting into and updating a Series is very slow because Series are backed by arrays: every update results in an array copy.

If you want to process the data in the columns efficiently, then you might need a true dataframe structure, which is column-oriented (while Series is row-oriented).

Hope this helps

marcosh · February 12, 2024, 11:43am

deep-map looks exactly like what I had in mind. Unfortunately, there’s no documentation on Hackage.

BurningWitness · February 12, 2024, 12:14pm

If there existed a dictionary with optimal lookup in both directions, databases would use it everywhere. Since one does not appear to exist, every library you’re directed to is merely composing other data structures.

The correct answer, in my mind, is to use optimal data structures directly. Your example is best solved as two dictionaries that are updated in tandem, exactly the way you outlined (although I’d argue for fewer primes in names).

A broader design point to make here: while you can sometimes make your state so tight that it excludes every invalid state, in many real-world settings this is not practical. What you should strive for instead is that your state remains coherent at every point in your application. You may think dictionaries are “unsafe” specifically because they admit invalid states, but your problem cannot be solved in any other way, so you might as well cut down on pointless dependencies.
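One way to act on "keep the state coherent" is to write down the invariant the two dictionaries must satisfy and check it, for example in tests or in a debug assertion after each update. A sketch with a hypothetical coherent function over the two maps; it compares only keys, so the cell type needs no Eq instance, but it will not catch a cell whose two copies hold different values:

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Map.Strict (Map)
import Data.Tuple (swap)

-- Hypothetical invariant for two dictionaries updated in tandem:
-- both views hold exactly the same (row, column) cells, and neither
-- keeps an empty inner map. Values are not compared.
coherent :: (Ord r, Ord c) => Map r (Map c v) -> Map c (Map r v) -> Bool
coherent byRow byCol =
  cells byRow == Set.map swap (cells byCol)
    && all (not . Map.null) byRow
    && all (not . Map.null) byCol
  where
    -- every (outer key, inner key) pair present in a nested map
    cells m = Set.fromList [ (k1, k2) | (k1, inner) <- Map.toList m, k2 <- Map.keys inner ]
```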

Also, here is a link to a Reddit post that links to a video drawing a parallel between databases and dictionaries; you may find it useful.

mixphix · February 12, 2024, 2:37pm

That’s weird, there are definitely Haddocks. Sorry about that. I’m not exactly sure how to get Hackage to run the generator thing.

silky · February 12, 2024, 3:09pm

I think the problem is that it just isn’t building - Hackage: Build #1 for deep-map-0.2.0

mixphix · February 12, 2024, 4:26pm

Thanks for pointing that out! It seems it built with a version of GHC that didn’t support \cases syntax.
