Doubly indexed map

Suppose I have some tabular data

John James
January 3 5
February 2 8

where the rows have type Month and the columns have type Name.

I could group the data first by Month and then by Name and use type MonthlyData = Map Month (Map Name Int).
Or I could group them first by Name and then by Month and some something like type MonthlyData' = Map Name (Map Month Int).
Or I could group them by Name and Month and use type MonthlyData'' = Map (Month, Name) Int.

I would like to be able to access both rows, providing a Month, and columns, providing a Name.

Is there in the ecosystem a data structure optimised for that?
If not, would it make sense to have something like

data MonthlyData''' = MonthlyData'''
  { groupByName :: Map Name (Map Month Int)
  , groupByMonth :: Map Month (Map Name Int)
  }

 -- uses groupByName
lookupByName :: Name -> MonthlyData''' -> Maybe (Map Month Int)

 -- uses groupByMonth
lookupByMonth :: Month -> MonthlyData''' -> Maybe (Map Name Int)

insert :: Name -> Month -> MonthlyData''' -> MonthlyData'''

update :: Name -> Month -> Int -> MonthlyData''' -> MonthlyData'''

where the constructor MonthlyData''' is not exposed and insert and update modify both groupByMonth and groupByName?

2 Likes

The ixset-typed package does essentially this. It supports multiple indices and internally it constructs a Map from each key type to the values:

6 Likes

that looks really interesting, but in my case the Ord a instance required by Data.IxSet.Typed is quite problematic, since the data I have in the cells are not actually Int and they are not easily ordered

Thinking about it, it may indeed be easiest to roll your own data structure out of simpler components in containers, unordered-containers, array, etc., especially if you have only two dimensions. The best representation choices (e.g. Map vs IntMap vs HashMap vs Array vs Matrix) rather depend on the key/value types, data density, access pattern and security considerations.

1 Like

Iā€™m very puzzled by the way youā€™ve structured the q:

  • If your data has more months, thatā€™s merely more rows in the table; but
  • might your data have more than two names? What would that mean/how would you represent it in the table? (Given thereā€™s usually a limited width.)
  • What does the John = 3 value ā€˜meanā€™? Could there be a James = 3 value?

Iā€™d expect:

  • A Month column, as you have.
  • A Name column, with possible values John, James, ā€¦
  • A payload column with an Int.
  • A lookup would need to index by both Month and Name, to return a cell.

Are you really asking for a way to index by Name (say) and return a pair or vector of cells (Ints)?

1 Like

This is at least mildly related to an earlier quesiton I asked @marcosh - Nice data-structure for grouping?

In that thread, I ended up following basically @tcard 's advice ( Nice data-structure for grouping? - #4 by tcard ) but perhaps @mixphix 's library is more to your interest - Nice data-structure for grouping? - #7 by mixphix

I also think it would be worth a glance at @LaurentRDC 's fancy new javelin library: Data.Series ; but it might not work as you need a dataframe-like thing, instead of a series. But maybe, if we ask nicely, @LaurentRDC will add such a feature :smiley:

3 Likes

Dataframes are definitely in the cards. The only problem is that I donā€™t have the time or need to work on this right now unfortunately.

In the original post:

I would like to be able to access both rows, providing a Month , and columns, providing a Name .

You can do so with a Series. Assuming that you do not know all the names in advance, in which case you need to store your column values in a Map:

> import Data.Series ( at )
> import qualified Data.Series as Series
> import Data.Map.Strict ( lookup )
> import qualified Data.Map.Strict as Map
>
> :{ let xs = Series.fromList 
            [ ("January", Map.fromList [ ("John", 3::Int)
                                       , ("James", 5) 
                                       ]
            , ("February", Map.fromList [ ("John", 2)
                                        , ("James", 8)
                                        ]
            ]
:}
>  xs
       index | values
       ----- | ------
   'January' | fromList [("John", 3), ("James", 5)]
  'February' | fromList [("John", 2), ("James", 8)]
>
> lookup "James" $ xs `at` "January"
Just 3

Just in terms of lookup, this is O(log n) + O(log m), where n is the length of the index, and m is the number of ā€œcolumnsā€, so itā€™s pretty efficient.

However, inserting and updating a Series is very slow because they are based on arrays. Every update results in an array copy.

If you want to process the data in the columns efficiently, then you might need a true dataframe structure, which is column-oriented (while Series is row-oriented).

Hope this helps

2 Likes

deep-map looks exactly like what I had in mind. Unfortunately thereā€™s no documentation on Hackage

If there existed a dictionary with optimal lookup in both directions, databases would use it everywhere. Since one does not appear to exist, every library youā€™re directed to is merely composing other data structures.

The correct answer, in my mind, is to use optimal data structures directly. Your example is best solved as two dictionaries that are updated in tandem, exactly the way you outlined (although Iā€™d argue for fewer backticks in names).


A broader design point to make here is that while your state can be so tight as to exclude any invalid positions, in a lot of places in the real world this is not practical. What you should strive for is instead that at every point in your application the state remains coherent. You may think dictionaries are ā€œunsafeā€ specifically because they are full of invalid states, however your problem cannot be solved in any other way, so you might as well cut down on pointless dependencies.

Also here is a link to a Reddit post that links to a video that makes a parallel between databases and dictionaries, you may find it useful.

Thatā€™s weird, there are definitely Haddocks. Sorry about that. Iā€™m not exactly sure how to get Hackage to run the generator thing.

I think the problem is that it just isnā€™t building - Hackage: Build #1 for deep-map-0.2.0

Thanks for pointing that out! It seems it built with a version of GHC that didnā€™t support \cases syntax.