DataFrame 0.3.1.0

mchav · September 5, 2025, 5:46am

Laundry list of updates:

Parquet reader

The Parquet reader now reads most Parquet files in the wild.

Plotting everywhere

Open plots in your browser:

ghci> import qualified DataFrame.Display.Web.Plot as Plt
ghci> Plt.plotAllHistograms df >>= Plt.showInDefaultBrowser
Saving plot to: /home/yavinda/plot-chart_guiv1qcX4ooMnhIkd4N9M5vtgrimGxS4GylrmRB7LwqpFL7v1qgxO.html

This also opens the plot in a browser.

Notebook plotting

Terminal plotting

“Gradual-typing”

Thanks to @jhingonjhingon for this work.

ghci> :script dataframe.ghci
ghci> df <- D.readCsv "./data/housing.csv"
ghci> :exposeColumns df
"longitude :: Expr Double"
"latitude :: Expr Double"
"housing_median_age :: Expr Double"
"total_rooms :: Expr Double"
"total_bedrooms :: Expr Maybe Double"
"population :: Expr Double"
"households :: Expr Double"
"median_income :: Expr Double"
"median_house_value :: Expr Double"
"ocean_proximity :: Expr Text"
ghci> df |> D.derive "some_feature" (total_rooms / households) |> D.take 5
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
index | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households |   median_income    | median_house_value | ocean_proximity |    some_feature
------|-----------|----------|--------------------|-------------|----------------|------------|------------|--------------------|--------------------|-----------------|-------------------
 Int  |  Double   |  Double  |       Double       |   Double    |  Maybe Double  |   Double   |   Double   |       Double       |       Double       |      Text       |       Double
------|-----------|----------|--------------------|-------------|----------------|------------|------------|--------------------|--------------------|-----------------|-------------------
0     | -122.23   | 37.88    | 41.0               | 880.0       | Just 129.0     | 322.0      | 126.0      | 8.3252             | 452600.0           | NEAR BAY        | 6.984126984126984
1     | -122.22   | 37.86    | 21.0               | 7099.0      | Just 1106.0    | 2401.0     | 1138.0     | 8.3014             | 358500.0           | NEAR BAY        | 6.238137082601054
2     | -122.24   | 37.85    | 52.0               | 1467.0      | Just 190.0     | 496.0      | 177.0      | 7.2574             | 352100.0           | NEAR BAY        | 8.288135593220339
3     | -122.25   | 37.85    | 52.0               | 1274.0      | Just 235.0     | 558.0      | 219.0      | 5.6431000000000004 | 341300.0           | NEAR BAY        | 5.8173515981735155
4     | -122.25   | 37.85    | 52.0               | 1627.0      | Just 280.0     | 565.0      | 259.0      | 3.8462             | 342200.0           | NEAR BAY        | 6.281853281853282
ghci> df |> D.derive "some_feature" (total_bedrooms / households) |> D.take 5
<interactive>:12:49: error:
    • Couldn't match type ‘Double’ with ‘Maybe Double’
      Expected: Expr (Maybe Double)
        Actual: Expr Double
    • In the second argument of ‘(/)’, namely ‘households’
      In the second argument of ‘derive’, namely
        ‘(total_bedrooms / households)’
      In the second argument of ‘(|>)’, namely
        ‘derive "some_feature" (total_bedrooms / households)’
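
To illustrate why the second expression is rejected, here is a minimal, self-contained sketch. It is a toy model, not the library's actual Expr type: Row, eval, and withDefault are hypothetical names introduced only for this example. Because (/) requires both operands to have the same type, an Expr (Maybe Double) cannot be divided by an Expr Double until the nullable side is given a default.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical row with one non-nullable and one nullable column.
data Row = Row
  { rowHouseholds    :: Double
  , rowTotalBedrooms :: Maybe Double
  }

-- Toy typed expression: a function from a row to a value of type a.
-- This is an assumption for illustration, not the library's Expr.
newtype Expr a = Expr { eval :: Row -> a }

households :: Expr Double
households = Expr rowHouseholds

totalBedrooms :: Expr (Maybe Double)
totalBedrooms = Expr rowTotalBedrooms

-- Arithmetic is lifted pointwise, so both operands must share one type.
instance Num a => Num (Expr a) where
  Expr f + Expr g = Expr (\r -> f r + g r)
  Expr f * Expr g = Expr (\r -> f r * g r)
  negate (Expr f) = Expr (negate . f)
  abs (Expr f)    = Expr (abs . f)
  signum (Expr f) = Expr (signum . f)
  fromInteger n   = Expr (const (fromInteger n))

instance Fractional a => Fractional (Expr a) where
  Expr f / Expr g = Expr (\r -> f r / g r)
  fromRational q  = Expr (const (fromRational q))

-- Hypothetical helper: replace missing values with a default.
withDefault :: a -> Expr (Maybe a) -> Expr a
withDefault d (Expr f) = Expr (fromMaybe d . f)

-- `totalBedrooms / households` would not type-check here either;
-- defaulting the nullable column first makes both sides Expr Double.
bedroomsPerHousehold :: Expr Double
bedroomsPerHousehold = withDefault 0 totalBedrooms / households
```

In GHCi, eval bedroomsPerHousehold applied to a Row gives the ratio for that row. Treating a missing value as 0 is only one possible policy; dropping or imputing the rows may suit real data better.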

SelectBy

Added a new selectBy function that subsumes all the other select functions. Specifically, we can:

selectBy [byName "x"] df: normal select.
selectBy [byProperty isNumeric] df: all columns with a given property.
selectBy [byNameProperty (T.isPrefixOf "weight")] df: select by column name predicate.
selectBy [byIndexRange (0, 5)] df: picks the first six columns.
selectBy [byTextRange ("a", "c")] df: select names within a range.
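
The selection criteria above can be modelled with a small, self-contained sketch. This is a toy over a list of column descriptions, not the library's implementation: Column, Criterion, matches, and selectByToy are hypothetical names. It also assumes that both kinds of range are inclusive and that a list of criteria keeps any column matching at least one of them.

```haskell
import Data.List (isPrefixOf)

-- Toy column description: a name plus whether the column holds numbers.
data Column = Column
  { colName      :: String
  , colIsNumeric :: Bool
  } deriving (Eq, Show)

-- One selection criterion (hypothetical constructors mirroring the list above).
data Criterion
  = ByName String
  | ByProperty (Column -> Bool)
  | ByNameProperty (String -> Bool)
  | ByIndexRange (Int, Int)       -- assumed inclusive at both ends
  | ByTextRange (String, String)  -- assumed inclusive, lexicographic order

-- Does the column at position i satisfy the criterion?
matches :: Criterion -> Int -> Column -> Bool
matches (ByName n)              _ c = colName c == n
matches (ByProperty p)          _ c = p c
matches (ByNameProperty p)      _ c = p (colName c)
matches (ByIndexRange (lo, hi)) i _ = lo <= i && i <= hi
matches (ByTextRange (lo, hi))  _ c = lo <= colName c && colName c <= hi

-- Keep every column that satisfies at least one criterion (assumed
-- union semantics), preserving the original column order.
selectByToy :: [Criterion] -> [Column] -> [Column]
selectByToy cs cols =
  [ c | (i, c) <- zip [0 ..] cols, any (\k -> matches k i c) cs ]

-- Hypothetical example schema.
exampleColumns :: [Column]
exampleColumns =
  [ Column "age" True
  , Column "city" False
  , Column "weight_kg" True
  , Column "weight_note" False
  ]

-- Columns whose name starts with "weight".
weightColumns :: [Column]
weightColumns = selectByToy [ByNameProperty ("weight" `isPrefixOf`)] exampleColumns
```

Representing each criterion as a value is what lets a single function cover every case: new kinds of selection only need a new constructor and one more clause in matches.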

Misc

Smaller binary size from reduced dependencies (thanks to @metapho-re)

LaurentRDC · September 5, 2025, 11:26am

Support for reading Parquet is huuuge! Congrats

Kleidukos · September 5, 2025, 1:06pm

@mchav I really want to commend your commitment to excellence wrt data analysis in Haskell, you are bearing an incredibly important torch.