Laundry list of updates:
Parquet reader
The Parquet reader now reads most Parquet files in the wild.
Plotting everywhere
Open plots in your browser:
ghci> import qualified DataFrame.Display.Web.Plot as Plt
ghci> Plt.plotAllHistograms df >>= Plt.showInDefaultBrowser
Saving plot to: /home/yavinda/plot-chart_guiv1qcX4ooMnhIkd4N9M5vtgrimGxS4GylrmRB7LwqpFL7v1qgxO.html
This also opens the plot in your default browser.
Notebook plotting
Terminal plotting
“Gradual-typing”
Thanks to @jhingonjhingon for this work.
ghci> :script dataframe.ghci
ghci> df <- D.readCsv "./data/housing.csv"
ghci> :exposeColumns df
"longitude :: Expr Double"
"latitude :: Expr Double"
"housing_median_age :: Expr Double"
"total_rooms :: Expr Double"
"total_bedrooms :: Expr Maybe Double"
"population :: Expr Double"
"households :: Expr Double"
"median_income :: Expr Double"
"median_house_value :: Expr Double"
"ocean_proximity :: Expr Text"
ghci> df |> D.derive "some_feature" (total_rooms / households) |> D.take 5
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
index | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | ocean_proximity | some_feature
------|-----------|----------|--------------------|-------------|----------------|------------|------------|--------------------|--------------------|-----------------|-------------------
Int | Double | Double | Double | Double | Maybe Double | Double | Double | Double | Double | Text | Double
------|-----------|----------|--------------------|-------------|----------------|------------|------------|--------------------|--------------------|-----------------|-------------------
0 | -122.23 | 37.88 | 41.0 | 880.0 | Just 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 | NEAR BAY | 6.984126984126984
1 | -122.22 | 37.86 | 21.0 | 7099.0 | Just 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 | NEAR BAY | 6.238137082601054
2 | -122.24 | 37.85 | 52.0 | 1467.0 | Just 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 | NEAR BAY | 8.288135593220339
3 | -122.25 | 37.85 | 52.0 | 1274.0 | Just 235.0 | 558.0 | 219.0 | 5.6431000000000004 | 341300.0 | NEAR BAY | 5.8173515981735155
4 | -122.25 | 37.85 | 52.0 | 1627.0 | Just 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 | NEAR BAY | 6.281853281853282
ghci> df |> D.derive "some_feature" (total_bedrooms / households) |> D.take 5
<interactive>:12:49: error:
    • Couldn't match type ‘Double’ with ‘Maybe Double’
      Expected: Expr (Maybe Double)
        Actual: Expr Double
    • In the second argument of ‘(/)’, namely ‘households’
      In the second argument of ‘derive’, namely
        ‘(total_bedrooms / households)’
      In the second argument of ‘(|>)’, namely
        ‘derive "some_feature" (total_bedrooms / households)’
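To show why the second derive is rejected while the first is accepted, here is a minimal, self-contained toy model of type-indexed column expressions. It is a sketch under stated assumptions, not the library's actual Expr: the names Column, Frame, Expr, Zip, Fmap, eval, totalRooms, totalBedrooms, households, sample and bedroomsPerHousehold are all hypothetical. The key point is that (/) needs both operands to share one type, so an Expr (Maybe Double) cannot be divided by an Expr Double until the missing-value handling is spelled out.

```haskell
{-# LANGUAGE GADTs #-}

-- A toy model of type-indexed column expressions. Every name here is
-- hypothetical; this is NOT the dataframe library's Expr type.

-- Untyped column storage, as a loader might produce it.
data Column
  = DoubleColumn [Double]
  | MaybeDoubleColumn [Maybe Double]

-- A frame is an association list from column name to column.
type Frame = [(String, Column)]

-- The type parameter records what each row of the expression evaluates to.
data Expr a where
  DoubleCol      :: String -> Expr Double
  MaybeDoubleCol :: String -> Expr (Maybe Double)
  Lit            :: a -> Expr a
  Zip            :: (b -> c -> a) -> Expr b -> Expr c -> Expr a
  Fmap           :: (b -> a) -> Expr b -> Expr a

-- Arithmetic only works when both operands share one numeric element type,
-- so Expr Double supports (/) but Expr (Maybe Double) does not.
instance Num a => Num (Expr a) where
  (+) = Zip (+)
  (-) = Zip (-)
  (*) = Zip (*)
  abs = Fmap abs
  signum = Fmap signum
  fromInteger = Lit . fromInteger

instance Fractional a => Fractional (Expr a) where
  (/) = Zip (/)
  fromRational = Lit . fromRational

-- Number of rows, taken from the first column (0 for an empty frame).
numRows :: Frame -> Int
numRows frame = case map snd frame of
  DoubleColumn xs : _      -> length xs
  MaybeDoubleColumn xs : _ -> length xs
  []                       -> 0

-- Evaluate an expression column-wise; a missing or mistyped column is a Left.
eval :: Frame -> Expr a -> Either String [a]
eval frame expr = case expr of
  DoubleCol name -> case lookup name frame of
    Just (DoubleColumn xs) -> Right xs
    _ -> Left ("no Double column named " ++ name)
  MaybeDoubleCol name -> case lookup name frame of
    Just (MaybeDoubleColumn xs) -> Right xs
    _ -> Left ("no Maybe Double column named " ++ name)
  Lit x -> Right (replicate (numRows frame) x)  -- broadcast a constant
  Zip f l r -> zipWith f <$> eval frame l <*> eval frame r
  Fmap f e -> map f <$> eval frame e

-- Typed column references, analogous to what :exposeColumns prints.
totalRooms, households :: Expr Double
totalRooms = DoubleCol "total_rooms"
households = DoubleCol "households"

totalBedrooms :: Expr (Maybe Double)
totalBedrooms = MaybeDoubleCol "total_bedrooms"

-- First two rows of the housing data shown above.
sample :: Frame
sample =
  [ ("total_rooms", DoubleColumn [880.0, 7099.0])
  , ("total_bedrooms", MaybeDoubleColumn [Just 129.0, Just 1106.0])
  , ("households", DoubleColumn [126.0, 1138.0])
  ]

-- Type-checks: both operands are Expr Double.
roomsPerHousehold :: Expr Double
roomsPerHousehold = totalRooms / households

-- Would be rejected by the type checker, like the ghci session above:
--   totalBedrooms / households

-- One explicit alternative: divide inside the Maybe, so Nothing stays Nothing.
bedroomsPerHousehold :: Expr (Maybe Double)
bedroomsPerHousehold = Zip (\b h -> fmap (/ h) b) totalBedrooms households
```

Loaded into GHCi, eval sample roomsPerHousehold should give the same ratios as the first two rows of some_feature above, while typing totalBedrooms / households fails with a type mismatch between Expr (Maybe Double) and Expr Double. Making the Maybe handling an explicit choice (here, propagating Nothing) is the trade-off this style of typing asks for.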
SelectBy
Added a new selectBy function that subsumes all the other select functions. Specifically, we can:
- selectBy [byName "x"] df: normal select.
- selectBy [byProperty isNumeric] df: select all columns with a given property.
- selectBy [byNameProperty (T.isPrefixOf "weight")] df: select by a column-name predicate.
- selectBy [byIndexRange (0, 5)] df: pick the first six columns.
- selectBy [byTextRange ("a", "c")] df: select columns whose names fall within a range.
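The selectors above compose as a list. One plausible reading is that a column is kept when any selector in the list matches it. The sketch below models that reading over a plain schema of (name, type) pairs. ColumnType, Schema, Selector, matches, housing and medianColumns are hypothetical names, and both the union semantics and the inclusive bounds of the two range selectors are assumptions, not confirmed behavior of the library.

```haskell
import Data.List (isPrefixOf)

-- A toy model of composable column selectors. Every name is hypothetical;
-- this is not the dataframe library's implementation.

-- Element types a column might hold.
data ColumnType = DoubleType | MaybeDoubleType | TextType
  deriving (Eq, Show)

-- Column metadata in frame order: (name, element type).
type Schema = [(String, ColumnType)]

data Selector
  = ByName String                   -- exact name match
  | ByProperty (ColumnType -> Bool) -- predicate on the column's type
  | ByNameProperty (String -> Bool) -- predicate on the column's name
  | ByIndexRange (Int, Int)         -- positions, inclusive on both ends (assumed)
  | ByTextRange (String, String)    -- lexicographic name range, inclusive (assumed)

-- Does the column at position i satisfy one selector?
matches :: Int -> (String, ColumnType) -> Selector -> Bool
matches i (name, ty) sel = case sel of
  ByName n              -> name == n
  ByProperty p          -> p ty
  ByNameProperty p      -> p name
  ByIndexRange (lo, hi) -> lo <= i && i <= hi
  ByTextRange (lo, hi)  -> lo <= name && name <= hi

-- Keep each column matched by at least one selector (union), in schema order.
selectBy :: [Selector] -> Schema -> [String]
selectBy sels schema =
  [name | (i, col@(name, _)) <- zip [0 ..] schema, any (matches i col) sels]

isNumeric :: ColumnType -> Bool
isNumeric t = t `elem` [DoubleType, MaybeDoubleType]

-- The housing schema from the gradual-typing example above.
housing :: Schema
housing =
  [ ("longitude", DoubleType), ("latitude", DoubleType)
  , ("housing_median_age", DoubleType), ("total_rooms", DoubleType)
  , ("total_bedrooms", MaybeDoubleType), ("population", DoubleType)
  , ("households", DoubleType), ("median_income", DoubleType)
  , ("median_house_value", DoubleType), ("ocean_proximity", TextType)
  ]

-- Name-predicate selection, analogous to byNameProperty (T.isPrefixOf ...).
medianColumns :: [String]
medianColumns = selectBy [ByNameProperty ("median" `isPrefixOf`)] housing
```

For example, selectBy [ByIndexRange (0, 5)] housing returns the first six column names, and selectBy [ByName "ocean_proximity", ByIndexRange (0, 1)] housing returns longitude, latitude and ocean_proximity, in schema order. Modelling every select variant as one predicate per column is what lets a single function cover them all.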
Misc
- Smaller binary size from reduced dependencies (thanks to @metapho-re)


