Reading S3 hosted files

I have a graphql server running using servant. It is running in a monad stack with state hosted using Reader and STM. Unfortunately, I still consider myself a bit of a newbie with Haskell as I only get to work on it “now and then” (my seemingly short-lived memory does not serve me well :)).

The object store is S3 compliant. There are a few S3 libraries on hackage. The one that seems closest to the need is minio-hs: A MinIO Haskell Library for Amazon S3 compatible cloud storage.. This said, I’m aok with using Network.HTTP.Client.TLS given the single-purpose nature of the task:

  1. set-up a reusable session (manager) that hosts the core configuration and security credentials
  2. generate file-specific requests to instantiate the data structure (json → haskell data structure) used to serve the graphql request
  3. the file name and location are defined ahead of time (I can assert the presence and it being in the required format - scope for other parts of the system)

I’m looking for ways in which others have split the tasks 1 & 2; the spec is tls, server-to-server connection.

Update

I ended up using the Amazonka package. There was a major update (from 1.6 → 2.0; major refactor and/or renaming). The documentation is good but I suspect is not all-complete.

I have the following snippets that I’m hoping to recruit comments regarding the design and whether I’m missing a more idiomatic, maintainable approach.

Questions

  1. Does the following pass the sniff test?
  2. While I’m ok with having the app fail outright during the initialization phase in the event required credentials can’t be found (the use of panic), I haven’t nailed down how to generate the appropriate response in the event the S3 compliant service “goes down”. I’m not fluid with logging and try/catch.

Augmenting the app Env

import ...
import qualified Amazonka             as S3
import qualified Amazonka.S3          as S3
import qualified Amazonka.Auth        as S3

-- |
-- Static and intialized values
data Env = Env
  { -- | database to fodder the graphql request
    database :: !(TVar Database),
    -- | various constants
    config   :: !Config,
    -- | global, reusable session with preconfigured 'operation'
    s3env    :: !S3.Env
  }

-- |
-- Use the app config to instantiate the Env that hosts the S3 Env
-- and 'Session'. Include a pointer to a mutable database
-- used to host the deserialized request body.
mkAppEnv :: MonadIO m => TVar Database -> Config -> m Env
mkAppEnv db cfg = do
    logger <- S3.newLogger S3.Debug stdout

    let s3cfg    = fileShareCfg cfg
        -- region'  = region s3cfg todo: make sure region = US (required for DigitalOcean) 
        s3id     = case S3.fromText $ accessId s3cfg of
                      Left e  -> panic ("Failed s3key: " <> pack e)
                      Right r -> r
        s3secret = case S3.fromText $ secret s3cfg of
                      Left e  -> panic ("Failed secret: " <> pack e)
                      Right r -> r

    credentials :: S3.Env <- S3.newEnv (S3.FromKeys s3id s3secret)

    let s3env' = credentials { S3._envLogger = logger
                             , S3._envRegion = "US"
                             }
        s3env'' = S3.configure (setEndpointInService cfg) s3env'

    pure $ Env { database = db
               , config = cfg
               , s3env = s3env''
               }

-- |
-- Fix to ssl endpoint within the Service
-- see also S3 root configure
setEndpointInService :: Config -> S3.Service
setEndpointInService  cfg = S3.setEndpoint
    True                                            -- ssl
    (encodeUtf8 . hostBucket $ fileShareCfg cfg)    -- hostname
    443                                             -- port
    S3.defaultService                               -- service being updated

1 Like

I’m making progress. The next step is is combining the ResourceT m a with my current app context.

type WithAppContext m =
  ( Typeable m,
    MonadReader Env m,
    MonadIO m,
    MonadCatch m,
    MonadLogger m,
    MonadThrow m
  )

… solved by including the type class constraints MonadResource m and MonadUnliftIO m.

Next, composing the runners… where to put ResourceT in the following:

newtype AppObs a =
    AppObs
        { iniApp :: ReaderT Env (LoggingT Handler) a
        }
    deriving ( Functor
             , Applicative
             , Monad
             , MonadReader Env
             , MonadIO
             , MonadLogger
             , MonadThrow
             , MonadCatch
             -- , MonadError Text Already defined by Servant
             )

-- |
-- Natural transformation used by Servant to return a `Handler a` from the custom monad
nat :: Env -> AppObs a -> Handler a
nat env app = runStderrLoggingT (
                 runReaderT (iniApp app) env)

Before you get too wedded to the idea of using amazonka, beware that setting endpoint[s] interferes with vhost style URLs. I hope to fix this at some point, but it requires rummaging around the guts of the library and I haven’t thought hard about what that should look like yet.

Thank you @jackdk for the FYI and your contributions to this important resource. Fortunately, my use-case is narrow. I have tested the ability to pull the data from the S3-compatible host using the v2 infrastructure.

Last step – evaluate in 2 different monad contexts

Once the app is initialized, which includes setting the amazonka env, there are two phases to serving each request.

  1. Receive the user-request (requires the servant handler context)
  2. Update the in-memory data using data streamed from the S3 object store (does not require the servant handler)
  3. Instantiate and serve the graphql request (servant handler)

I suspect the final answer gets the following to work.

(I get the code to compile with hollow instances of MonadUnliftIO and MonadResource for the servant Handler).

-- AppObs is a custom monad that wraps the servant handler 
-- => a single monad that needs to be "partially" evaluated?? OR
-- => temporarily augment the monad is a nested do-block
--
-- 🔖 The system relies on the sequence that I won't know "is working"
-- until I get the code to run (i.e., the request must only be executed
-- once the db has been updated... with a stream... using conduit)
--
api :: ProjectId -> GQLRequest -> AppObs GQLResponse
api pid req = do
    -- run a side-effect outside the custom Servant Handler
    _ <- setDbWithS3 pid
    interpreter gqlRoot req

Streaming the getObject operation to mutate the in-memory db

-- |
-- Update database with ObsEtlInput retrieved from S3
--
setDbWithS3 :: (MonadCatch m, MonadLogger m, MonadReader Env m, MonadResource m)
          => ProjectId -> m (ConduitT () S3.ByteString (ResourceT IO) ())
setDbWithS3 pid = do
   ... -- access the app env (db mutable ref, cfg and s3env)
   resourceState <- createInternalState
   obsEtl <- flip runInternalState resourceState $ do   -- do :: ResourceT m ObsETL
        awsRes <- request pid cfg s3env'
        rawBytes <- S3.body awsRes `S3.sinkBody` CB.sinkLbs  -- ConduitM ByteString Void (ResourceT IO) a
        tryObsFromResponse rawBytes 

   -- side-effect: mutate db state
   let newStore = dbNew pid obsEtl
   liftIO . atomically $ App.writeTVar dbTVar newStore
   pure $ closeInternalState resourceState  -- this may not guarantee seq

The custom monad and natural transformation

-- not able to host ResourceT?, certainly not UnliftIO
newtype AppObs a =
    AppObs
        { iniApp :: ReaderT Env (LoggingT Handler) a
        }
    deriving newtype ( Functor ... )

-- custom monad -> servant handler
nat :: Env -> AppObs a -> Handler a
nat env app = runStderrLoggingT (
                 runReaderT (iniApp app) env
                 )

--
-- Where the use servant's `enter` might come into play 
-- in the 2-context strategy
--
-- serve :: Proxy Api ->  (ServerT api Handler) -> Application
-- hoistServer :: HasServer api '[]
--             => Proxy api -> (forall x. m x -> n x)
--             -> ServerT api m -> ServerT api n
--
-- 🔖 Requires Servant.Conduit to access ToSourceIO and FromSourceIO
--
app :: Env -> Application
app env = logStdoutDev . cors ( const $ Just corsPolicy )
        . serve apiType $ hoistServer apiType (nat env) appM

This is new territory for me…

I hope that I’m articulating the strategy in a way that makes sense. Am I framing the problem in a way that taps into an idiomatic Haskell train of thought and approach?

Job done and…

The subsequent fixes involved making sure I included the right type constraints.

  1. request :: MonadResource m

    • Ran this in a nested do block that temporarily augmented the stack using runInternalState on the request
  2. setDbWithS3 :: (MonadCatch m, MonadLogger m, MonadReader Env m, MonadIO m

    • Downgraded the constraint from MonadResource to MonadIO
    • (this one was made it all come together given where I was)
  3. Kept the custom monad AppObs as it was in the very beginning (and thus used the same natural transformation)

In hindsight, and after having listened to “an oldie by a goody” from Edward Kmett on type classes, it finally started to sink-in: “getting it to type”, is about linking the type to the type class code for that class. Concretely, vtables indexed by type. Obvious to many, but grounding for me.

2 Likes