Logging is something we all need to do. There are a many logging frameworks available with various layers of abstraction (logback, slf4j, scalalogging etc). These libraries are all side effecting. As good functional programmers we want defer side effects and push them to the boundaries of the system.
The problem with logging
So what exactly is the problem with logging?
Well the textbook answer to this question is that it’s not Referentially transparent with all the associated issues that stem from this. We can defer our log statements by wrapping everything in a Task or IO:
def getUser(): User = {
logger.info("getting user")
}
// becomes
def getUser(): Task[User] = Task {
logger.info("getting user")
}
We have achieved referential transparency. Job done? Well not quite. Let’s take a real world example of a multi user system. Requests are coming in all the time and we want some way to correlate log entries for the same request. The traditional Java style approach would be to use ThreadLocal to pass a correlation id along with the request and include this in every log statement, perhaps using MDC.
This doesn’t work so well in the Scala world as we almost certainly cross async boundaries during an operation. The solution? pass an implicit correlation id along the call stack:
def getUser()(implicit correlationId: String): Task[User] = Task {
MDC.put("correlationId", correlationId)
logger.info("getting user")
MDC.clear()
}
This works but it’s really messy, we need to pass our correlation id around everywhere, along with all the other implicit “stuff”
A better way
We can turn this problem on it’s head. Instead of passing the correlation id along our call stack, we instead return something that also includes logs. Our psudocode would look something like:
def getUser: Task[(Logs, User)]
At “the end of the world” we then write all log entries with our correlation id. Assuming some sort of web server it may look like:
for {
correlationId <- generateCorrelationId // Task[String]
userAndLogs <- getUser // Task[(Logs, User)]
(logs, user) = userAndLogs
_ <- Task {
MDC.put("correlationId", correlationId)
logs.foreach(logMessage => logger.info(logMessage))
MDC.clear()
}
} yield user
Of course when using this approach we would not simply log everything at info level as above. We would create our own ADT with a log level along with a log message. We would then pattern match on the level to decide which level to log at.
Enter the Writer monad
Cats includes a data type specifically for this purpose. It’s called the Writer monad or WriterT. Technically it’s actually a monad transformer, the Writer monad is just a WriterT parameterised using an Id for the effect. Let’s see how we would use it
package uk.co.tobyhobson.cats
import cats.data.WriterT
import cats.instances.vector._
import monix.eval.Task
import scala.concurrent.duration._
import scala.concurrent.Await
object WriterMonad {
case class User(id: Int)
case class Order(totalAmount: Int)
def getUser(id: Int): WriterT[Task, Vector[String], User] = {
// The apply method is a simple way of constructing a writer with a log and result, suspended in F
WriterT(Task { Vector("getting user") -> User(id) })
}
def getOrder(user: User): WriterT[Task, Vector[String], Order] = {
WriterT(Task { Vector("getting order") -> Order(100) })
}
def main(args: Array[String]): Unit = {
import monix.execution.Scheduler.Implicits.global
val program = for {
user <- getUser(1)
order <- getOrder(user)
} yield order
val eventualLogs = program.written.runToFuture
val logs: Vector[String] = Await.result(eventualLogs, 1.second)
val eventualResult = program.value.runToFuture
val result = Await.result(eventualResult, 1.second)
println(logs)
println(result)
}
}
The most relevant bit here is the getUser
and getOrder
methods. There are many ways to construct a Writer but the
principle is the same. We return a Writer with a log and result, then sequence the operations using flatMap.
It’s interesting to see how we return a Vector of Strings in getUser
and another Vector in getOrder
. Somehow our
Vectors get merged together to give us a final Vector[String]
. WriterT
actually requires a Semigroup
instance for
our chosen “Log” to be in scope. This explains the import cats.instances.vector._
import at the top of the file
(cats includes a Semigroup implementation for Vector). It’s this semigroup that allows the Writer to concatenate the
various logs together.
Summary
The Writer monad can be a good solution to the tricky problem of correlating log entries togther in a multi user, multi-threaded environment. There are a few drawbacks to be aware of though:
- Logs are not written in realtime but “at the end of the world”. So during development it can be disconcerting to see no log entires followed by a sudden burst
- If you rely on the logging timestamps to time operations (you really shouldn’t) this won’t work because of point 1
- If for some reason the underlying “F” fails, the result is lost and so are the logs!
The last point is a big one but as good functional developers we really shouldn’t be dealing with checked or unchecked exceptions. A proper error handling model of sealed traits or Shapeless Coproducts would be a much better solution.