Toby Hobson

Writer Monad

banner.png
Writer Monad
Estimated reading time: 4 minutes

Logging is something we all need to do. There are a many logging frameworks available with various layers of abstraction (logback, slf4j, scalalogging etc). These libraries are all side effecting. As good functional programmers we want defer side effects and push them to the boundaries of the system.

You can find complete examples of the concepts discussed on my blog in my Github repo . In particular, check out the Writer monad example

The problem with logging

So what exactly is the problem with logging?

Well the textbook answer to this question is that it’s not Referentially transparent with all the associated issues that stem from this. We can defer our log statements by wrapping everything in a Task or IO:

def getUser(): User = {
  logger.info("getting user")
}

// becomes
def getUser(): Task[User] = Task {
  logger.info("getting user")
}

We have achieved referential transparency. Job done? Well not quite. Let’s take a real world example of a multi user system. Requests are coming in all the time and we want some way to correlate log entries for the same request. The traditional Java style approach would be to use ThreadLocal to pass a correlation id along with the request and include this in every log statement, perhaps using MDC.

This doesn’t work so well in the Scala world as we almost certainly cross async boundaries during an operation. The solution? pass an implicit correlation id along the call stack:

def getUser()(implicit correlationId: String): Task[User] = Task {
  MDC.put("correlationId", correlationId)
  logger.info("getting user")
  MDC.clear()
}

This works but it’s really messy, we need to pass our correlation id around everywhere, along with all the other implicit “stuff”

A better way

We can turn this problem on it’s head. Instead of passing the correlation id along our call stack, we instead return something that also includes logs. Our psudocode would look something like:

def getUser: Task[(Logs, User)]

At “the end of the world” we then write all log entries with our correlation id. Assuming some sort of web server it may look like:

for {
  correlationId <- generateCorrelationId // Task[String]
  userAndLogs <- getUser // Task[(Logs, User)]
  (logs, user) = userAndLogs
  _ <- Task {
    MDC.put("correlationId", correlationId)
    logs.foreach(logMessage => logger.info(logMessage))
    MDC.clear()
  }
} yield user

Of course when using this approach we would not simply log everything at info level as above. We would create our own ADT with a log level along with a log message. We would then pattern match on the level to decide which level to log at.

Enter the Writer monad

Cats includes a data type specifically for this purpose. It’s called the Writer monad or WriterT. Technically it’s actually a monad transformer, the Writer monad is just a WriterT parameterised using an Id for the effect. Let’s see how we would use it

package uk.co.tobyhobson.cats

import cats.data.WriterT
import cats.instances.vector._
import monix.eval.Task
import scala.concurrent.duration._

import scala.concurrent.Await

object WriterMonad {
  
  case class User(id: Int)
  case class Order(totalAmount: Int)
  
  def getUser(id: Int): WriterT[Task, Vector[String], User] = {
    // The apply method is a simple way of constructing a writer with a log and result, suspended in F
    WriterT(Task { Vector("getting user") -> User(id) })
  }
  
  def getOrder(user: User): WriterT[Task, Vector[String], Order] = {
    WriterT(Task { Vector("getting order") -> Order(100) })
  }
  
  def main(args: Array[String]): Unit = {
    import monix.execution.Scheduler.Implicits.global
    
    val program = for {
      user <- getUser(1)
      order <- getOrder(user)
    } yield order
    
    val eventualLogs = program.written.runToFuture
    val logs: Vector[String] = Await.result(eventualLogs, 1.second)
    
    val eventualResult = program.value.runToFuture
    val result = Await.result(eventualResult, 1.second)
   
    println(logs)
    println(result)
  } 
}

The most relevant bit here is the getUser and getOrder methods. There are many ways to construct a Writer but the principle is the same. We return a Writer with a log and result, then sequence the operations using flatMap.

It’s interesting to see how we return a Vector of Strings in getUser and another Vector in getOrder. Somehow our Vectors get merged together to give us a final Vector[String]. WriterT actually requires a Semigroup instance for our chosen “Log” to be in scope. This explains the import cats.instances.vector._ import at the top of the file (cats includes a Semigroup implementation for Vector). It’s this semigroup that allows the Writer to concatenate the various logs together.

Summary

The Writer monad can be a good solution to the tricky problem of correlating log entries togther in a multi user, multi-threaded environment. There are a few drawbacks to be aware of though:

  1. Logs are not written in realtime but “at the end of the world”. So during development it can be disconcerting to see no log entires followed by a sudden burst
  2. If you rely on the logging timestamps to time operations (you really shouldn’t) this won’t work because of point 1
  3. If for some reason the underlying “F” fails, the result is lost and so are the logs!

The last point is a big one but as good functional developers we really shouldn’t be dealing with checked or unchecked exceptions. A proper error handling model of sealed traits or Shapeless Coproducts would be a much better solution.

comments powered by Disqus

Need help with your project?

Do you need some help or guidance with your project? Reach out to me (email is best)