scala

3/n - How do I in FP... trying several input for the same function

Emmanuel Nhan

27 Dec 2019 • 6 min read

In the previous entry of this series, I wrote about error recovery in a functional way. But sometimes, we want to try several inputs until we retrieve (optionally) a result. This is the case if you have a function which compute a location from a string (think geo-search), and which optionally returns a result, if a place matched. We may try this function on several input strings and only care in the first result. In this article, we will see how to do this in a functional way.

Problem setup

Given a function def computeLocation(str: String): Option[Location] which tries to infer a location from a string and a list of input strings sorted by order of relevance (meaning that the first element may result in a more interesting location than the second and so on), let's retrieve the location (optionally). To sum it up:

// The computeLocation function:
def computeLocation(s: String): Option[Location] = // the implementation

// this call may generate a long List
val inputs: List[String] = createInputsFromForm(aForm)

How implement val result: Option[Location] = ??? ?

From scratch

The first solution is to call the computeLocation function for every string in the input list:

inputs.flatMap(computeLocation).headOption

This solution is the most simple ever. Let's break it into pieces to analyze it.
Let's start from the end. As we care only about the first meaningful result (which may not exist), we call headOption.
Now, back to the beginning of the line, we are calling flatMap on a List. The proper signature of flatMap in Scala standard library is slightly different from the one we are used to:

def flatMap[B](f: A => IterableOnce[B]): List[B] = //...

In the standard library, flatMap expects a function producing an IterableOnce. Fortunately Option implements this trait. The definition commonly used for flatMap is more like this one:

// Given that F is a type with one parameter
def flatMap[A, B](fa: F[A])(f: A => F[B]): F[B] 

// For List this results in:
def flatMap[A, B](fa: List[A])(f: A => List[B]): List[B]

In the standard library, the trick with IterableOnce spares us from writing something like:

inputs
    .flatMap{ s => 
        computeLocation(s).map(l => List(l)).getOrElse(Nil)
    }
    .headOption

In this code, we are manually transforming an Option into a List.

What's wrong with this implementation ? If you provide a dummy implementation for both computeLocation and createInputsFromForm, you will see that computeLocation is called for every single item in the list. If we consider that computing the location is a costly operation, it is sad to waste resource computing, while we only care about the first meaningful answer. How to make the flatMap sequence stop when we hit a result or the end of the list ?

Recursion by hand

If we break our problem in terms of imperative programming, we will easily come up with a while statement (in pseudo code):

while no result and list has elements left 
  result = computeLocation with next element in the list
done

Let's encode this with recursion:

@scala.annotation.tailrec
def recur(next: List[String]): Option[Location] = {
  next match {
    case Nil => None
    case head :: tail =>
      val r = computeLocation(head)
      if (r.isDefined) r else recur(tail)
  }
}
recur(inputs)

What we do is just manually traverse the list recursively until computeLocation gives a meaningful result. This is strictly the same as the imperative while version, but in FP, we like to look further. We could for instance abstract over the computeLocation and take as an extra parameter the function returning an Option:

@scala.annotation.tailrec
def recur(next: List[String],
          f: String => Option[Location]): Option[Location] = {
  next match {
    case Nil => None
    case head :: tail =>
      val r = f(head)
      if (r.isDefined) r else recur(tail, f)
  }
}
recur(inputs, computeLocation)

Looking at this code, do we really have to deal with String and Location ? We do not use any of the properties of these types in recur. Let's make them parameters.

@scala.annotation.tailrec
def recur[A, B](next: List[A], f: A => Option[B]): Option[B] = {
  next match {
    case Nil => None
    case head :: tail =>
      val r = f(head)
      if (r.isDefined) r else recur(tail, f)
  }
}
recur(inputs, computeLocation)

Following this reasoning we could go even further and add some mechanism to also deal with something else than Option and List. In fact, cats has such functions to hide recursion from you. It is called tailRecM but as it deserves an entire post for itself, I won't talk about it here.

Back with the Scala standard library...

In the standard library, we can also find the collectFirst function which, well do pretty much what we need !

inputs.collectFirst { str =>
  computeLocation(str) match {
    case Some(r) => r
  }
}

Well, the careful reader would have noticed that the function we pass to collectFirst is not total: it does not produce an output for every input.

As computeLocation is total, then it is a bit sad to write a non-total version of it... The standard library also provides a function to transform a total function returning Option to a partial one: def unlift[T, R](f: T => Option[R]): PartialFunction[T, R]. So we can now write:

inputs.collectFirst(Function.unlift(computeLocation))

If you are using cats, you will find collectFirstSome which will make this possible:

inputs.collectFistSome(computeLocation)

What if `computeLocation` perform other effects ?

Let's change the problem a bit now. What if the computeLocation function performs some network calls, or read a database to find a result ? These things are known to fail at one point or another. How to deal with that ?

Without adding too much complexity, let's consider that computeLocation has a more expressive output: Either[Throwable, Option[Location]]. This type means that computeLocation may fail with a Throwable or give a result if it was able to compute one (the absence of result being completely normal). The complete signature is now:

def computeLocation(rawStr: String): Either[Throwable, Option[Location]]

As you can see, the Option result is now boxed into Either. This makes both collectFirst and collectFirstSome impossible to use, as the type do not match now. In FP, this kind of boxing appears frequently, so that authors of functional libraries like cats already wrote functions to deal with this. In cats, you will often see functions whose name ends with a capital M. In our case, it is easy to find collectFirstSomeM which has this signature:

def collectFirstSomeM[G[_], B](f: A => G[Option[B]])
        (implicit F: Foldable[F], G: Monad[G]): G[Option[B]]

As you can see, this definition is abstract. Note that this function is part of a class parametrized on F, where F is itself a type with a "hole" : F[_]. In our example, this F will be Either[Throwable,_] (ie Either with only one type parameter applied). To make it easier, here is a version (pseudo scala) where abstract types are replaced with the one we use in our example (without the implicit part):

def collectFirstSomeM[Either[Throwable,_], Location](f: String => IO[Option[Location]]): Either[Throwable,Option[Location]]

Look how computeLocation has exactly the expected signature for the f parameter ! What about the implicits parameters now ? In this case, we can read the definition as: "the collectFirstSomeM method is available if and only if for the F type we can implement the trait Foldable and for the G type the Monad trait". In our case, with List and Either it is already done for us in cats so we can do:

import cats.implicits._
val inputs: List[String] = computeInputs(param)
val result: Either[Throwable, Option[Location]] = inputs.collectFirstSomeM(computeLocation)

Yet, if you write a simple sample for this, like :


def computeLocation(rawStr: String): Either[Throwable, Option[Location]] = {
    println(s"Computing with $rawStr") // to see something
    // Fake implementation
    if (rawStr.length == 1) Left(new IllegalStateException("wrong state"))
    else if (rawStr.length % 4 == 0) Right(Some(Location(2.0, 4.0, "Plop")))
    else Right(None)
}

val inputs = List("a", "impair?", "plop", "other")

import cats.implicits._
println(
  inputs.collectFirstSomeM(computeLocation)
)

You will notice that the output will be:

Computing with a
Left(java.lang.IllegalStateException: wrong state)

The execution is stopping as soon as a Left is returned ! This is perfectly normal as Either is a fail fast structure. And the requirement for Monad should have caught our attention: Monad express sequence, the next operation using the result of the previous one as input parameter.
So, as the caller of computeLocation in our situation, we have two options:

consider a Left as a blocker, and indeed, keep it,
try to recover from it by composing computeLocation with a function handling the situation with care.

Here is an example of the second choice, being ignoring errors (you probably don't want to do that, and perform a selective recovery, composing with things we saw earlier):

inputs.collectFirstSomeM(
  s => computeLocation(s).recoverWith(_ => Either.right(None))
)

Conclusion

In this article, we saw how to deal with something simple: calling a function for elements in a list and keeping the first meaningful result (if any). Even if it was simple, what I would like to highlight is how this solution is composable. Every type in the design models an intent:

Option models that the result may be irrelevant or present
Either models that a computation may fail
Foldable express that we want to build a single value from an initial structure (in our case, we fold the list to a single result)
Monad express that we compute in sequence: indeed, there is no parallelism involved in our solution, we try the input strings one after the other.

Yet, the work here is not over. Next time, we will see how we can adapt this example to make it less "sequential".