Scala: Should I use a method or a function?

Posted on June 1, 2016

This question comes up often on forums, and in the context of functional programming there is a great deal of confusion about what the difference is and which should be used where. Scala has two function-like entities:

class Test {

  // Method
  def m(x: Int) = x + 1

  // Function
  val f = (x: Int) => x + 1
}

Both of them can be used in a functional context:

class Test {

  // Method
  def m(x: Int) = x + 1

  // Function
  val f = (x: Int) => x + 1

  // Using function
  def x = println(List(1, 2, 3).map(f))

  // Using method
  def y = println(List(1, 2, 3).map(m))
}

Similar, isn’t it? Let’s look closer.

m here is a method, the very same class method we are familiar with from Java. It’s not an object, can’t be passed anywhere and has no value. In fact it belongs to the class Test and only makes sense when called as testInstance.m(...). And indeed, the decompiled bytecode shows it as public int m(int); in class Test.

But we pass it to map, don’t we? First, let’s comment out def y and look at class files:

Test$$anonfun$1.class  Test.class

Nothing too interesting: our Test class and the anonymous function f in Test$$anonfun$1. Now let’s put def y back:

Test$$anonfun$1.class  Test$$anonfun$y$1.class  Test.class

Wait, what is Test$$anonfun$y$1? When Scala sees a method in a functional context, it defines an anonymous function class (a class derived from Function1 in this particular example) which has an apply method, and that apply method simply calls testInstance.m(...) (testInstance is passed as an argument to the anonymous function’s constructor so it can be used later). Almost the same thing happens with f: an anonymous function class is defined with the corresponding apply method. There is one big difference, though, if we look at the bytecode for x and y:
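To make the generated class less mysterious, here is a hand-written approximation of it. This is only a sketch: MWrapper is a hypothetical name standing in for Test$$anonfun$y$1, and y returns the list instead of printing it so the result is easy to inspect. The real compiler output differs in detail.

```scala
// Hand-written stand-in for the generated Test$$anonfun$y$1.
// MWrapper is a hypothetical name used only for this illustration.
class MWrapper(outer: Test) extends (Int => Int) {
  // apply simply forwards to the method on the captured instance
  def apply(x: Int): Int = outer.m(x)
}

class Test {
  def m(x: Int) = x + 1

  // Roughly what List(1, 2, 3).map(m) turns into:
  // a new wrapper object is built around `this` on every call of y.
  def y: List[Int] = List(1, 2, 3).map(new MWrapper(this))
}
```

The wrapper holds a reference to the Test instance, which matches the (LTest;)V constructor signature that shows up in the bytecode for y.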

x:

  24: invokevirtual #37   // Method scala/Predef$.wrapIntArray:([I)Lscala/collection/mutable/WrappedArray;
  27: invokevirtual #41   // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)Lscala ...
  30: aload_0
  31: invokevirtual #43   // Method f:()Lscala/Function1;
  34: getstatic     #33   // Field scala/collection/immutable/List$.MODULE$:Lscala/collection/immutable/List$;
  37: invokevirtual #47   // Method scala/collection/immutable/List$.canBuildFrom:()Lscala/collection/generic...
  40: invokevirtual #53   // Method scala/collection/immutable/List.map:(Lscala/Function1;Lscala/collection...
  43: invokevirtual #57   // Method scala/Predef$.println:(Ljava/lang/Object;)V
  46: return

y:

  24: invokevirtual #36   // Method scala/Predef$.wrapIntArray:([I)Lscala/collection/mutable/WrappedArray;
  27: invokevirtual #40   // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)Lscala ...
  30: new           #59   // class Test$$anonfun$y$1
  33: dup
  34: aload_0
  35: invokespecial #63   // Method Test$$anonfun$y$1."<init>":(LTest;)V
  38: getstatic     #32   // Field scala/collection/immutable/List$.MODULE$:Lscala/collection/immutable/List$;
  41: invokevirtual #46   // Method scala/collection/immutable/List$.canBuildFrom:()Lscala/collection/generic ...
  44: invokevirtual #52   // Method scala/collection/immutable/List.map:(Lscala/Function1;Lscala/collection ...
  47: invokevirtual #56   // Method scala/Predef$.println:(Ljava/lang/Object;)V

Every time I call y, a new instance of class Test$$anonfun$y$1 is created and initialized! In x nothing like this happens, because the anonymous function was instantiated when I declared f, so the code simply calls this.f to retrieve the pre-initialized value.
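The difference can also be observed without reading bytecode, by comparing object identity. This is a minimal sketch; viaF and viaM are hypothetical helpers that return the function object each variant would hand to map.

```scala
class Test {
  def m(x: Int) = x + 1
  val f = (x: Int) => x + 1

  // Hypothetical helpers exposing the function object passed to map
  def viaF: Int => Int = f // reads the field initialized in the constructor
  def viaM: Int => Int = m // converts the method to a function on each call
}

object IdentityDemo {
  def main(args: Array[String]): Unit = {
    val t = new Test
    // Same stored object on both reads
    println(t.viaF eq t.viaF)
    // Expected to differ if a fresh wrapper is allocated per call,
    // as the bytecode above suggests
    println(t.viaM eq t.viaM)
  }
}
```

The first comparison is true by construction. The second depends on the compiler and JVM, so treat its result as an observation on a given setup rather than a guarantee.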

Given this, my suggestion would be to use explicit functions whenever higher-order functions/methods are involved rather than rely on the implicit method-to-function conversion (eta-expansion), and to use methods in all other cases.
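If a method has to stay a method (because it is part of an existing API, say), one possible middle ground is to convert it once and keep the resulting function in a val. A sketch, assuming the same Test class; mf is a hypothetical name, and y returns the list instead of printing it:

```scala
class Test {
  def m(x: Int) = x + 1

  // Convert the method to a function once, at construction time.
  // mf is a hypothetical name, not part of the original example.
  val mf: Int => Int = m

  // map now receives the stored function object instead of a new wrapper
  def y: List[Int] = List(1, 2, 3).map(mf)
}
```

Callers that need a plain method can keep using m, while higher-order calls reuse mf.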

Update: Scala 2.12-M4 and Java 8 lambdas

Does it “no longer matter” with the introduction of Scala 2.12, which compiles to Java 8 lambdas? Let’s look at the Java 8 bytecode. The good part is that Scala 2.12 doesn’t create class definitions for anonymous functions; instead it uses factories to create dynamic classes on the fly. At the very least the JAR becomes smaller and loading potentially faster. Instead of classes, Scala creates methods in class Test which are called from a Function1 object:

  // f
  public static final int Test$$$anonfun$1(int);

    0: iload_0
    1: iconst_1
    2: iadd
    3: ireturn

  // wrapper for m
  public final int Test$$$anonfun$2(int);

    0: aload_0
    1: iload_1
    2: invokevirtual #86  // Method m:(I)I
    5: ireturn

The compiler is smart enough to leave the instance of Test out of the first case (i.e. the method is static), which saves passing an object reference in this particular case. Unfortunately, in the second case the reference is required since we are going to call m. Let’s look at how these functions are called:

  public Test();

   0: aload_0
   1: invokespecial #89     // Method java/lang/Object."<init>":()V
   4: aload_0
   5: invokedynamic #95,  0 // InvokeDynamic #1:apply$mcII$sp:()Lscala/runtime/java8/JFunction1$mcII$sp;
  10: putfield      #25     // Field f:Lscala/Function1;
  13: return

  public void x();

  ...
  24: invokevirtual #41   // Method scala/Predef$.wrapIntArray:([I)Lscala/collection/mutable/WrappedArray;
  27: invokevirtual #45   // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)...
  30: aload_0
  31: invokevirtual #47   // Method f:()Lscala/Function1;
  34: getstatic     #37   // Field scala/collection/immutable/List$.MODULE$:Lscala/collection/...

vs

  public void y();

  24: invokevirtual #41     // Method scala/Predef$.wrapIntArray:([I)Lscala/collection/mutable/WrappedArray;
  27: invokevirtual #45     // Method scala/collection/immutable/List$.apply:(Lscala/collection/Seq;)...
  30: aload_0
  31: invokedynamic #83,  0 // InvokeDynamic #0:apply$mcII$sp:(LTest;)Lscala/runtime/java8/JFunction1$mcII$sp;
  36: getstatic     #37     // Field scala/collection/immutable/List$.MODULE$:Lscala/collection/immutable/List$;

First of all, what is this invokedynamic call? The JVM is using a factory to create a lightweight object with an apply method. The factory arguments look like this (for f):

      #70 (I)I
      #92 invokestatic Test.Test$$$anonfun$1:(I)I
      #70 (I)I
      #75 3
      #76 1
      #78 scala/Serializable
      #79 0

Basically it tells the JVM how to initialize an object. But invokedynamic does not just initialize the object: on first execution it generates code from the factory data and links it to the call site, so the factory doesn’t need to be touched again. What gets linked is still code rather than an object, so every time the invokedynamic in y is executed it initializes a new lightweight object with an apply method. So the mechanism is the same as in 2.11, only lambdas are much lighter and don’t require full classes.

What about those cycles? The difference is still there. x simply uses the f initialized when the Test object was constructed. y initializes a lightweight anonymous function every time it is called, and also wraps the call to m in another method, namely Test$$$anonfun$2(int).

Does it all really matter? Well, I wouldn’t be eager to replace methods used in higher-order methods/functions in existing code (unless it’s performance-critical code like Spark, and even then benchmarks should probably be done first). But whenever I have a choice between the two options and there is no reason to prefer one other than history, I would use functions.

“Entities must not be multiplied beyond necessity” (c) Occam

Update: Performance

Let’s do some simple benchmarking (2.12.0-M4):

class Test {
  val f = (x: Int) => x + 1
  def m(x: Int) = x + 1

  def x = (1 to 1000).map(f)
  def y = (1 to 1000).map(m)
}


object Test {

  val test = new Test
  val samples = 10000
  def sqr(x: Double) = x * x

  def main(args: Array[String]) = {

    val results = Array.fill[Long](samples)(0)
    var i = 0 
    while(i < samples) {
      var i2 = 0
      val t0 = System.nanoTime()
      while(i2 < 1000) {
        test.x // test.y
        i2 += 1
      }
      results(i) = System.nanoTime() - t0
      i += 1
    } 
    println(s"min=${results.min}")
  }
}

Taking the minimum should largely filter out warm-up and occasional GC. Let’s even run it 10 times and take the minimum again. Here’s the result (nanoseconds per batch of 1000 calls):

x: 6669954
y: 6962329 (+4.4%)

Not much and it’s a special case, but it’s noticeable.
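The "run it 10 times and take the minimum again" step was done by hand, but it could be scripted roughly as follows. This is a sketch, not the harness used for the numbers above: MinOfRuns, timeBatch and minTime are hypothetical names, the sample counts are arbitrary, and the by-name parameter adds a small call overhead of its own.

```scala
object MinOfRuns {
  // Runs `body` `batch` times and returns the elapsed nanoseconds
  def timeBatch(batch: Int)(body: => Any): Long = {
    val t0 = System.nanoTime()
    var i = 0
    while (i < batch) {
      body
      i += 1
    }
    System.nanoTime() - t0
  }

  // Smallest batch time observed over `samples` batches
  def minTime(samples: Int, batch: Int)(body: => Any): Long = {
    var best = Long.MaxValue
    var i = 0
    while (i < samples) {
      best = math.min(best, timeBatch(batch)(body))
      i += 1
    }
    best
  }

  def main(args: Array[String]): Unit = {
    val f = (x: Int) => x + 1
    // Repeat the whole measurement 10 times and keep the overall minimum
    val runs = (1 to 10).map { _ =>
      minTime(samples = 100, batch = 1000)((1 to 1000).map(f))
    }
    println(s"min=${runs.min}")
  }
}
```

For numbers that are meant to be published or compared across machines, a dedicated benchmarking harness would be a safer choice than a hand-rolled loop like this one.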