The Rest Of The Story

In which I wrap up some Scala drama.

29 Nov 2011

About a week ago, I mentioned in passing to an old co-worker on Twitter that Yammer was moving some of our stack from Scala to Java. A few days later, Donald Fischer (the CEO of Typesafe) emailed me at my personal account, asking for more details. He CC’ed Martin Odersky, the lead designer of Scala and Typesafe’s Chief Architect. Given that the two people best-situated to improve Scala had just asked me about my experience over the past two years of using Scala, I wrote a long, considered, brutally honest response:


Hi Donald (and Martin),

Thanks for pinging me; it’s nice to know Typesafe is keeping tabs on this, and I appreciate the tone. This is a Yegge-long response, but given that you and Martin are the two people best-situated to do anything about this, I’d rather err on the side of giving you too much to think about. I realize I’m being very critical of something in which you’ve invested a great deal (both financially and professionally) and I want to be explicit about my intentions: I think the world could benefit from a better Scala, and I’d like to see that work out even if it doesn’t change what we’re doing here.

Right now at Yammer we’re moving our basic infrastructure stack over to Java, and keeping Scala support around in the form of façades and legacy libraries. It’s not a hurried process and we’re just starting out on it, but it’s been a long time coming. The essence of it is that the friction and complexity that comes with using Scala instead of Java isn’t offset by enough productivity benefit or reduction of maintenance burden for it to make sense as our default language. We’ll still have Scala in production, probably in perpetuity, but going forward our main development target will be Java.

So.

Scala, as a language, has some profoundly interesting ideas in it. That’s one of the things which attracted me to it in the first place. But it’s also a very complex language. The number of concepts I had to explain to new members of our team for even the simplest usage of a collection was surprising: implicit parameters, builder typeclasses, “operator overloading”, return type inference, etc. etc. Then the particulars: what’s a Traversable vs. a TraversableOnce? GenTraversable? Iterable? IterableLike? Should they be choosing the most general type for parameters, and if so what was that? What was a =:= and where could they get one from?
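As an illustration only, here is a minimal sketch of two of those concepts together: an implicit parameter whose type is a `=:=` evidence value. `Box` and `doubled` are hypothetical names invented for this example, not anything from the standard library or from the code discussed here.

```scala
// Hypothetical wrapper type, for illustration only.
final case class Box[A](value: A) {
  // `ev` is an implicit parameter: the caller never passes it explicitly.
  // The compiler can only supply an `A =:= Int` when A is exactly Int,
  // so this method is effectively restricted to Box[Int].
  // The evidence value is also a function from A to Int, hence ev(value).
  def doubled(implicit ev: A =:= Int): Int = ev(value) * 2
}
```

Calling `Box(21).doubled` compiles because the evidence is found at the call site, while `Box("x").doubled` is rejected at compile time. That call-site resolution is why a reader needs the mental model even when they did not write the library.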

A lot of this has been waved away as something only library authors really need to know about, but when a library’s API bubbles all of this up to the top (and since most of these features resolve specifics at the call site, they do), engineers need to have an accurate mental model of how these libraries work or they shift into cargo-culting snippets of code as magic talismans of functionality.

In addition to the concepts and specific implementations that Scala introduces, there is also a cultural layer of what it means to write idiomatic Scala. The most vocal — and thus most visible — members of the Scala community at large seem to tend either towards the comic buffoonery of attempting to compile their Haskell using scalac or towards vigorously and enthusiastically reinventing the wheel as a way of exercising concepts they’d been struggling with or curious about. As my team navigated these waters, they would occasionally ask things like: “So this one guy says the only way to do this is with a bijective map on a semi-algebra, whatever the hell that is, and this other guy says to use a library which doesn’t have docs and didn’t exist until last week and that he wrote. The first guy and the second guy seem to hate each other. What’s the Scala way of sending an HTTP request to a server?” We had some patchwork code where idioms which had been heartily recommended and then hotly criticized on Stack Overflow threads were tried out, but at some point a best practice emerged: ignore the community entirely.

Not being able to rely on a strong community presence meant we had to fend for ourselves in figuring out what “good” Scala was. In hindsight, I definitely underestimated both the difficulty and importance of learning (and teaching) Scala. Because it’s effectively impossible to hire people with prior Scala experience (of the hundreds of people we’ve interviewed perhaps three had Scala experience, and of those three we hired one), this matters much more than it might otherwise. If we take even the strongest of JVM engineers and rush them into writing Scala, we increase our maintenance burden with their funky code; if we invest heavily in teaching new hires Scala, they won’t be writing production code for a while, increasing our time-to-market. Contrast this with the default for the JVM ecosystem: if new hires write Java, they’re productive as soon as we can get them a keyboard.

Even once our team members got up to speed on Scala, the development story was never as easy as I’d thought it would be. Because one never writes pure Scala in an industrial setting, we found ourselves having to superimpose four different levels of mental model — the Scala we wrote, the Java we didn’t write, the bytecode it all compiles into, and the actual problem we were writing code to solve. It wasn’t until I wrote some pure Java that I realized how much extra burden that had been, and I’ve heard similar comments from other team members. Even with services that only used Scala libraries, the choice was never between Java and Scala; it was between Java and Scala-and-Java.

Adding to the unease in development were issues with the build toolchain. We started with SBT 0.7, which offered a pleasant interface to some rather dubious internals, but by the time SBT 0.10 came out, we’d had endless issues trying to debug or extend SBT. We looked at using 0.10, but we found it to have the exact same problems managing dependencies (read: Ivy), two new, different flavors of impenetrable, undocumented, symbol-heavy API, and an implementation which can only be described as an idioglossia. The fact that SBT plugin authors had to discover what “best practices” are in order to avoid making two plugins accidentally incompatible should have been a red flag for any tool which includes typesafety as a selling point. (The fact that I tried to write a plugin to replace SBT’s usage of Ivy with Maven’s Aether library should have been a red flag for me.) We ended up moving to Maven, which isn’t pretty but works. We jettisoned all of the SBT plugins I wrote to duplicate Maven functionality, our IDE integration worked properly, and the rest of our release toolchain (CI, deployment, etc.) no longer needed custom shims to work. But using Maven really highlighted the second-class status assigned to it in the Scala ecosystem. In addition to the “enterprisey” cat-calls and disbelief from the community, we found out that pointing out scalac’s incremental compilation bugs had gotten that feature removed outright. Even the deprecation warning for -make: suggests using SBT or an IDE. This emphasis on SBT being the one true way has meant the marginalization of Maven and Ant – the two main build tools in the Java ecosystem.

Cross-building is also crazy-making. I don’t have any good solutions for backwards compatibility, but each major Scala release being incompatible with the previous one biases Scala developers towards newer libraries and promotes wheel-reinventing in the general ecosystem. Most Scala releases contain improvements in day-to-day programming (including compilation speed), but an application developer has to wait until all their dependencies are upgraded before they themselves can upgrade. If they can’t wait, they have to take on the maintenance burden of that library indefinitely. In order to reduce their maintenance overhead, they naturally look for another, roughly equivalent library with a more responsive author. Even if the older library is better-tested, better-documented, and better-featured it will still lose out over time as developers jump ship for something that works with Scala 2.next sooner. (It’s also worth noting that most companies using Scala at scale or in mission-critical capacities will not immediately upgrade; the library authors they employ will likely be similarly conservative, and the benefit their experience brings to their code will benefit the community less and less over time. As far as I’ve found, we’re the only big startup in SF using 2.9.)

Once in production, Scala’s runtime characteristics were the least subtle problem. At one point, half the team was working on a distributed database, and given the write fanout for our large networks some parts of the code could be called 10-20M times per write. Via profiling and examining the bytecode we managed to get a 100x improvement by adopting some simple rules:

  1. Don’t ever use a for-loop. Creating a new object for the loop closure, passing it to the iterable, etc., ends up being a forest of invokevirtual calls, even for the simple case of iterating over an array. Writing the same code as a while-loop or tail recursive call brings it back to simple field access and gotos. While I’m sure Scala will have better optimizations in the future, we had to mutilate a fair portion of our code in order to actually ship it. (In another service, we got away with just using the ScalaCL compiler plugin and copying things to and from arrays instead of using immutable collections.)

  2. Don’t ever use scala.collection.mutable. Replacing a scala.collection.mutable.HashMap with a java.util.HashMap in a wrapper produced an order-of-magnitude performance benefit for one of these loops. Again, this led to some heinous code as any of its methods which took a Builder or CanBuildFrom would immediately land us with a mutable.HashMap. (We ended up using explicit external iterators and a while-loop, too.)

  3. Don’t ever use scala.collection.immutable. Replacing a scala.collection.immutable.HashMap with a java.util.concurrent.ConcurrentHashMap in a wrapper also produced a large performance benefit for a strictly read-only workload. Replacing a small Set with an array for lookups was another big win, performance-wise.

  4. Always use private[this]. Doing so avoids turning simple field access into an invokevirtual on generated getters and setters. Generally HotSpot would end up inlining these, but inside our inner serialization loop this made a huge difference.

  5. Avoid closures. Ditching Specs2 for my little JUnit wrapper meant that the main test class for one of our projects (~600-700 lines) no longer took three minutes to compile or produced 6MB of .class files. It did this by not capturing everything as closures. At some point, we stopped seeing lambdas as free and started seeing them as syntactic sugar on top of anonymous classes and thus acquired the same distaste for them as we did anonymous classes.
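To make rules 1 and 4 concrete, here is a minimal sketch of the kind of rewrite they describe. It is an assumption-laden illustration, not code from the services mentioned above: `LoopSketch`, `sumWithFor`, `sumWithWhile`, and `Counter` are hypothetical names, and the relative cost of the two loops depends heavily on the compiler and JVM version, so treat it as a picture of the shape of the change and not as a benchmark.

```scala
object LoopSketch {
  // Idiomatic version: the compiler rewrites this for-loop into
  // xs.foreach(x => total += x), so the loop body becomes a closure.
  def sumWithFor(xs: Array[Int]): Long = {
    var total = 0L
    for (x <- xs) total += x
    total
  }

  // Hand-written version: an index variable and a while-loop, with no
  // closure and no call into the collections library.
  def sumWithWhile(xs: Array[Int]): Long = {
    var total = 0L
    var i = 0
    while (i < xs.length) {
      total += xs(i)
      i += 1
    }
    total
  }
}

// Hypothetical class for rule 4. private[this] restricts the field to the
// current instance, so the compiler has no need to route access through
// generated accessor methods.
final class Counter {
  private[this] var count = 0

  def increment(): Int = {
    count += 1
    count
  }
}
```

Both sum methods return the same result for the same input; the difference is only in what the compiler emits for them.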

Now, every language has its performance issues, and the best a standard library can hope to do is to hit 80% of use cases. But what we found were pervasive issues — we could replace all of our own usages of s.c.i.HashMap, but it’s a class which is extensively used throughout the standard library. It being slower than j.u.HashMap means groupBy is slower, as is a lot of other collections functionality I like.
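As a rough illustration of what replacing a `groupBy` call by hand can look like, here is a sketch that groups words by length into `java.util` collections using a while-loop. `GroupingSketch` and `groupByLength` are hypothetical names chosen for this example, and nothing here is taken from the actual codebase; it only shows the style of code that rules 1 through 3 push toward.

```scala
import java.util.{ArrayList, HashMap}

object GroupingSketch {
  // Hypothetical helper: the hand-rolled equivalent of
  // words.groupBy(_.length), built on java.util collections.
  def groupByLength(words: Array[String]): HashMap[Int, ArrayList[String]] = {
    val groups = new HashMap[Int, ArrayList[String]]()
    var i = 0
    while (i < words.length) {
      val word = words(i)
      // java.util.HashMap.get returns null when the key is absent.
      var bucket = groups.get(word.length)
      if (bucket == null) {
        bucket = new ArrayList[String]()
        groups.put(word.length, bucket)
      }
      bucket.add(word)
      i += 1
    }
    groups
  }
}
```

The one-line `groupBy` version is plainly nicer to read, which is the trade-off the rules above describe: the faster code is also the less pleasant code.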

At some point, I wondered if the positive aspects of our development experience owed less to Scala and more to the set of libraries we use, so I spent a few days and roughly ported a medium-sized service to pure Java. I broached this issue with the team, demo’d the two codebases, and was actually surprised by the rather immediate consensus on switching. There are definitely aspects of Scala we’ll miss, but it’s not enough to keep us around.

Already I’ve moved our base web service stack to Java, with Scala support as a separate module. New services are already being written on it, and given the results from our Hack Day at the beginning of this week it hasn’t slowed our ability to quickly ship complex code. I’m keeping a close eye on the effects of this change, but I’m optimistic, and the team seems excited. We’ll see.

So.

I’ve tried hard here not to offer you advice. Some of these problems could easily be specific to our team and our workload; some of them won’t make a difference in how your company does; some of them aren’t even your problems to solve, really. But they’re still the problems we’ve encountered over the past two years, and they compose the bulk of what’s motivating this change.

Despite the fact that we’re moving away from Scala, I still think it’s one of the most interesting, innovative, and exciting languages I’ve used, and I hope this giant wall of opinion helps you in some way to see it succeed. If there’s anything here I can clarify for you, please let me know.


In the process of composing that email, I asked a few personal friends to review a draft of it. One of those friends shared that draft with someone else, who shared it with someone else, etc. etc. and today I woke up to find my email to Donald and Martin splashed across Hacker News and Twitter. I deleted the gist, but the cat was out of the bag, and now I find myself having to publicly explain the context of a private conversation.

I wrote that email for a very specific reason: Donald asked me for my opinion. If someone asks me for an honest opinion of them or their work, in private, I feel morally compelled to be as honest as I can with them. The only way we can ever know how we appear to other people is through other people; the only way we can know what others think of what we build is to ask and listen to them. The fundamental asymmetry of consciousness means triangulation is the only path to understanding. Anyone running a business understands this, at some level: what customers say does not by itself constitute objective reality, but does offer critical insight into it that one is essentially incapable of discovering by oneself.

But it’s simplistic and naive to assume that I wrote what I did in an unguarded moment and that somehow this represents a more truthful account of what I’d say in public. The most that you can possibly know about this is the text of my email to Donald and Martin, not the context.

Yes, that email is not what I would say in public. The Scala community needs another giant blog post about ways in which someone doesn’t like Scala like I need a hole in my head, and I’d rather suck a dog’s nose dry than lend a hand to the nerd slapfights on Hacker News. The world has yet to take me aside and ask me for my opinion of it, and in the past few years I’ve found that it’s far more profitable to build things rather than tilt at windmills.

So.

Should you use Scala? Is Java better?

(You’re asking the wrong questions.)