A couple days ago, we sat down with Heroku general manager Oren Teich to talk about an ongoing PR and customer service problem: allegations by a small startup, Rap Genius, that Heroku had intentionally misled and overcharged its customers by millions of dollars over the past two years.
In that interview, Teich said, “[Heroku's] number one job is to listen to our customers and make them successful. … I think it’s great when someone provides feedback, positive or negative, but we need to do it in a respectful way.”
“If you can’t say anything nice, don’t say anything at all.”
Rap Genius co-founder Tom Lehman, who had been in touch with VentureBeat about the allegations the week before, read those words and had a very different response. While he says he still appreciates Heroku as a service and what it has done for his and many other applications, he has a lot to say. And some of it isn’t very nice.
VentureBeat: So, you assert that Heroku’s documentation was not only incomplete but actually intentionally misleading. We have a lot of nerdy but non-technical readers. Can you put that into a metaphor they might be able to understand?
Tom Lehman: Suppose you were buying a “maintenance free” car. The dealer promises to keep the car running well, and in return you can’t open the hood or perform any maintenance yourself (obviously such a car would cost much more than a normal car).
Before you buy the car, you verify from the dealer’s marketing information that its engine has six cylinders. This is important, because a four-cylinder car isn’t powerful enough for your purposes. So you’re relieved to see that the dealer has promised you six cylinders in both the marketing information and in the technical documentation describing exactly how the car works (to help you with the limited maintenance you’re capable of doing on your own).
However, you soon start to notice performance degradation that would normally be associated with a car that has only four cylinders. You verify from the documentation that this isn’t possible, but of course you don’t stop there: You look at the monitoring tools the dealer provided you. Fortunately the monitoring tools have a “health meter” for each cylinder, and everything looks fine: all six cylinders are plugging away. Still not satisfied with the performance, you check again using the “ultra-premium” monitoring tool the dealer has sold you for almost the price of a second car. This monitoring tool, coincidentally named New Relic, is really top of the line. Perhaps it costs you more than $60,000. And it too says all six cylinders are totally fine.
So you email the dealer and ask what’s up. He says he has no idea. Maybe, he says, you should try paying him to upgrade other parts of your car to fix the problem (here’s a support ticket where the real-life version of this happened to me). This helps in the short term, but eventually the poor performance returns. After you press the dealer for months, he finally admits that every car he has ever sold has four cylinders! You express surprise and anger, but the dealer tells you he’s not interested in talking about it.
Then you write a big public exposé, and the dealer publicly apologizes (but doesn’t offer to either upgrade everyone to six cylinders or refund their money).
VentureBeat: What did Rap Genius specifically lose in this situation? Dollars and cents, plus time/effort/users.
Lehman: The problem was that Rap Genius’s performance was much worse than we thought it was based on the (very expensive) tools Heroku gave us to monitor it. We thought our performance looked like this:
I.e., average 250ms response time with about 150ms spent in Ruby, 50ms spent doing database queries, 50ms spent talking to memcache, and less than 10ms spent waiting in the request queue. When in fact our performance looked like this (this new graph came from the “fixed new relic” gem that New Relic released after we published our original article):
I.e., the same time spent in Ruby, database, and memcache, but now 1100ms spent in queuing! And in fact, it’s much worse than this, because the queue time has huge variance. Many requests spend more than 20 seconds waiting in the queue. Big difference, right? This caused us the following problems:
- Our site was much slower than we thought it was. This pissed off our users.
- We spent a bunch of time searching for the cause of the slowness but couldn’t find it because Heroku’s tools didn’t report it. Instead, we spent a bunch of time optimizing what we thought was the bottleneck (time spent in Ruby) even though that time was irrelevant in comparison to the time spent in request queuing.
- It turns out that for a big single-threaded Rails app like Rap Genius (and every other Rails app in the default Heroku setup), the only way to optimize the request queue time is to leave Heroku. It would have been much easier to leave Heroku had we known about the problem years ago. Now we’ve invested more in the platform and we’re locked in.
- We spent over $60,000 on the software that produced those graphs. That money was wasted because the software gave us bad data, which led to bad decisions.
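Adding up the averages Lehman quotes shows how much of each request the queue accounts for. The short Python sketch below treats his "less than 10ms" queue figure as 10ms (an assumption) and works only with averages, so it says nothing about the long tail of 20-second waits he also describes.

```python
# Average per-request time budget in milliseconds, using the figures quoted above.
# Assumption: the "less than 10ms" queue time is rounded up to 10ms.
reported = {"ruby": 150, "database": 50, "memcache": 50, "queue": 10}
actual = {"ruby": 150, "database": 50, "memcache": 50, "queue": 1100}


def total_ms(breakdown: dict[str, int]) -> int:
    """Sum every component of a response-time breakdown."""
    return sum(breakdown.values())


def queue_share(breakdown: dict[str, int]) -> float:
    """Fraction of the total response time spent waiting in the request queue."""
    return breakdown["queue"] / total_ms(breakdown)


for label, breakdown in (("reported", reported), ("actual", actual)):
    print(f"{label}: total={total_ms(breakdown)}ms, "
          f"queue share={queue_share(breakdown):.0%}")
# reported: total=260ms, queue share=4%
# actual: total=1350ms, queue share=81%
```

Under these numbers, time spent in Ruby is roughly 11% of the real total, which is why optimizing it barely moved the needle.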
VentureBeat: What other companies or developers have you spoken to who experienced the same issues?
Lehman: A bunch of my YC classmates, a bunch of people in the Hacker News comment threads on our articles (Heroku’s Ugly Secret and Money Trees).
I’d prefer not to pull them into this because (a) it’s a big distraction to publicly face off against Heroku like this, and I think they have better things to do; and (b) it’s not necessary because this problem is crippling for literally every big, single-threaded Rails app that runs on Heroku. So how about this: Heroku, name a single big, single-threaded Rails app that runs on your platform that doesn’t experience horrible request queuing.
Yes, one solution is to run a concurrent web server like Unicorn, but this is very difficult on Heroku, since concurrent servers use more memory and Heroku’s dynos have only 512 MB of RAM, which is tight even for handling one request at a time.
VentureBeat: Overall, do you think Heroku is a generally good company for developers to work with or a generally bad company for developers to work with?
Lehman: Warts and all, Heroku’s product still works quite well for smaller Rails apps (and those with small enough memory footprints to run Unicorn). How well a platform that works only for customers who aren’t paying a lot of money can do for itself long-term is, of course, a big question.
My issue is not with the Heroku product itself (which, again, is still one of a kind in the world), but with how Heroku represented its product to its customers.
VentureBeat: What should Heroku have done differently? What should they be doing now?
Lehman: When Tim Watson first reported the problem in 2011 on Heroku’s mailing list, Adam Wiggins (Heroku’s CTO) and Oren Teich (Heroku’s GM) both admitted that they were over-representing the capabilities of their product. That was the time for a public admission and apology. I would have loved to find out about this issue in 2011 instead of two years later!
When I sent the first draft of our original article to Adam, that too would have been a good opportunity for a public admission and an apology. Instead Adam told me he was stepping out of the conversation and that I should “optimize [my] web stack” (click to read our whole conversation). Of course, Heroku’s tune changed wildly when our release of the article triggered a huge public outcry, but we shouldn’t have to start a media shitstorm just to get Heroku to admit what’s really going on.
Finally, now that Heroku has admitted the problem, they should offer their customers a refund. I’m curious to hear Heroku’s thoughts on the subject, but in my opinion they simply must refund the millions of dollars their customers spent on New Relic (Rap Genius spent more than $60,000). They knowingly sold a monitoring tool that provided bad data, which led to bad decisions, and charged a ton of money for it. Not cool!
VentureBeat: When you read Oren Teich’s comments on the situation, what was your initial response?
Lehman: I was puzzled. Here we are making some pretty serious allegations against his company — that they know about the problem and kept it secret for over two years and that they now owe their customers millions in refunds — and, instead of mounting a vigorous defense, he told you about the time he typed the word “respect” into his iPhone.
I appreciate that Oren has a notes document with the word “respect” (and “significant” and “belonging”, etc.) written in it, but I think Heroku’s customers deserve a response on the substantive issues as well.
VentureBeat: Would you consider working with Heroku’s technology in the future? If not, why? If so, under what circumstances or with what provisions?
Lehman: The crazy thing is, I’ve still got mad love for Heroku! Without Heroku, Rap Genius would not be where it is today, no doubt. Moreover, they’re doing stuff no one else is doing, so I’m down with there being issues, mistakes, struggles, whatever. All I ask is that they be real with their customers! And though I appreciate some of what they’ve been saying, they’re still not being truly, really real.
(Also they need to release dynos with more memory so that we can run Unicorn!)
VentureBeat: In your opinion, what is the real heart of the matter here? Why is this issue important?
Lehman: If a company leads you to believe you’re buying X when in fact you’re buying something that’s 50 times worse than X, that’s bad!
(If you’re curious where the 50x number comes from, check out the simulation in our original article.)
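Lehman's simulation isn't reproduced here. As a rough, hedged illustration of why routing matters for single-threaded dynos, the Python sketch below compares two policies: "random", which hands each request to a dyno chosen uniformly at random (the behavior Rap Genius's original article attributed to Heroku's router), and "intelligent", which hands it to the dyno that will be free soonest. The function `simulate`, its parameters, the Poisson arrivals, the exponential service times, and the 50-dyno setup are all assumptions for illustration; the output is not meant to reproduce the 50x figure.

```python
import random


def simulate(num_dynos: int, num_requests: int, arrival_rate: float,
             mean_service_ms: float, routing: str, seed: int = 0) -> float:
    """Return the mean time (ms) requests spend queued at single-threaded dynos.

    Hypothetical model: Poisson arrivals (arrival_rate in requests per ms),
    exponentially distributed service times, one FIFO queue per dyno.
    """
    rng = random.Random(seed)
    free_at = [0.0] * num_dynos  # when each dyno finishes its current backlog
    clock = 0.0
    total_wait = 0.0
    for _ in range(num_requests):
        clock += rng.expovariate(arrival_rate)            # next arrival time
        service = rng.expovariate(1.0 / mean_service_ms)  # time to serve it
        if routing == "random":
            # Any dyno, even a busy one while others sit idle.
            dyno = rng.randrange(num_dynos)
        elif routing == "intelligent":
            # The dyno that will be free soonest (equivalent to a shared queue).
            dyno = min(range(num_dynos), key=free_at.__getitem__)
        else:
            raise ValueError(f"unknown routing policy: {routing!r}")
        start = max(clock, free_at[dyno])  # wait while the dyno is still busy
        total_wait += start - clock
        free_at[dyno] = start + service
    return total_wait / num_requests


if __name__ == "__main__":
    # Hypothetical setup: 50 dynos, 250ms mean service time, arrivals at
    # 0.16 requests/ms, i.e. about 80% of total capacity (50 / 250ms = 0.2/ms).
    for policy in ("random", "intelligent"):
        wait = simulate(50, 50_000, 0.16, 250.0, policy)
        print(f"{policy:>11}: mean queue wait {wait:.1f}ms")
```

Under these assumptions, random routing piles requests behind busy dynos while others sit idle, so its mean queue wait comes out far higher than the shared-queue policy's. The exact ratio depends heavily on utilization and on how variable service times are.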
VentureBeat: Do you think other companies with developer products are engaging in the same behavior as Heroku?
Lehman: Some companies probably exaggerate the capabilities of their hosting products in their marketing material, though I doubt any who do it are as big or well-regarded as Heroku. And I doubt the exaggeration is anywhere near the 50x that Heroku was exaggerating.
But the really weird thing here is how misleading the tools Heroku provided were. I would be shocked to find any company charging $8,000 per month for a performance monitoring tool as misleading as New Relic was in Heroku’s case. At least they fixed New Relic after our original article, but their logs are STILL incorrect. Here’s a sample line:
2013-03-02T15:41:24+00:00 heroku[router]: at=info method=GET path=/Asap-rocky-pretty-flacko-lyrics host=rapgenius.com fwd="220.127.116.11" dyno=web.234 queue=0 wait=0ms connect=3ms service=366ms status=200 bytes=25582
The queue and wait parameters will always read 0, even if the actual value is 20,000ms. And this has been the case for years.
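Lehman's complaint is easier to see when the timing fields are pulled out of a router line programmatically. The Python sketch below parses the key=value pairs from the sample line (with a space between the host and fwd fields). The helper names `parse_router_line` and `to_ms` are hypothetical, and the parsing rule (space-separated key=value pairs, optionally double-quoted) is inferred from this one sample rather than from Heroku's documentation.

```python
import re

# key=value pairs; a value is either double-quoted or runs to the next space.
# This rule is inferred from the sample line, not from Heroku's documentation.
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_router_line(line: str) -> dict[str, str]:
    """Hypothetical helper: extract key=value fields from a router log line."""
    _, _, payload = line.partition("heroku[router]: ")
    return {key: value.strip('"') for key, value in FIELD_RE.findall(payload)}


def to_ms(value: str) -> int:
    """Hypothetical helper: turn '366ms' (or a bare '0') into an int of ms."""
    return int(value.removesuffix("ms"))


line = (
    '2013-03-02T15:41:24+00:00 heroku[router]: at=info method=GET '
    'path=/Asap-rocky-pretty-flacko-lyrics host=rapgenius.com '
    'fwd="220.127.116.11" dyno=web.234 queue=0 wait=0ms connect=3ms '
    'service=366ms status=200 bytes=25582'
)
fields = parse_router_line(line)
timings = {k: to_ms(fields[k]) for k in ("queue", "wait", "connect", "service")}
# Per Lehman, queue and wait report 0 regardless of how long the request
# actually waited, so they cannot be used to measure request queuing.
print(timings)  # {'queue': 0, 'wait': 0, 'connect': 3, 'service': 366}
```

Any dashboard built on these fields would show zero queuing on every request, which is exactly the blind spot Lehman describes.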
VentureBeat: Is the only way to really understand your infrastructure and costs to run your own machines/platform/etc. and accept responsibility for the whole stack? Are errors like this kind of a given when you turn over management of your technology to a third party?
Lehman: Well, it’s definitely not possible for you, a single person, to do everything. So you must hire people you trust, whether they’re running a full platform for you or whether they’re just racking up some servers and handling the cooling.
You have to feel comfortable that those people will generally give you good value for your money (since you can’t literally observe everything they do) and that they will tell you when something’s wrong as soon as they know, rather than covering it up.
I used to feel this way about Heroku, and I might again in the future, but I don’t right now.