In case you haven’t heard, Redis is one of the most widely used databases in the world. It’s also one of the most popular software projects on GitHub, right up there with tools from Facebook and Google.
Like other open-source software, Redis has many contributors. But it was started by one individual: an Italian named Salvatore Sanfilippo. Last month, Sanfilippo was in San Francisco to speak at RedisConf, the conference for the database. He took some time out of his visit to speak with VentureBeat about the history of Redis, the challenges of maintaining open-source software, his decision to leave Pivotal and join startup Redis Labs, and the big news of the day — the launch of modules for Redis.
Here’s an edited transcript of the conversation, which lasted for more than an hour.
VentureBeat: How have the new modules been received so far, since you announced them at the conference this week?
Salvatore Sanfilippo: It’s an interesting question. There’s the feedback I got at the conference, and there’s the feedback I read in the news comments. In general, people are happy because modules mean freedom for users. That’s good. They know that now, if they need to specialize Redis for their use case, they can do it without waiting for their contribution to be accepted. They can move at a speed that’s different from the project’s. Because I’m very focused on keeping Redis small, I don’t have a reputation for being open-minded about new features. I’m extremely conservative. After seven years of contributions, if I had accepted most of them, Redis would be huge by now. So people are happy about this. However, some people are also concerned, because half of the community shares my opinion about keeping things extremely simple. It’s the point of view of programmers who believe [the system] can’t cope with complexity. That’s my point of view, and the point of view of many other programmers right now. People are realizing that with complex systems, whatever effort you make to get them working, they have lots of unexpected side effects once you’re in production and start mixing one complex system with another. They fail in ways you could never imagine. Keeping things simple is good. The community is worried that modules will start a trend toward complexity in Redis.
VB: What would you tell them, if they’re afraid of it becoming too complex?
Sanfilippo: As I replied in one comment, even though we added modules, my view of software hasn’t changed. I will not write modules myself, nor will I include modules in the official Redis distribution. Modules are actually a way [for me] to keep doing what I was doing: being conservative. But this way it’s more democratic, because people who disagree can extend Redis.
VB: Have there been many contributions so far, new modules that people have contributed since then?
Sanfilippo: No, not at this point. People are just starting to ask questions and to understand the API. But I believe we’ll see the first serious contributions once the API is complete. At the moment, it’s more like a beta.
VB: You have to go home to Italy and build it, right?
VB: Is it possible, before you build it, for me to write a module, even if I don’t know how the API works?
Sanfilippo: Yeah, yeah, because a good programmer, once the API is implemented, knows what’s expected. But we don’t guarantee that everything in the GitHub repository will stay as it is. Before the first stable release, we want to take the opportunity to make changes to the API. After the first stable release, we won’t be able to change the API anymore. So it’s better to make any mistakes now.
VB: When you were starting Redis, who did you look to as models among open-source technologies and their creators? You talk about being conservative in what becomes part of Redis. Which projects influenced that vision of keeping it focused?
Sanfilippo: One of the projects that influenced this vision, and also the module system I built, is the Tcl language. A couple of years ago, they won the ACM award for the best codebase. It’s truly an impressive system. The language is so stable that it’s used in many mission-critical applications, such as oil extraction platforms and many other critical industrial applications. It’s vertically supplied. In terms of the minimalistic approach, I can’t remember any other project that has inspired me like Tcl.
VB: What prevented you from doing the modules earlier on?
Sanfilippo: Fear. [laughs] Basically, Redis without modules was a set of verbs, of commands. If you know that set of verbs, you know Redis. You can read the source code of an application using Redis and understand what’s going on. Now that you can extend Redis, that’s no longer true. There will be a multitude of verbs that aren’t part of the Redis core. So in some way there’s no longer a common language for everybody using Redis, and that was one fear: fragmentation, basically. The second fear was that a module system would mean the core could no longer evolve. I was able to fix this second point by providing an API that’s completely separate from the core, so that I can evolve the core and modules will still work. The way this was possible was to create a layer of abstraction between the modules and the core. It’s a lot of work. In theory, you could take this layer of abstraction away and things would still work. But then, every time I did more work on the core, the modules would break. It’s like the Linux kernel, where there’s no stable internal API. If you write a device driver for Linux and the next version of Linux is released, sometimes you have to adapt your driver for it to work with the new version. I didn’t want that. To support both modules and an evolving core, either the core stays frozen at a given point, or you have a middle layer that translates between modules and the core. That middle layer of abstraction was the solution.
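To make the idea of a translating middle layer concrete, here is a minimal sketch in Python. It is not Redis code: the real module API is written in C, with functions prefixed RedisModule_, and every name below (ModuleAPI, CoreV1, CoreV2, the two adapters, incr_by_module) is hypothetical. It assumes the simplest possible core, a key-value table of strings, and shows only the principle Sanfilippo describes: a module is written against a stable interface, so when the core’s internals change, only the adapter is rewritten and the module keeps working.

```python
from abc import ABC, abstractmethod


class ModuleAPI(ABC):
    """Stable interface that modules program against (hypothetical names)."""

    @abstractmethod
    def string_get(self, key): ...

    @abstractmethod
    def string_set(self, key, value): ...


class CoreV1:
    """Core internals, version 1: values kept as plain str in a dict."""

    def __init__(self):
        self.table = {}


class CoreV2:
    """Core internals, version 2: values re-encoded as (type tag, bytes)."""

    def __init__(self):
        self.db = {}


class CoreV1Adapter(ModuleAPI):
    """Translation layer between the stable API and CoreV1 internals."""

    def __init__(self, core):
        self.core = core

    def string_get(self, key):
        return self.core.table.get(key)

    def string_set(self, key, value):
        self.core.table[key] = value


class CoreV2Adapter(ModuleAPI):
    """Translation layer rewritten for CoreV2; modules are left untouched."""

    def __init__(self, core):
        self.core = core

    def string_get(self, key):
        entry = self.core.db.get(key)
        if entry is None:
            return None
        tag, raw = entry
        return raw.decode() if tag == "string" else None

    def string_set(self, key, value):
        self.core.db[key] = ("string", value.encode())


def incr_by_module(api: ModuleAPI, key, amount):
    """A hypothetical 'module' command: it only ever touches the stable API."""
    current = int(api.string_get(key) or 0)
    api.string_set(key, str(current + amount))
    return current + amount


for adapter in (CoreV1Adapter(CoreV1()), CoreV2Adapter(CoreV2())):
    incr_by_module(adapter, "hits", 5)
    print(incr_by_module(adapter, "hits", 2))  # prints 7 for both cores
```

The same incr_by_module gives the same answers against both cores, even though CoreV2 stores values in a different shape. The cost, as the interview notes, is that every internal change has to be mirrored in the adapter.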
VB: Were there people in the community saying, come on, do it already? What was the dialogue over the years? This was 2009, right? What was the interaction with the community like around extending Redis? Was it just issue after issue saying, can you add image processing to Redis?
Sanfilippo: It was even stronger than that. One guy actually forked Redis and implemented a module system. He even wrote some interesting modules for it. But his module system was the kind where, if you rewrote the core, the modules broke. There was a heated discussion along the lines of, oh, it’s good to have modules. But the user base didn’t follow the fork; they stayed with stable Redis. Still, this was an interesting message for me to capture: okay, it’s better to move past this point of fear. I decided to do what was suggested, but in the right way. Writing a system that just accommodates modules was maybe one week’s effort. The layer of abstraction was months of work. Eventually, it’ll be something big.
VB: Did it not make sense to do it in previous jobs? Did you lack the time, or the sense that your company supported it?
Sanfilippo: There was support for this vision even before, but the emphasis was on other functionality. I was also independent in deciding which features were important. There was more emphasis on the internals of Redis than on user-facing features. Now, at Redis Labs, user-facing features get more attention. So yes, there is an environment that’s encouraging me to put more work into the API.
VB: I wanted to ask about this specifically. Why did you decide to move to Redis Labs?
Sanfilippo: The main thing for me was that Redis started as a small project, but now its source-code size and internal complexity are approaching the limit of what a single person can handle. I looked at where the Redis talent was, and over the course of several years Redis Labs was able to hire programmers who understand the Redis core in its full complexity. They understand every piece of the Redis source code. In my opinion, this is the only way for the software to continue. Also, I’m not very happy with the idea that if I got hit by a truck, the project would disappear. Now it can continue regardless of any specific person. It was also about working with other people who have the skills to advance the project, so that if, for example, I open a pizza shop or whatever, they’ll be able to continue the work. For me, their talent, the kind of coders they are, was the first reason. The second reason was that I no longer believe large open-source projects are sustainable without an economic counterpart.
[There are] people who are focused on making the project sustainable by selling it in some commercial way and funding the open-source project. In system programming, it’s not like you receive good contributions from people who aren’t paid to write code. It’s a lot of work, and it’s a lot of debugging. You can break everything with one line of wrong code. It’s a different ecosystem, system programming, compared to a Ruby library or something like that. You want people who are paid to do this.
VB: What do you think of Redis Labs’ chances of turning this into a major business? Taking Redis and making it into a company that goes public, or one with hundreds of millions in revenue, a large company. Do you think it will? Do you think Redis will be the foundation of something like SQL databases?
Sanfilippo: The current trend, what I’ve seen in the last six months to a year, is an impressive adoption of Redis in environments where new things usually aren’t welcome. Places that are usually conservative about what they use are starting to use Redis. People are learning that it’s possible to put Redis in big enterprise environments. What’s important to understand is that Redis has an advantage over other databases trying to enter large corporations: you can use it as a replacement for what they’re using now, or you can use it as an aid. This is how Redis entered a lot of environments. Instead of replacing everything, they have a small use case, and they put Redis next to the data store they already have. Then, as they gain trust in Redis, they start replacing more pieces with it. This is an interesting model for Redis, because it can enter these environments in a lower-profile way than a big company saying, okay, let’s drop SQL and use Redis instead. Maybe there’s a little use case that’s better solved with Redis, they see it works very well for what they’re doing, and then maybe it works for this and this and this too. This is a better strategy than trying to replace the systems big corporations are already using.
VB: What is Redis not good for?
Sanfilippo: Redis is currently not good for data problems where write safety is very important. One of the main design sacrifices Redis makes in order to provide many good things is data retention. It has best-effort consistency and a configurable level of write safety, but it’s optimized for use cases where, most of the time, you have your data, but in a major incident you can lose a little bit of it. So, for example, if you’re building a payment system, you don’t want to put data that must absolutely be retained inside Redis. You could configure it in a specific way, but then it no longer makes sense to use it. Basically, I don’t have a vision of one piece of software taking over everything. That’s not what I’m trying to do. We have transactions, but they’re not SQL-like transactions. It’s a different use case. I believe it’s great if we specialize in solving a specific use case. There will be other systems that are great for other things. One thing I don’t like right now about the database landscape is that all the big players are trying to claim their systems can be used for a very wide spectrum of applications.
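The “configurable level of write safety” is largely expressed through Redis’s append-only file, whose appendfsync setting accepts always, everysec, or no. The Python sketch below is a toy model, not Redis code; the class name and its structure are hypothetical, and for simplicity it calls fsync inline in the write path. It shows where the trade-off comes from: the more often the log is forced to disk with os.fsync, the less a crash can lose, and the more each write costs.

```python
import os
import time


class AppendOnlyLog:
    """Toy append-only log with a Redis-like fsync policy (hypothetical class).

    policy="always":   fsync after every write (safest, slowest)
    policy="everysec": fsync at most once per second (a crash may lose ~1s of writes)
    policy="no":       never fsync explicitly; leave flushing to the OS
    """

    def __init__(self, path, policy="everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.fh = open(path, "ab")
        self.last_sync = time.monotonic()
        self.syncs = 0  # number of fsync calls made, for illustration

    def append(self, command):
        self.fh.write(command.encode() + b"\n")
        self.fh.flush()  # move bytes from Python's buffer into the OS page cache
        now = time.monotonic()
        if self.policy == "always" or (
            self.policy == "everysec" and now - self.last_sync >= 1.0
        ):
            os.fsync(self.fh.fileno())  # force the page cache to stable storage
            self.last_sync = now
            self.syncs += 1

    def close(self):
        self.fh.close()
```

With policy="always", every acknowledged write has reached the disk, which is what a payment system would need, but each write pays for an fsync. With everysec or no, writes are much cheaper, and the price is exactly the “little bit” of data a large incident can take with it.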
VB: I also wonder what made you choose to make Redis the way that it was. How did you choose the architecture for Redis in the beginning? What were you looking for?
Sanfilippo: I had a specific use case: real-time analytics. This is exactly the use case Redis is most employed for. I needed to retain my data. I didn’t want just a cache; I wanted to write data to disk. For me, it was much more important that accessing the data be extremely fast than that data safety be extremely strong. I tried to find a trade-off between storing things on disk and keeping them in memory. So basically, I used my use case as an initial guide. Then I started to use Redis itself for other things. Writing Redis had become my job, so I started to use user input and other use cases to guide future development. But the initial model remained. There are a lot of databases that focus on strong consistency and data durability, and a lot of systems that are completely ephemeral, like Memcached. I wanted something in the middle.
VB: What did you need real-time analytics for?
Sanfilippo: I was writing a web analytics application where users could see who was accessing their website in real time, and the navigation pattern. This user is coming from Germany. He’s been on this page for five seconds. The navigation pattern is this and that. We built the product on MySQL and started offering it to users with a freemium model. Because we wanted to monetize through freemium, it was important that non-paying users be very cheap to serve. With MySQL, we couldn’t fit a lot of users per computer core. So we asked ourselves, my cofounder and I, should we invent something to solve our problem? So I started to write a prototype of Redis. That’s how it started. It worked very well. After I applied it to my problem, I recognized that it could be useful for more use cases than the one I had. It was a general problem people had. I published it on Hacker News, and members of the community immediately understood there was value in it and started to use it. In a couple of months, it was already a big deal.
After a couple of months of usage, nobody was talking about it anymore. No traffic on the mailing list, nothing. I decided to continue anyway. I liked this stuff, but maybe it was just that: at the start there was a bit of hype, and then nothing. I told myself that if I kept adding value, and I enjoyed doing it, something would probably happen. I continued with it a bit, and then there was another big period of hype. [This happened] again and again until it became an exponential process. But it wasn’t just build it and they will come. We had to put a lot of continued effort into it for people to understand we were serious about an open-source project.
I started the project not at GitHub, but at Google Code. After six months, everybody switched to GitHub, and I switched too.
VB: You chose BSD from the very beginning. Why BSD?
Sanfilippo: Because the alternative licensing scheme, the GPL, forces you to release your changes under the same license. That’s not good for users and not good for the developer. It’s not good for users because it’s very easy to step outside the freedom the license gives you by mistake. For example, even if you just have a deployment system that uses binary packages and you modify the source code, you’re probably outside what’s allowed. People are developing more and more fear of working under the GPL. The other reason is that it’s bad for you, because you have to get the copyright reassigned to you every time someone contributes a patch. If you want to build a commercial product out of your GPL project, you need to make sure everyone who provides a patch gives you a piece of legal paperwork transferring the copyright from them to you. This is why I went with BSD. But now, for the first time, the GPL could be interesting again because of the cloud vendors. With BSD, cloud vendors are able to extract a lot of value from an open-source project, to the point of making it very hard for the project’s original creators to build a business out of it. Let’s call it the “AWS problem.” The AWS problem could be enough to create problems for the whole open-source ecosystem. Now people know that if they start an open-source project and put a lot of effort into it, they could be marginalized by AWS. That could mean the GPL returns as the primary license for open-source software in the future.
I believe that it’s possible that the open-source community as a whole will start to think about different licensing models in the future because of this problem.
Apache provides a bit more control, but it doesn’t fix the problem of cloud vendors’ ability to capture most of the value of an open-source project. It could be an alternative, but I believe what could happen in the future is really a license with special terms for distributing the software as a service. Apache allows you to distribute software as a service without giving anything back to the community.
VB: What do you think of the impact of AWS having a cloud service based on Redis? What has that done in terms of adoption?
Sanfilippo: I don’t have the exact numbers, so I don’t have an exact idea. I believe that many users are on AWS, using ElastiCache or Redis Labs or other companies that run managed services like that. However, where Redis gets extremely interesting is with big social networks that need to deploy it with a very large amount of memory. That’s currently not possible at reasonable prices with cloud vendors, so the big users are still running on their own hardware. The number of users running in the cloud is probably very big, but all the biggest Redis use cases are outside the cloud.
VB: In terms of memory, what are the biggest deployments?
Sanfilippo: I think that Twitter so far is one of the biggest, together with popular Chinese social networks such as Weibo.
VB: What new features would you like to implement in the core going forward?
Sanfilippo: One of the main projects is to make Redis multi-threaded. It’s an extremely hard feature to get right, but fortunately it can be done in steps. What’s surprising is that most of the gain comes from threading just the simplest part, the I/O with the clients, not access to the data structures themselves. Threading data-structure access would be a very complex solution, because the Redis data structures are complex, so concurrent access to them is hard. But most of the CPU time is currently spent inside the kernel, reading from and writing to sockets. So if we just make this layer multi-threaded, we gain most of the advantages of multi-threading and using multiple cores. This is one of the big features. Also, at the conference, I was approached by a large number of users running Redis Cluster. They’re very happy that they finally have HA [high availability] and sharding, but they want the system to evolve and become more mature. So I absolutely want to bring more features and more maturity to Redis Cluster.
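The split described here, threading the client I/O while leaving data-structure access on a single thread, can be sketched in Python as follows. This is a hypothetical illustration, not the Redis implementation: the function names, the three-command set, and the simplified reply format (not the real Redis protocol) are all assumptions, and reading from sockets is simulated with in-memory byte strings, so the sketch shows structure rather than performance. Parsing and reply encoding fan out to a thread pool, while every read or write of db happens sequentially on the calling thread, so the data structures never need locks.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_request(raw: bytes):
    """I/O-side work (decode and split a request); safe to run on any thread."""
    return raw.decode().strip().split()


def encode_reply(reply):
    """I/O-side work; a simplified line format, not the real Redis protocol."""
    return ("(nil)" if reply is None else str(reply)).encode() + b"\r\n"


def execute(db: dict, argv):
    """Data-structure access: only ever called from the main thread."""
    if not argv:
        return "ERR empty command"
    cmd = argv[0].upper()
    if cmd == "SET":
        db[argv[1]] = argv[2]
        return "OK"
    if cmd == "GET":
        return db.get(argv[1])
    if cmd == "INCR":
        db[argv[1]] = str(int(db.get(argv[1], "0")) + 1)
        return int(db[argv[1]])
    return f"ERR unknown command '{argv[0]}'"


def serve_batch(db, raw_requests, io_threads=4):
    with ThreadPoolExecutor(max_workers=io_threads) as pool:
        # Parallel: parse every pending request (map preserves input order).
        parsed = list(pool.map(parse_request, raw_requests))
        # Sequential: execute commands one at a time against the shared dict.
        replies = [execute(db, argv) for argv in parsed]
        # Parallel again: encode replies for writing back to clients.
        return list(pool.map(encode_reply, replies))
```

Because Executor.map returns results in input order, each reply stays matched to its request even though parsing and encoding run concurrently.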
VB: Like what?
Sanfilippo: Faster sharding. More tools, such as backup and restore, so that you can back up the whole cluster into a single directory and, if a disaster happens, restore everything. And more observability into what’s happening inside the cluster: monitoring tools and stuff like that. We’ll also continue with modules and provide more APIs; that will be a big thing too. I would also like to add new data structures, especially in the field of time series. That would be an interesting expansion. Users use Redis a lot for time series, but a specialized data type could allow Redis to do more than people can do now.
VB: Tell me about your maintenance of the project over the years, the kinds of things you’ve heard from the community. What is it like to be the key person in a widely used database?
Sanfilippo: Basically, it’s very challenging. There are three main challenges. One is coping with the number of requests generated. As the user base gets bigger, people send you more, but you’re still one person. That’s a big problem. If you handle all the requests, you don’t get any work done. If you don’t handle them, people feel abandoned, and you can lose valuable contributors, because contributors need to feel that there’s interest in their work. Someone gives you a small patch and you say, oh, that’s cool, please give me more, your work is very interesting. When you say that, people feel a lot more motivated and start to give you more contributions. If you don’t reply, maybe that person is gone forever. It’s a challenge. But still, it’s too much to do all the work that’s needed. The other problem is the cultural and social aspect.
For example, a couple of weeks ago I received a request to replace all mentions of master and slave with terms that are more politically correct, more acceptable. This was a very complex job, because both terms appear in the user-facing API. I understand the point of view of people who say, if there are other terms, why should we use these? At the same time, certain things can’t be done easily, because the cost to the community would be too big. What I try to do, instead of being completely closed to things like this, is find alternative solutions. In that case I said I could add a note about slavery and the history of slavery to the manual page, which I did. The original poster still wasn’t satisfied, but instead of just saying no, I tried to find an alternative solution. So there are interesting social challenges.
Another thing is that you’re always the “no” man. You have to say no all day. No, no, no. This feature, no. Someone suggests an interesting change or recommendation, and you say no. You say no to a lot of things because you want to ensure a given level of quality and focus in the project. But from a personal point of view, this is a bit taxing. Saying no isn’t cool, but you have to say no. You want to be the one who says yes and makes people happy. Sometimes I said, okay, let’s say yes more, for like three months, and bad things happened: a less stable codebase, more problems. So okay, there’s a reason I say no. Those are the main challenges with the community. There’s another aspect, which is that the project is open-source. Technically, whoever uses the project isn’t paying you, but in some way you’re still responsible, because if the project fails at night, the people who use it have to get up and fix stuff. So you feel responsible for the user base. When there’s a bug, even if you’re on vacation or something, you feel compelled to come home and fix it. You can’t relax, and that’s a problem, because I don’t have a larger team working on the open-source side. If you have three or four people on the open-source side who can handle critical situations, you can relax. In the current setup, I believe the biggest challenge for Redis in the future is to find other full-time open-source developers. Not just because of the money, but because there aren’t a lot of C programmers out there who want to do systems programming.
VB: What do people not realize about the responsibility you have? What do people not get that you have to deal with because you’re the creator of the project and the person whose GitHub account is behind the Redis project?
Sanfilippo: One thing I don’t believe people really get is the amount of effort needed to make Redis very stable software. Every time we make a change, it takes an extreme effort to bring it back to the level of stability it had before. It means, for example, that if I have to touch a very important piece of code, I have to do it when I’m completely lucid. It doesn’t work if I write it 90 percent right and then try to fix it. The bugs you write in the first implementation are extremely hard to fix later. They don’t go away easily. Basically there’s this process where you say, okay, I want to change something, but I want the software to remain stable. So you think for weeks about how you want to do it, without writing any code. Maybe in this process people see no commits in the GitHub repository and think, what’s he doing? Having fun? Eating out and drinking wine? Instead, there’s this huge design process. But because of this design, sometimes we can write a new feature in half as many lines of code, in a much simpler and much more stable way. You think and think and think, and what sounded like the best design a couple of days ago starts to sound pretty lame, actually, and you find another and another. In the end, you understand which one is probably the best, and then you start the implementation.
The effort to keep the project stable, and the design effort, are probably not well understood, because all the emphasis is on the moment you write the code. This means, in turn, that people say, don’t worry, I can help you implement this. And you say no. If you want to help me, you have to put more time into the design. Writing the code is the easy task. The hard task is understanding what to do and how to do it. Once that’s established, writing the code is very easy. This process is not well understood.
Another thing that’s probably not understood is that after so many years of working on the same thing, you have to find tricks to keep yourself extremely happy and motivated. If you’re not happy, you’re not going to do great work. The way you keep yourself happy is by not always doing the same thing. So you work on different subsystems. For example, if data structures are really interesting today, I work on data structures for a day. When I’m tired of that, I start working on another subsystem. Then I move on to the cluster and work on that. You have multiple projects inside the project, so you work on multiple things.
Also, it’s important to have side projects. While working on Redis, I wrote a project called Dump1090. It’s a program that listens to aircraft via radio and decodes the information they broadcast. It’s a way to take a break, basically, by coding something different. After I do that for some time, I feel the drive to go back to Redis and make changes. It’s like a vacation from code, writing different code. And I’ve had multiple side projects.
VB: Tell me about them.
Sanfilippo: One of them was linenoise. It’s a library for line editing. I used it inside Redis, and then it was used by the official Android project and by MongoDB, because there was no simple line editing library, and at this point everybody’s using it. It was just a side project to have some fun. Another is Disque, a messaging system. It’s a fork of Redis that I started writing a year ago. It’s a messaging system with strong delivery guarantees, and it has been clustered by default from the start. I’m starting to be serious about this project, because people have liked it a lot. It’s in-memory, like Redis, but it offers very strong guarantees about messaging semantics. It’s very similar to Amazon SQS, but open-source. It uses the Redis protocol, so you can talk to it as if it were Redis, but it’s a complete fork. You use commands to create jobs and listen for jobs.
VB: I’m curious about you. You mentioned opening a pizza shop. What do you like to do other than do Redis? What’s your life like? And what do you want to do in the future?
Sanfilippo: My life for now is a mix of coding, CrossFit, and family. Sport is a very big part of my work life, because, the way I am, I can’t sit and code for more than 40 minutes in a row. All my life, I’ve had to move, always. Thanks to CrossFit, I can express that physical energy and then go back to the computer and write code in a more relaxed way. I go to the gym every day, and I spend a lot of time with my children. I have two: one is 3 years old and one is 15.
VB: How old are you?
Sanfilippo: 39. My passion is wine. I like wine a lot.
VB: Which kinds?
Sanfilippo: Especially red wine. Sicilian, Italian, French.
VB: Do you like Californian wine?
Sanfilippo: I like Cabernet, but I haven’t had a chance to taste a lot of California wines. People I trust in Italy tell me they’re extremely good. But I don’t know the houses, the brands, when I have to order. I should go out more with people who understand California wine and enjoy it. My life is pretty much the same every day. Just the gym, coding, children. Every day’s the same. I have a pretty basic life. I travel very little, because at the start of Redis I traveled a lot, and I realized I wasn’t working on the repository, wasn’t making commits and stuff like that. I could tell this wasn’t going to work. I can travel two or three times a year, but no more than that. That’s it. I really enjoy living in Sicily, but sometimes it’s strange to live there because the services aren’t great. I pay a lot of taxes, but I don’t receive enough services, so sometimes I think about moving to the north of Italy. I don’t know what I’ll do in the future. The problem is that I never managed to get rich enough to stop working. [laughs] That’s my fault, for sure. I’ll need to work in the future as well. But the problem is, you can write code from Sicily, but you can’t be a project manager. I’m also not a good English speaker. So I ask myself, what am I going to do when I stop writing code? I don’t know. I’ll see what I can do. It’s possible that in the future I’ll be forced to move to London or the United States.
VB: What do you think about remote working? You have, for example, the team in Tel Aviv. You have here. And you have an office in San Francisco. What are the strengths and weaknesses of remote employees?
Sanfilippo: I used to run companies where people wanted to work from home, but I forced them to come to the office. This was years ago. I’ve changed my view a bit. Thanks to remote working, you can get talent that would otherwise be impossible to get, but I wouldn’t ignore the coffee machine effect. Programmers stop to get coffee and talk with each other. That’s where the best ideas can happen. Currently, the internet tools we have for communication are no match for the coffee machine. I’ve worked many times as a remote worker and many times in an office. When the right people work together in the same physical space, something magical happens. Ideas start to be exchanged at a speed that’s otherwise impossible. If they’re open-minded people who are willing to share the best ideas they have, something magical happens that allows you to really go the extra mile.
VB: Are you going to have an office in Sicily? Where do you work out of?
Sanfilippo: I would love that, to be honest: to build a small team of programmers in Sicily to take care of Redis. Redis Labs is encouraging me to work in this direction. In the next few months I think I will go to the University of Catania, which has a computer science [department], to recruit talent and probably open an office in Catania. Right now, I’m working at home.
VB: In your own room? What’s it like?
Sanfilippo: It’s basically one room that’s a bit isolated from the rest of the house, where nobody enters and stuff like that. There is the room, and there is also a bit of gym stuff, so that I can pause a bit. But I do CrossFit outside of the office.
VB: Do you keep food in your office?
VB: So you always leave to eat.
Sanfilippo: Yeah. I continuously go there, work some time, and then go to play a bit with my daughter, make food with my wife and stuff like that, and then I return. And it’s like that all day.