How YouPorn Uses Redis: SFW Edition

I interviewed Eric Pickup, IT Lead at the Manwin group (the company behind sites like YouPorn and Pornhub), about their transition to Redis, why they made the switch, and how well it’s worked. Check out Eric’s presentation on Building a Website to Scale, or get started with your own free Redis instance.

Justin:   Can you talk about when and why you guys made the transition to Redis?

Eric:   Basically, about two years ago we acquired the site [YouPorn]. It was written in Perl at the time, which was one of the reasons I was brought on board. Although I had a history of working with Perl, we quickly decided it was just not feasible to maintain. There just aren’t enough Perl developers around, especially strong senior developers. So, if we were to keep it in Perl it was going to be a pretty stagnant site, which is something we obviously didn’t want.

Right away, the decision was made to rewrite it and we started looking at different technologies. Our first instinct was actually PHP, but we didn’t want to limit ourselves so we also looked at Java-based solutions. After a bunch of research and looking at what technologies we’d been experimenting with internally, we decided to stick with PHP.

Previously we’d been experimenting with Redis, Varnish and a few other technologies. Some sites internally had already started to use Redis, mostly as a caching solution, but we wanted to see if we could use it as a real data store.

We did some early tests and based our decision mainly on performance, since that was (and is) a huge issue for us. We were very, very impressed with Redis’ general performance, and after some discussion we decided we were going to use Redis as the primary database for the website.

Previously the site had been written in a traditional LAMP stack. It had Linux, Perl, MySQL and Memcached. There were obviously some concerns about the transition. One of the tradeoffs, which I’m actually really glad we made in hindsight, was that we kept MySQL in the picture. We don’t read from MySQL on the website, but we are able to use it to do things like populate new lists or hashes, as well as things we couldn’t anticipate ahead of time. We have MySQL more for ad hoc queries, and use Redis for the website.
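
As a rough illustration of that backfill role (not Manwin's actual code), the sketch below reads rows from a relational table and turns them into the Redis commands that would populate a new sorted set plus one hash per object. Everything named here is an assumption: the `videos` table, its columns, the `video:<id>` and `videos:by_rating` key names, and the use of Python's built-in `sqlite3` as a stand-in for MySQL. The commands are only built as tuples, not sent to a server.

```python
import sqlite3


def build_rating_index_commands(conn, key="videos:by_rating"):
    """Turn relational rows into Redis commands for hashes and a new sorted set."""
    commands = []
    rows = conn.execute("SELECT id, title, rating FROM videos ORDER BY id")
    for video_id, title, rating in rows:
        # One hash per object (HSET accepts several field/value pairs on current Redis).
        commands.append(("HSET", f"video:{video_id}", "title", title, "rating", str(rating)))
        # One sorted-set entry per object: ZADD key score member.
        commands.append(("ZADD", key, str(rating), str(video_id)))
    return commands


def demo_connection():
    """Build a tiny in-memory table so the sketch runs without a database server."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT, rating REAL)")
    conn.executemany(
        "INSERT INTO videos VALUES (?, ?, ?)",
        [(1, "intro", 4.5), (2, "tutorial", 3.0)],
    )
    return conn
```

Because the relational copy stays authoritative for ad hoc work, a list nobody anticipated can be rebuilt this way without the website ever reading from the database directly.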

Soon after we started developing with it, we felt we’d made the right decision. For the first month or so we were prepared to reexamine that decision, but we became comfortable pretty quickly. It was really a good fit for our use case.

Justin:   Why is that, and what were you looking at in terms of evaluating whether or not it was a good decision?

Eric:   Obviously ease of development is a huge one, especially when you are rewriting an entire project like this. Luckily, Redis’ data structures mapped well to what we were doing.

YouPorn at the end of the day is mostly about lists of videos and lists of objects, whether it be comments, favorites, the top-rated videos, or the most-viewed videos. It’s all lists, which map well to sorted sets, and objects, which obviously map well to hashes. We do use some of the other data types but I’d have to say that about 90% of our usage falls into the case of either sorted sets or hashes.
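
To make that mapping concrete, here is a minimal sketch using plain Python dictionaries in place of Redis: a `member -> score` dict stands in for a sorted set, and a dict of field dicts stands in for hashes. The key names, fields, and the `page_of_videos` helper are hypothetical, chosen only for illustration.

```python
def page_of_videos(ranking, objects, page, per_page):
    """Return one page of objects from a ranking, highest score first.

    ranking: dict of member -> score, standing in for a Redis sorted set.
    objects: dict of member -> field dict, standing in for Redis hashes.
    """
    # Same ordering ZREVRANGE gives: score descending, ties by member descending.
    ordered = sorted(ranking.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    start = page * per_page
    members = [member for member, _ in ordered[start:start + per_page]]
    # One lookup per member, as HGETALL would do against the real store.
    return [objects[member] for member in members], len(ordered)


# Hypothetical data: a "most viewed" ranking and the objects it points to.
most_viewed = {"video:1": 120, "video:2": 950, "video:3": 430}
videos = {
    "video:1": {"title": "first"},
    "video:2": {"title": "second"},
    "video:3": {"title": "third"},
}
```

The ranking holds only ids and scores, so the same objects can appear in many lists (top rated, most viewed, favorites) without being duplicated.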

Justin:   After deciding to use Redis, how long did it take to actually implement and have it working?

Eric:   Honestly, back at that point we were still ramping up the team. Like I said, it was a brand new project so it was mostly me and one other person when we did the initial staging.

I’d say within four weeks or so we had a good part of the site prototyped. We had the front page, all the main pages, and most of the video pages done. You could view comments – although at that point you couldn’t add comments – but a lot was done in just four weeks with just two people. This timeframe included learning a new framework (Symfony at the time) so we got up and running pretty quickly.

Justin:   How many instances are you using?

Eric:   I can’t get into specific numbers, but it’s fewer than 10.

Justin:   That’s really impressive. How did you guys manage to have so few?

Eric:   It’s grown over time as we have added functionality, but generally speaking we do a lot of caching with Redis. When we first launched the site, we did no caching. We just relied on Redis.

Over time we found the servers were running a little too hot for our tastes, so we started adding certain levels of caching. We’d have a second Redis node running on the web server itself with very short cache times just to handle very popular page views.
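
The idea of a short-lived cache in front of the main store can be sketched as follows. This is an in-process stand-in, assuming a read-through pattern; in the setup described, the cache is a second Redis node with key expiry rather than a Python dict. The `ShortTtlCache` class and its `loader` callback are hypothetical names.

```python
import time


class ShortTtlCache:
    """Read-through cache whose entries expire after a few seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}       # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: the main store is not touched
        value = loader(key)      # miss or expired: go to the main store once
        self._entries[key] = (now + self._ttl, value)
        return value
```

Even a very short lifetime helps for popular pages: with a five-second TTL, a page requested hundreds of times per second reaches the main store roughly once every five seconds per web server.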

You also have to understand that we use Varnish, which sits in front of the web servers. The pages themselves are cached quite a bit, so we’re not serving every page via Redis.

Justin:   When you went about making architecture decisions, can you talk about how you decided where to use Redis, and if you made any changes along the way?

Eric:   I’d say Redis was one of the first technologies that we knew we were going to use. That and Varnish, they were early decisions. Our tests on them were pretty good and, like I said, they had already been used by the company before, so they weren’t unknown to us.

In terms of what we changed, the biggest change was adding a secondary Redis caching layer. It’s really lowered the queries per second on the servers and allowed us to have more of a safety net there.

Justin:   What would you say the biggest benefit has been after implementing this?

Eric:   For one, I would say the ability to rapidly create new features has been quite powerful with Redis. I mean it’s not just Redis, it’s the full software stack, but we’ve written a nice library that sits on top of the basic Redis libraries which allows us to quickly put together new features. That’s definitely been the biggest benefit we’ve seen.

Justin:   What were some roadblocks or difficulties in making this transition? Was there any custom stuff that you had to figure out and do on your own?

Eric:   Let me think here. Implementing the caching layer took some time. Like I said, the servers were running very hot and we didn’t really want to start throwing more and more servers at the problem, so building a solution took some time.

The other thing that took some time was figuring things out. These days, most websites built using Linux systems are using MySQL as the data store. MySQL does have a huge advantage in that there is lots of documentation. If you run into a problem, chances are somebody has already dealt with it before and you’ll find dozens of sites with information and advice. Redis just doesn’t have that type of community yet. If you want to read testimonials by other people that have set it up and what they’ve learned, what settings they’ve used, what their experiences are, there is a lot less information out there. There are a lot fewer tips and tricks so there’s more of a learning curve.

There’s just not as much documentation out there compared to MySQL, so finding solutions to issues or simpler things, like setting up replication to disk, took a bit more time. However, as Redis is becoming more popular, the documentation and community are starting to form.

Justin:   Do you have any tips or tricks that you’d like to share with our audience?

Eric:   I’d say the most valuable ones I just don’t know enough about. I’m not a systems man and a lot of it was basically system-type stuff. I’d say one trick that’s easy to miss is when you’re setting up replication to disk and you have a cluster of master and slaves, make sure that there is enough time between each one’s save schedule so you don’t end up in a situation where they all decide to write to disk at the same time.

It’s very easy to overlook. Our initial servers were all good, but later on when we added more servers, occasionally we kept the default settings, which was something we had to fix. It’s one of those things that people could benefit a lot from. I’m a software developer, and I’d say most of the real lessons learned were more at the systems level. I don’t have enough information to really go into those.
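
One way to picture the staggering tip is the sketch below, which spreads snapshot times evenly across nodes. It rests on assumptions the interview does not confirm: that automatic save rules are turned off on each node and snapshots are instead triggered hourly by cron through `redis-cli ... BGSAVE`. The host names and both helper functions are hypothetical.

```python
def staggered_offsets(nodes, period_seconds):
    """Assign each node an evenly spaced snapshot offset within one period."""
    if not nodes:
        return {}
    gap = period_seconds // len(nodes)
    return {node: index * gap for index, node in enumerate(nodes)}


def hourly_cron_line(offset_seconds, host):
    """Render a crontab entry that runs BGSAVE once an hour at the given offset."""
    minute = offset_seconds // 60
    return f"{minute} * * * * redis-cli -h {host} BGSAVE"


# Hypothetical hosts: four nodes spread across one hour, 15 minutes apart.
offsets = staggered_offsets(["redis-a", "redis-b", "redis-c", "redis-d"], 3600)
cron_lines = [hourly_cron_line(offset, host) for host, offset in offsets.items()]
```

The point is only that each new node gets its own slot instead of inheriting the same defaults as every other node.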

Justin:   Great. Thanks for an awesome interview, and hope things continue to go well at Manwin!

Eric:   Thanks for having me.

 

Thanks Eric for a great interview! If you want to get started with Redis, sign up for a free account at RedisTogo.com.

Justin Mares.

Posted by Justin Mares on July 31st, 2013 in redisphere

Comments

  1. hacfi says:

    August 1st, 2013 at 1:28 am

    Thanks for sharing the interview. Very good read!

    Just a quick note: He’s talking about the PHP framework Symfony(2), not Symphony the content management system. Just in case someone wants to look it up.

  2. Ross says:

    August 1st, 2013 at 3:52 am

    The PHP framework mentioned in the article is in fact Symfony, not Symphony.

  3. Emaaaa says:

    August 1st, 2013 at 5:40 pm

    I already knew YouPorn used Symfony in its software layer, but I didn’t know it used Redis as the primary data store.
    I’d like to know how you use Redis with Symfony, because Symfony is strongly built around ORMs like Doctrine and Propel, and the biggest convenience of the framework is DRY and how Symfony automagically writes code for you.

  4. Eric Pickup says:

    August 1st, 2013 at 11:01 pm

    Emaaaa, using an ORM (at least a non-custom one) was never an option for us even if we hadn’t used Redis as the primary datastore. The performance would make it impractical.

    We have a custom “ORM” (the R in this case stands for Redis) that we have written that translates requests to Redis queries.

    You pass it a criteria object and based on the values passed, it:
    1) checks local cache (Redis on a Unix socket) to see if the result is already available
    2) connects to the main cluster of slaves
    3) checks to see if the cache of a previous ZINTERSTORE exists
    4) if not, performs any ZUNIONSTOREs and ZINTERSTOREs needed (these are done on the slaves)
    5) enters the key into a sorted set used to expire temporary lists (Redis does not do expires on the slave, so we use a cron job to read from the sorted set and delete manually)
    6) performs ZRANGEs on the results to get the object ids
    7) attempts to get the objects from local cache, then gets any cache misses from the main Redis cluster
    8) stores the results in the local cache
    9) finally returns the list of objects and the total matches (for pagination) to the calling method.
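
    The steps above can be sketched roughly as follows. This is not the custom library described in the comment: it is an in-memory approximation in which plain dicts stand in for the local cache, the slave cluster's sorted sets, and the object hashes. The `ListQuery` class, its method names, the `tmp:` key prefix, and the reduction of a criteria object to a list of sorted-set keys are all assumptions made for illustration.

```python
import time


class ListQuery:
    """Sketch of the lookup flow: local cache, cached intersection, object hydration."""

    def __init__(self, sorted_sets, objects, clock=time.time):
        self.sorted_sets = sorted_sets  # stand-in for the cluster: key -> {member: score}
        self.objects = objects          # stand-in for object hashes: id -> field dict
        self.local_cache = {}           # stand-in for the Redis on a Unix socket
        self.expiry_index = {}          # stand-in for the expiry sorted set: key -> created_at
        self.clock = clock

    def fetch(self, criteria_keys, offset, limit):
        cache_key = ("page", tuple(sorted(criteria_keys)), offset, limit)
        if cache_key in self.local_cache:                        # step 1
            return self.local_cache[cache_key]
        temp_key = "tmp:" + "&".join(sorted(criteria_keys))      # steps 2-3
        if temp_key not in self.sorted_sets:                     # step 4
            self.sorted_sets[temp_key] = self._intersect(criteria_keys)
            self.expiry_index[temp_key] = self.clock()           # step 5
        # Step 6: ascending score, ties by member, as ZRANGE orders results.
        ranked = sorted(self.sorted_sets[temp_key].items(), key=lambda kv: (kv[1], kv[0]))
        ids = [member for member, _ in ranked[offset:offset + limit]]
        found = []
        for object_id in ids:                                    # step 7
            obj = self.local_cache.get(("obj", object_id))
            if obj is None:
                obj = self.objects[object_id]
                self.local_cache[("obj", object_id)] = obj
            found.append(obj)
        result = (found, len(ranked))
        self.local_cache[cache_key] = result                     # step 8
        return result                                            # step 9

    def _intersect(self, keys):
        """Emulate ZINTERSTORE with its default SUM aggregate."""
        sets = [self.sorted_sets[key] for key in keys]
        common = set(sets[0]).intersection(*sets[1:])
        return {member: sum(s[member] for s in sets) for member in common}

    def sweep(self, max_age_seconds):
        """Stand-in for the cron job that deletes stale temporary lists."""
        cutoff = self.clock() - max_age_seconds
        stale = [key for key, created in self.expiry_index.items() if created <= cutoff]
        for key in stale:
            del self.expiry_index[key]
            self.sorted_sets.pop(key, None)
```

    Note that this sketch never expires the local cache; in the real setup that role would fall to short cache times on the local Redis instance.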

    Because of this ORM layer, we hardly ever actually write direct Redis calls anymore and writing new functionality is very fast. As long as the feature uses sorted sets of object ids, most new services comprise a class that overrides a few abstract methods and nothing more.

    I’ve often toyed with the idea of releasing it open source but it would need a lot of cleanup to make it generic enough to be used by others.

  5. Les liens de la semaine – Édition #40 | French Coding says:

    August 5th, 2013 at 12:18 pm

    [...] How YouPorn uses Redis. (You can read this at the office without worry.) [...]