Dorsey spoke informally about his experiences leading up to founding Twitter, the challenges the company faced early on (especially after being featured at SXSW in 2007 and the surge in popularity that followed), and some of the approaches they took to resolve them. I’d recommend watching it as well. Let me summarize some of the points Dorsey made which I found particularly interesting:
We didn’t really turn the company and technology around for a year. And some of the reasons for that were that we had no instrumentation at all. We had no idea of what was going on in the system.
Apparently Twitter was going down daily after the SXSW conference, and the engineers at Twitter were not able to isolate all the bottlenecks and points of failure in their systems because of inadequate system monitoring, logging, analysis, etc. Dorsey says that at his new company (Square), he learned from the above mistake, and one of their initial priorities was building an exceptional dashboard which became a critical component of development at the company.
That’s a pretty tough situation to be in, especially for a new company with limited resources. When you’re a small company on a limited budget, you need to make all that data accessible in order to make the right decisions. You need to be able to pinpoint every fault, every corner case, and know everything about them. That’s the difference between having to buy a new set of webservers and realizing your caching policy is just designed poorly.
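To make the caching example concrete, here is a minimal sketch in Python of the kind of instrumentation that can separate "we need more servers" from "our cache policy is wrong." Nothing here comes from the talk or from Twitter's actual systems: `InstrumentedCache`, `replay`, the LRU policy, and the stand-in `fetch` function are all hypothetical names and choices made for illustration.

```python
import time
from collections import OrderedDict


class InstrumentedCache:
    """A tiny LRU cache that counts hits and misses and times the slow path."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch          # hypothetical slow backend call
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0
        self.fetch_seconds = 0.0    # total time spent in the backend

    def get(self, key):
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)  # mark as most recently used
            return self.store[key]
        self.misses += 1
        start = time.perf_counter()
        value = self.fetch(key)
        self.fetch_seconds += time.perf_counter() - start
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def replay(capacity, keys):
    """Replay a request pattern against a cache of the given capacity."""
    cache = InstrumentedCache(capacity, fetch=lambda k: k * 2)
    for k in keys:
        cache.get(k)
    return cache


if __name__ == "__main__":
    pattern = [0, 1, 2] * 2  # three hot keys requested in a cycle
    for capacity in (2, 3):
        c = replay(capacity, pattern)
        print(capacity, c.hits, c.misses, c.hit_rate())
```

Replaying the same request pattern against capacities 2 and 3 shows the hit rate going from 0 to 0.5. In this toy case the cache is one slot too small for the working set, so every request falls through to the backend. That is the kind of number that points at policy rather than hardware, and you only see it if the counters exist.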
We’ve had some bad engineering discipline, where we would try to isolate ourselves a bit too much in our work. So, we had some tendencies to really work on singular pieces of code [sic] and it came to the point where there was only one person in the company who knew how the entire system worked. That created a huge single point of failure. We were so afraid that any little change would bring down the entire site.
Twitter attacked this problem from two directions at once by creating two teams:
- The architecture team: redesign the entire Twitter system from the ground up, using the knowledge they’d gained
- The iterations team: build instrumentation for every aspect of the existing system, and incrementally fix and improve it
Now, I would have thought that redesigning a broken system from the ground up would have been the best approach to the problem. What’s interesting is that the architecture team failed to produce a viable alternative, and in the meantime the iterations team actually fixed all the problems using meticulous system analytics and group review of code. Part of the solution involved the iterations team adopting pair programming, with the pairs constantly rotating around the company so that everyone had at least some sense of how the entire system worked. As a result, they had a lot more people talking, more perspectives on problems, and, in general, better solutions. It also avoided the problem of isolating people. Dorsey says that it created a very creative environment because of the constant exchange of ideas.
I, like most people who write software, make that “…ew” face when I hear someone mention pair programming. However, I realized that there is no better way to get more brains working on shared problems than the solution Twitter came up with. Big problems receive much greater exposure to different perspectives, different experiences, and general discussion than they would if the conversation never left the small team (or individual) responsible for fixing them. I had an epiphany, kind of like that time Gerry invited Ian and me to that design patterns meeting. Lots of good ideas that solve real problems.
Anyway, I hope you watch the video and find it as interesting as I did. It’s a good story.