Big Data Ductwork
Cloudera's Founders Focus on Infrastructure
“Whose name is on the jersey at the end of the day, I don't really care. It was more about impacting the universe of ideas and public goods.”
Jeff Hammerbacher, Cloudera Co-Founder
Dot-Computing
Exposed ductwork and I-beams line the ceiling. Unfinished wood panels down the hallways accent the polished concrete floor. Next door, a Fry’s Electronics store beckons DIYers. This was one of the early homes of Cloudera, the data infrastructure platform that brought scalable data architecture to the early internet companies.
In the era between the dawn of the computer workstation and the launch of cloud computing, dot-coms faced a challenge: build a scalable back end to support their rapidly growing websites. This was the era of Big Data. Companies like Facebook, Yahoo!, and Google each raced to store and analyze user traffic data. It was a gap that needed to be filled, and fast.
Enter Cloudera, founded in 2008 by Jeff Hammerbacher, Amr Awadallah, Christophe Bisciglia, and Mike Olson. These visionary founders aimed to empower data teams with better tools, recognizing that only scalable data infrastructure could unlock the full potential of the internet.
The Kernel
The founders brought diverse expertise to the table. Hammerbacher, who built the first data team at Facebook and earned credit for coining the term “data science,” joined forces with Awadallah, who had led Product Intelligence at Yahoo!; Bisciglia, who pioneered scalable infrastructure at Google; and Olson, who brought decades of experience in relational databases. As Hammerbacher recalled, “We were looking at potentially starting companies, and we recognized that if we did it together, then we'd have a chance to be a much larger company than if we did it separately.” He described the need for Big Data infrastructure with the phrase “history dominates state”: the record of how a system behaves over time outgrows the system itself. For example, if the base state of an application requires 100 terabytes to operate, then the data required to analyze the performance of that system would be two or three times that size.
The team also shared a passion for building public tools that would change the way people work with data. “We came together with the idea that [Apache Hadoop] was going to be a great kernel for a next-generation analytical data management platform.” The open source software married an inexpensive and reliable distributed storage system, the Hadoop Distributed File System (HDFS), with a parallel processing framework called MapReduce. Both systems were modeled on designs that originated at Google, so Christophe Bisciglia, a Google veteran, laid the technical groundwork for Cloudera’s platform. “I really look forward to a world where you have people with a diverse set of capabilities all engaging around the same set of data,” he said. Hammerbacher shared this vision, emphasizing the importance of public goods, standards, and software. “For me, I look at the public goods that were created. I look at the standards, the software, those kinds of things.”
Engineering Culture
Early on, the group prioritized a strong engineering culture. Awadallah noted, “We had a culture that we wanted to build out. And we were very careful as we picked people to join our team, that they are of the same mold, roughly; that they represent our culture of being a team player.” Olson added, “The most important thing is the quality of people in your organization.” If an employee didn't fit the company culture, the founders would help them find a new job. “At the end of the day, it's not about them. It's about the right team and the right formula going forward,” Awadallah said. They sought open source developers and engineers passionate about infrastructure, rather than the data scientists and visualization experts typically associated with business analytics companies. For example, they hired Hadoop creator Doug Cutting and acquired a company co-founded by Wes McKinney, the creator of the open source Python library pandas. These were rock stars in the field we now call Data Engineering.

Before the office next to Fry’s, Cloudera called this conference room, named ‘Capone’, home.
Cloudera grew thanks to this unique combination of the founders’ backgrounds, the open source community, and the team’s passion for building data infrastructure. The timing was impeccable: the company landed the last large VC investment before the 2008 financial crisis. Building on open source also gave the company cost efficiency and value amid the economic downturn. As one former advisor put it, “there was something else besides the technology . . . it was the team . . . I just liked all you guys.” By mid-2015, the company had reached a $4 billion valuation and attracted investment from Intel, among others.
Trade-Offs
As Cloudera navigated the evolving landscape of cloud infrastructure and open source technologies, the company faced challenges. New services, such as AWS, began offering data infrastructure at lower cost, which led Cloudera to merge with its rival Hortonworks. Internally, Cloudera’s tenacious focus on hiring A-players had trade-offs. As former Clouderan Jesse Anderson put it, the company “really lost out by not incentivizing people to stay. Cloudera needs to get the original band back together. Now, that’s going to have to happen with acquisitions and acquihires. In other words, it’s more expensive than it would have been to keep them originally.” The company’s engineering culture was not always a silver bullet. By the time of its successful IPO in 2017, the founders had moved on.
Nevertheless, the group’s legacy endures to this day. The company’s current leaders have learned how to structure data teams to drive business value. Cloudera's Chief Data Analytics Officer advises, “if I need my data products and analytics turned over very quickly . . . then I will tend to want to build a decentralized data community model . . . if your organization needs efficiency to scale for instance, usually that’s a good time and a good justification for centralization. I think centralization for platform engineering, data engineering, and data architecture typically makes sense. With data and analytics products and reporting, dashboarding, it just depends on what the business needs are.” Good advice for any data team.
Fame and fortune did not matter to the founding team anyway. Hammerbacher recalled, “I didn't start Cloudera to make money . . . I look more at, ‘how do you change the tools that people use in their [lives] . . . How do you change their thinking?’” Given the importance of their accomplishments, Olson, Hammerbacher, Bisciglia, and Awadallah deserve more credit for what they built: the first company to enable scalable data infrastructure in an era of hyper-growth for internet companies. Cloudera made Big Data possible. And like ductwork, which you only notice when it breaks, Cloudera’s legendary data infrastructure team went largely unrecognized until it was gone.
The ACC
The Antarctic Circumpolar Current sweeps through the oceans with a silent force. The mighty current weaves the Atlantic, Pacific, and Indian Oceans together, carrying more than 100 times the water carried by all the rivers on land. The ACC regulates the climate, transporting cool waters and nutrients while adapting to changing winds and temperatures. It connects species and marine particles across the globe. And you may never even know it exists.