Alexis Le-Quoc started Datadog in 2010, after living through the Internet boom-and-bust cycle of the late 90s and early 2000s. In 2010, the cloud was just starting to become popular. There was a gap in the market for infrastructure monitoring tools, which Alexis helped fill with the first version of Datadog.
Since 2010, cloud infrastructure products have proliferated. There were new databases, queueing systems, and virtualization and containerization tools. Web 2.0 took off, and thousands of new Internet businesses got started. Many of these businesses used Datadog to monitor their increasingly wide range of infrastructure configurations, and Datadog began to scale.
On today’s show, Alexis tells the story of how Datadog grew from its first product into a variety of tools: infrastructure monitoring, logging, and application performance monitoring. Monitoring is a unique challenge: there is a ton of data, the data is latency-sensitive, and the data is operationally important. These engineering constraints make for a great conversation. Alexis is the CTO of Datadog, and we talked about cloud providers, building a business, infrastructure, and how to scale engineering management. Full disclosure: Datadog is a sponsor of Software Engineering Daily.
Transcript provided by We Edit Podcasts. Software Engineering Daily listeners can go to weeditpodcasts.com/sed to get 20% off the first two months of audio editing and transcription services. Thanks to We Edit Podcasts for partnering with SE Daily. Please click here to view this show’s transcript.
Today’s podcast is sponsored by Datadog, a cloud-scale monitoring platform for infrastructure and applications. In Datadog’s new container orchestration report, Kubernetes holds a 41-percent share of Docker environments, a number that’s rising fast. As more companies adopt containers, and turn to Kubernetes to manage their containers, they need a comprehensive monitoring platform that’s built for dynamic, modern infrastructure. Datadog integrates seamlessly with more than 200 technologies, including Kubernetes and Docker, so you can monitor your entire container infrastructure in one place. And with Datadog’s new Live Container view, you can see every container’s health, resource consumption, and running processes in real time. See for yourself by starting a free trial and get a free Datadog T-shirt! softwareengineeringdaily.com/datadog
Kubernetes allows you to automate and thus increase the speed of deployment. But as you rapidly deliver, are you aware of, and prepared for, issues in production? VictorOps incident management empowers progressive teams to ship to prod without worrying about a nightmare firefight. With VictorOps, your team has context to fully understand application and system health. Coordinate on-call teams, collaborate when incidents occur, and monitor Kubernetes through a large number of monitoring integrations. Datadog, Prometheus, Grafana, and hundreds of other tools integrate directly with VictorOps to help give you deeper visibility into application health. Visit victorops.com/sedaily to see how VictorOps can help you manage incidents and improve system observability. See how VictorOps helps you build the future faster!
Triplebyte is a company that connects engineers with top tech companies. We’re running an experiment and our hypothesis is that Software Engineering Daily listeners will do well above average on the quiz. Go to triplebyte.com/sedaily and take the multiple-choice quiz, and in a few episodes we’ll share some stats about how you all did. Try it yourself at triplebyte.com/sedaily.
Failure is unpredictable. You don’t know when your system will break, but you know it will happen. Gremlin prepares you for these outages. We provide resilience as a service, using chaos engineering techniques pioneered at Netflix and Amazon. Prepare your team for disaster by proactively testing failure scenarios. Max out CPU, blackhole or slow down network traffic to a dependency, or terminate processes and hosts. Each of these shows you how your system reacts, allowing you to harden things before a production incident. Check out Gremlin and get a free demo by going to gremlin.com/sedaily.