About the speaker
David Moreno-García is a computer engineer currently working in the Compute & Monitoring Group at CERN, which is responsible for the delivery and evolution of compute, monitoring and infrastructure tools and services for the CERN Tier-0 Data Centre and the Worldwide LHC Computing Grid (WLCG). He is an active member of the team that implements and maintains a Puppet-based configuration management solution for CERN resources in a large-scale, multi-location computing network. Key components of this infrastructure include Puppet, OpenStack and Foreman.
The Hows and Whats of Service Monitoring at CERN
The CERN IT infrastructure is spread across multiple countries and provides a range of resources for physics computing, including the analysis of petabytes of data from the Large Hadron Collider and other experiments. We have approximately 40k Puppet-managed nodes and more than one hundred virtual machines providing the infrastructure needed to support this. Monitoring has become a key aspect of our daily operations, allowing us not only to identify problems in real time but also to narrow down their causes. In the long term, it is also a key asset for planning ahead and for improving the team's efficiency. This session showcases how we monitor our Puppet infrastructure (using tools such as Elasticsearch, collectd, Flume, Kibana and Grafana, among others) and how this has helped us in real situations.