SysAdvent: So Server, Tell Me About Yourself — An Intro to Facter, Osquery & Sysdig
This article originally appeared in a slightly different form on the SysAdvent blog, written by Gareth Rushgrove and edited by Hugh Brown.
Linux and Unix have always had powerful, low-level tools capable of telling you exactly what your computer system is doing (strace, DTrace, systemtap, top, ps). But these tools often have complex user interfaces and platform differences, which means not everyone has the time to master them. This article is all about several new tools that aim to not just be powerful debugging tools, but to provide a pleasant user interface too.
Facter is a simple inventory application providing a single, cross-platform interface to a range of structured data about your system. Everything is available, from network interfaces to available hardware and operating system version.
Osquery is a new open source tool from Facebook that exposes low level details of your system via a familiar SQL interface. Want to query for processes listening on a given network interface? Or for services that launch at startup? This is the tool for you.
Sysdig is another open source tool for system-level exploration and tracing that aims to be both powerful and easy to use. Sysdig focuses on tools that help answer real-time questions about a running system.
I’m running all of the following on an Ubuntu 14.04 virtual machine, but you should be able to find the installation commands for your favorite distribution too. As for supporting other operating systems: Facter also runs on Windows and OS X; osquery also runs on OS X; and Sysdig is Linux only.
Facter has been around for a while (it’s a core part of Puppet), and is included in lots of distribution repositories already. However, for this walkthrough, we’re going to use the preview version of Facter.
First let’s install the official Puppet Labs repositories:
Next let’s install the nightly build repository for Facter. Note that the repository and package are called cfacter to allow it to be installed alongside the stable version of Facter.
For the curious, or those wanting to use a different operating system, feel free to read up on the nightly repositories.
Osquery is quite new, and packages aren’t available just yet, so we’ll need to compile from source. First let’s download the latest release:
And then we’ll install its dependencies and compile the osquery tools. This will take a little while but I promise it will be worth it.
For the full installation instructions see the osquery wiki.
Note that if you want to use osquery for anything more than a quick demo, you could create your own package using the makefile.
The resulting system package (Ubuntu or CentOS at the moment) can then be used to install the binaries without needing to compile everywhere.
Sysdig handily provides a one-line installer which detects your operating system and installs the relevant packages:
If you would rather do that manually then full installation instructions are available.
Facter is the most straightforward of the three tools we’re taking a look at. When run, it simply outputs structured information about the host, collected from various other tools or the operating system itself. This can be hugely useful if you’re on a machine and want to know everything quickly, but it’s also useful if you’re using an unfamiliar operating system, as it provides a single way of accessing lots of information quickly.
The quickest way of understanding this is just to run it:
Feel free to leave out the pipe to head if you’re running locally. The output is over 100 lines long, and looks something like this:
You can see that I’m running this on a VirtualBox virtual machine with a 40GB hard drive.
Facter supports other output formats too, including JSON and YAML. For instance you can run:
And you’ll receive YAML:
Facter also supports returning just a single value, so if you know the name of the fact you want to check you can simply ask for that. For instance:
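If you want to consume Facter's structured output from a script rather than by eye, a minimal Python sketch might look like the following. It assumes the binary is on your path under the name `facter` (the preview build used here is installed as cfacter) and that it accepts a `--json` flag. The `lookup` helper and the sample data are hypothetical, written only for illustration.

```python
import json
import subprocess


def load_facts(binary: str = "facter") -> dict:
    """Run Facter with JSON output and return the parsed facts.

    The binary name and the --json flag are assumptions about your install.
    """
    completed = subprocess.run(
        [binary, "--json"], capture_output=True, text=True, check=True
    )
    return json.loads(completed.stdout)


def lookup(facts: dict, dotted_name: str, default=None):
    """Walk nested facts using a dotted name such as 'os.family'."""
    current = facts
    for part in dotted_name.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


# Hypothetical sample data standing in for real Facter output.
sample = json.loads('{"os": {"family": "Debian"}, "is_virtual": true}')
print(lookup(sample, "os.family"))
print(lookup(sample, "no.such.fact", "unknown"))
```

On a machine with Facter installed, `lookup(load_facts(), "os.family")` would do the same against live data.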
As well as the large number of facts provided out-of-the-box on a range of operating systems, Facter also allows for writing your own facts. A very simple example might be exposing the version of Python to Facter. First write a script that outputs a simple key=value pair. Save the following as
Now we can ask for the python_version fact (the name of our key in the script above) like so:
Facter has a number of different ways of extending it, and any custom facts from previous versions of Facter should work with the new implementation.
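As an illustration of the key=value approach described above, here is a hedged sketch of a fact script written in Python rather than shell. It is not the script from the original walkthrough. The `fact_line` helper is a hypothetical name, and the script would still need to be made executable and saved wherever your version of Facter looks for external facts.

```python
#!/usr/bin/env python3
"""Hypothetical external fact: report the Python version to Facter."""
import platform


def fact_line(key: str, value: str) -> str:
    """Format one fact in the key=value form described in the text."""
    return f"{key}={value}"


if __name__ == "__main__":
    # The key becomes the fact name, so this would surface as python_version.
    print(fact_line("python_version", platform.python_version()))
```

Because the script reports the version of the interpreter that runs it, the shebang line decides which Python the fact describes.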
Osquery serves a similar purpose to Facter, providing a universal interface for information on a machine. Osquery presents information about the system as tables, which can be queried via SQL. The information being queried tends to return a dynamic list of results, for instance the users present on a machine or the host entries in the local hosts file. Again, here’s a quick example:
This will output something like the following:
Let’s change the host entries on our machine and rerun the query:
Now you’ll see something like:
The above examples use the osqueryi tool, which can take a query on stdin and return the results. You can also run osqueryi on its own and open an osquery SQL shell.
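If you would rather drive osqueryi from a script, a small wrapper is one option. The sketch below assumes your osqueryi build accepts a `--json` flag and a query passed as a command-line argument. The function names and the sample row are hypothetical and exist only to show the shape of the data.

```python
import json
import subprocess


def parse_rows(raw: str) -> list:
    """Turn JSON output (an array of row objects) into a list of dicts."""
    rows = json.loads(raw)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of rows")
    return rows


def run_query(sql: str, binary: str = "osqueryi") -> list:
    """Run one query through osqueryi and return its rows.

    The --json flag and the argument form are assumptions about your version.
    """
    completed = subprocess.run(
        [binary, "--json", sql], capture_output=True, text=True, check=True
    )
    return parse_rows(completed.stdout)


# Hypothetical output for a query against the hosts table.
sample_output = '[{"address": "127.0.0.1", "hostnames": "localhost"}]'
for row in parse_rows(sample_output):
    print(row["address"], row["hostnames"])
```

Keeping the parsing separate from the subprocess call makes it easy to test the wrapper on machines where osquery is not installed.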
With the shell open, let’s build a more complex query by joining together two tables.
This should produce something like the following:
Osquery supports a large and growing number of tables, everything from arp_cache and bash_history, to crontab records and kernel_modules. It’s also possible to write your own tables if you’re happy getting your hands into the code.
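To get a feel for joining tables without installing anything, the idea can be mimicked with Python's built-in sqlite3 module. This is only an analogy: the two tables below borrow the names `processes` and `listening_ports` from osquery, but the columns are a tiny subset chosen for illustration and every row is made up.

```python
import sqlite3

# An in-memory database standing in for osquery's live system tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE processes (pid INTEGER, name TEXT);
    CREATE TABLE listening_ports (pid INTEGER, port INTEGER, address TEXT);
    INSERT INTO processes VALUES (1, 'init'), (812, 'sshd'), (1290, 'nginx');
    INSERT INTO listening_ports VALUES
        (812, 22, '0.0.0.0'), (1290, 80, '0.0.0.0');
    """
)

# Join the two tables on pid to find which process owns each open port.
JOIN_QUERY = """
    SELECT p.name, l.port, l.address
    FROM processes AS p
    JOIN listening_ports AS l ON p.pid = l.pid
    ORDER BY l.port
"""

rows = conn.execute(JOIN_QUERY).fetchall()
for name, port, address in rows:
    print(name, port, address)
```

The `init` row drops out of the result because it has no matching entry in `listening_ports`, which is the usual behavior of an inner join.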
Osquery also supports a long-running daemon process called osqueryd; this allows for scheduling queries for execution across your infrastructure, aggregating the results over time and generating logs of any changes in state.
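The idea of logging changes in state can be pictured as a diff between two runs of the same query. The following Python sketch is not how osqueryd is implemented. It is a simplified illustration with hypothetical function names and made-up rows.

```python
def row_key(row: dict) -> tuple:
    """Make a row hashable so that rows can be compared as set members."""
    return tuple(sorted(row.items()))


def diff_results(before: list, after: list) -> dict:
    """Report rows that appeared or disappeared between two query runs."""
    old = {row_key(r) for r in before}
    new = {row_key(r) for r in after}
    return {
        "added": [dict(k) for k in sorted(new - old)],
        "removed": [dict(k) for k in sorted(old - new)],
    }


# Made-up snapshots of a hosts-style table before and after an edit.
first_run = [{"address": "127.0.0.1", "hostnames": "localhost"}]
second_run = first_run + [{"address": "127.0.0.1", "hostnames": "example.local"}]

changes = diff_results(first_run, second_run)
print(changes["added"])
print(changes["removed"])
```

Only the new host entry shows up under "added", which is the kind of compact change record that is easier to alert on than a full result set.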
Whereas Facter and osquery are predominantly about querying infrequently changing information, Sysdig is much more suited to working with real-time data streams – for example, network or file I/O, or tracking errors in running processes.
Here are a few examples. First let’s watch for any operations that open the
Now in another tab or SSH session, open the /etc/hosts file with vim or another editor of your choice:
This should output something like the following:
Here we can see that vim made an open syscall to the
Let’s do something a bit more practical: We’ll look for any I/O calls that have a latency greater than 1ms. This would be useful if you were tracking down certain kinds of performance issues:
In another tab or session let’s run a command that should trigger a bit of I/O. We’ll use these packages in the next example too.
On my virtual machine, this resulted in the following:
The output here shows any files where the I/O latency was greater than 1ms. Each line shows the binary (apt-get, dpkg, etc.), the action (read in this case) and the latency (1ms or 15ms). If you were using Sysdig to debug a real performance problem, this kind of information would be very useful.
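Conceptually, a latency filter like this pairs the moment a system call is entered with the moment it returns and keeps the slow ones. The Python sketch below illustrates that pairing on made-up events. It is not Sysdig's implementation, and the `Event` fields are hypothetical, loosely modeled on the enter (`>`) and exit (`<`) markers Sysdig prints.

```python
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    ts_ns: int       # timestamp in nanoseconds
    tid: int         # thread that made the call
    proc: str        # process name
    direction: str   # ">" for enter, "<" for exit
    syscall: str


def slow_calls(events: Iterable[Event], threshold_ns: int = 1_000_000) -> Iterator[tuple]:
    """Yield (process, syscall, latency_ns) for calls slower than the threshold."""
    pending = {}
    for ev in events:
        key = (ev.tid, ev.syscall)
        if ev.direction == ">":
            pending[key] = ev.ts_ns          # remember when the call started
        elif ev.direction == "<" and key in pending:
            latency = ev.ts_ns - pending.pop(key)
            if latency > threshold_ns:
                yield ev.proc, ev.syscall, latency


# Made-up events: one 15ms read and one 0.2ms read.
events = [
    Event(0, 100, "dpkg", ">", "read"),
    Event(15_000_000, 100, "dpkg", "<", "read"),
    Event(20_000_000, 200, "apt-get", ">", "read"),
    Event(20_200_000, 200, "apt-get", "<", "read"),
]

for proc, syscall, latency in slow_calls(events):
    print(proc, syscall, latency / 1_000_000, "ms")
```

With the default threshold of 1ms, only the dpkg read is reported, since the apt-get read finishes in a fraction of that time.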
I mentioned above that we’d make use of the nginx and apache2-utils packages for our next example. Let’s watch all the events related to requests served by nginx in real time.
And again in another tab or session, let’s run ApacheBench (ab) to generate some traffic against our local nginx web server.
This should output something like the following:
Note that we’re seeing the request, the response and the log lines being written — all from the same command and all in real time. Imagine how useful that would be when debugging a production web server.
The folks behind Sysdig provide lots of examples, which give you an idea of all the possibilities: from watching the behavior of particular users, to tracking busy processes, to recording users of a specific application.
Hopefully these quick examples have given you an insight into three useful tools and into why you might want them around when you have a problem. All three of these tools present lots of opportunities for integration with your monitoring or configuration management framework.