Slavish adherence to guidelines considered…inadvisable?

Posted on November 20, 2015 in Security • 7 min read

Industry “best practices” are STILL no substitute for expert help. Yes, this is one of ‘those’ stories.

Recently I was going over some logs trying to track down a mysterious ‘disappearing’ directory. Situations like this are typically not mysterious in the least, since directories don’t disappear. Either they’re deleted, scrubbed from disk, or a filesystem or disk error renders them unreadable or unavailable. Finding evidence of none of these things in the logs, I decided to go deeper, into the system audit log configuration. And what I saw was both utterly ‘compliant’ and more than a little worrying.

What is audit logging?

First, a small aside. Some may be unfamiliar with the distinction between audit logging and traditional logging. It’s actually easier for me to slice logging into three categories: debug, operational, and audit. Operational logs typically contain only serious error conditions. Debug logs should be a detailed description of code flow during runtime, to allow speedy resolution of software bugs. Audit logging sits somewhere between the two, and concerns itself with what I’d call the CLUE Factors:

Who, where, when, with what?

Colonel Mustard, in the Library, at 1200 UTC, with the Ethernet cable.

In many Linux distributions, the primary mechanism for providing such logging is auditd, a userspace application which receives and filters low-level system call information from the kernel, using sets of rules describing the kinds of events you are interested in.
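As a quick illustration of what those rules look like (the watched path and the key names here are just placeholders I’ve chosen, not anything from the guidelines), an entry in /etc/audit/audit.rules might read:

```
# Watch /etc/passwd for writes (w) and attribute changes (a),
# tagging matching events with the key "identity" for later searching:
-w /etc/passwd -p wa -k identity

# Syscall rules use -a (action,list), -F (field filters) and -S (syscall names):
-a always,exit -F arch=b64 -S chmod -k perm_mod
```

Events tagged this way can later be pulled back out with `ausearch -k identity`, which is what makes the key field so useful during an investigation.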

Read between the guidelines

On this system, the logging rules had been set up in compliance with the CIS guidelines for hardening RHEL 6. Until the most recent version of these guidelines, there were multiple mis-statements claiming that the logging rules would capture events from all users, including root. This was not only untrue, but untrue in a way that anybody who cracked open the auditd manual would quickly realize. Thankfully this verbiage has now been clarified, and the newest version of the guidelines clearly states what the rules actually do. But how, exactly, did these incorrect statements get into the guidelines in the first place?

With some simple googling on snippets of the text, it’s easy to see that it’s been copy-pasted all over: not only the CIS guidelines but the DISA STIG and even the RHEL documentation itself (which at least explains what the rule actually does now). Of course, if you wanted to find the origin, you’d look for the oldest copy, and the oldest I could find was the NSA’s very own guide to hardening RHEL 5. But wait, you can also find these guidelines implemented verbatim in countless configuration management repositories on GitHub and elsewhere. So: guidelines for a completely different OS version, endlessly copy-pasted across the internet until they become simply “how things are done”. Not a comforting thought.

Hosed no matter what you choose

But what exactly is wrong with these guidelines? That is, of course, going to be a matter of opinion. And mine is that, while these guidelines are largely a good effort, they appear to have been designed as security in a vacuum. That is to say, without regard for the actual tactics and procedures of the modern adversary. Many of the steps taken to secure the system have simple workarounds, workarounds that many attackers will probably learn on the first day of their education. The rules appear to be drafted under the presumption that system services are typically somewhat trusted, and that, I think, is a mistake.

Let’s take, for example, the rule about logging file deletion events. In the CIS guidelines (and therefore many other places), it’s implemented like this:

-a always,exit -F arch=b64 -S unlink -S unlinkat -S rename -S renameat -F auid>=500 -F auid!=4294967295 -k delete
-a always,exit -F arch=b32 -S unlink -S unlinkat -S rename -S renameat -F auid>=500 -F auid!=4294967295 -k delete

The very first thing that you’re likely to learn about on any of the introductory CTF trainings that you can find on the internet is lax permissions. Now, let’s assume an all too common breach scenario. An attacker gets in, and finds a logging script that an admin threw in for debugging purposes and forgot to remove. This script, let’s call it “testfile”, is in root’s crontab and regularly runs some test and sends an email. You can imagine that this is a thing that happens in ops land fairly often. Furthermore, since he was in a rush (and he was going to delete it before he got distracted) he just made the script world-writable. Yes, this is how breaches happen, but the point of concern here is that they happen invisibly.

Observe, I’ve got a standard issue CentOS image set up in vagrant for testing. With the above rule enabled, I’ll create a file and then remove it with a root cron job:

[root@localhost vagrant]# touch /tmp/testfile
[root@localhost vagrant]# echo "* * * * * root /bin/rm /tmp/testfile" > /etc/cron.d/testfile
[root@localhost vagrant]# touch /tmp/testfile
[root@localhost vagrant]# ls -lhat /tmp/testfile
ls: cannot access /tmp/testfile: No such file or directory
[root@localhost vagrant]# ausearch -f /tmp/testfile
<no matches>

For a control experiment, let’s remove the same file manually:

[root@localhost vagrant]# touch /tmp/testfile
[root@localhost vagrant]# rm /tmp/testfile
rm: remove regular empty file `/tmp/testfile'? y
[root@localhost vagrant]# ausearch -f /tmp/testfile
----
time->Fri Oct  9 09:13:28 2015
type=PATH msg=audit(1444382008.143:550): item=1 name="/tmp/testfile" inode=263895 dev=fd:00 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE
type=PATH msg=audit(1444382008.143:550): item=0 name="/tmp/" inode=261121 dev=fd:00 mode=041777 ouid=0 ogid=0 rdev=00:00 nametype=PARENT
type=CWD msg=audit(1444382008.143:550):  cwd="/home/vagrant"
type=SYSCALL msg=audit(1444382008.143:550): arch=c000003e syscall=263 success=yes exit=0 a0=ffffffffffffff9c a1=1b310c0 a2=0 a3=7ffdb492e890 items=2 ppid=1511 pid=1531 auid=500 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=2 comm="rm" exe="/bin/rm" key="delete"

As you can see, when we convince a system service to remove the file for us instead of removing it ourselves, it simply vanishes. Your audit logs have now failed to detect post-breach attack actions and you can’t tell the FBI what happened.
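One way to regain some visibility into this particular scenario (a sketch of my own, not something from the guidelines; the key name is a placeholder) is to audit the places an attacker would plant such a job in the first place:

```
# Log writes (w) and attribute changes (a) to the system cron configuration,
# so that planting or editing a scheduled job is itself an auditable event:
-w /etc/crontab -p wa -k cron_tamper
-w /etc/cron.d/ -p wa -k cron_tamper
-w /var/spool/cron/ -p wa -k cron_tamper
```

This still doesn’t log the deletion that cron performs on the attacker’s behalf, but it does log the act of hijacking cron, searchable afterwards with `ausearch -k cron_tamper`.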

auid, and ‘what is a user’

So let’s take a moment to understand the rule above: what it does, and more importantly what it doesn’t do and why that matters to you. The rule observes the system for syscalls of certain types and groups them under a tag, or key, as candidates for logging. It also includes -F arguments for filtering, and in this case it filters by something called an auid, or “audit user ID”. Here, it filters out everything from 0 (root) up to, but not including, 500, a range typically considered reserved for system accounts, which users are not supposed to log into and which are meant to be used only by applications (it also excludes 4294967295, the value for an unset auid). The auid is assigned to the user’s session by the Linux PAM system during login, and matches their ID at login. However, there are a number of events that can change a user’s ID during a session, the most obvious being the command “su”, which allows a user to assume the identity of another user. The important thing here is that using su does not cause a user’s auid to change, only their “real” uid.
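You can see this session tracking for yourself: the kernel exposes the current auid directly through procfs (on a session that didn’t go through an audit-aware login, such as some containers, it will read 4294967295, meaning unset):

```shell
# The kernel tracks the audit UID per session; PAM sets it at login,
# and it survives su/sudo, unlike the effective UID reported by id -u.
cat /proc/self/loginuid
id -u
```

After an `su` to another account, `id -u` changes but `/proc/self/loginuid` keeps reporting the original login, and that persistence is exactly what auid-based filtering relies on.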

The benefit of this auid is that it supplies better correlation between a real user and the actions taken by that login. And indeed, the users who are able to alter this session variable are restricted. This rule does its job well, but it does it under a false premise: that you don’t need to worry about auditing system service accounts because their actions are known, and that you don’t want to drown in audit events. But to that I respond with another quote, from security researcher Moxie Marlinspike:

“You are running network services with security vulnerabilities. Again, *you are running network services with security vulnerabilities*.”
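Acting on that warning means dropping the assumption that service accounts behave predictably. As a sketch (my own variant, emphatically not from the CIS guidelines), the delete rule could simply drop the auid filters, so that deletions performed by cron, web servers and other service accounts are captured as well:

```
# As the CIS rule, but with the auid filters removed: deletions by system
# services and daemons (including those with an unset auid) are now logged.
# Expect significantly higher event volume on a busy system.
-a always,exit -F arch=b64 -S unlink -S unlinkat -S rename -S renameat -k delete
-a always,exit -F arch=b32 -S unlink -S unlinkat -S rename -S renameat -k delete
```

Whether the extra volume is worth it depends on your environment, which is rather the point of this post: that trade-off should be a decision, not an inherited default.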

Protecting yourself from threaty threats

This article, post, rant, whatever you want to call it, doesn’t exist to address the question of how to prevent breaches. It exists to address the problem of blindness. I brought up the CLUE factors earlier for a reason: they’re likely some of the first things your bosses, and later your legal team, will ask you about. All of the #cyber insurance in the land won’t help you if you cannot prove that you were hacked, as opposed to accidentally dumping your own databases. And when that cannot be done, boardroom questions become inevitable.

Breaches happen. It’s a fact of life in this sector now. What’s important is incident response, minimization, and post-breach reaction. And when it comes to all three of those, you need to be able to answer the question “How can we do better next time?” In order to have that information, it’s important to become informed enough to look at industry guidelines with a critical eye, and determine whether or not they truly work for you, so you don’t end up both breached and blind.

The bottom line is this: best practices are no substitute for knowing and designing around the environment that you actually live in. Securing yourself against threat models that do not apply to you is both ineffective and wasted effort. There is no substitute for experienced personnel who can understand and adapt the technologies in your environment to the needs of your environment.

Jason R.