…or how I stubbed my toe on the keys to a few dozen kingdoms
NOTE: this content originally appeared on the Taos blog
I’m authoring this security writeup in ReStructuredText. An odd way to start this off, I know. But bear with me, I promise that ReStructuredText is relevant to a recently patched major vulnerability I uncovered in GitLab. Hopefully that’s a strange enough sentence to capture your attention. If not, I promise cookies 35% of the way through this document.
ReStructuredText is my favorite lightweight plaintext markup language, and I tend to prefer it over the much more popular Markdown on the basis of its greater standardization and the greater power of its base feature set. It is also the primary documentation format of the Python community, making it an easy choice for Python coders such as myself.
Because of this, I have often used the Pelican static site generator to build websites. Pelican takes ReStructuredText as input and outputs static HTML websites that are fast, secure, and simple to administer. I turned to Pelican again when creating a site for my wife’s new webcomic, herpaderp.party. It was during the process of creating this site that I stumbled across something that I can only reasonably describe as “kind of a big deal.”
In the process of creating the feeds page for herpaderp.party, I found it necessary to have recourse to the ReStructuredText raw directive in order to force raw HTML into a page. The pre-parsed code looks like this:
```rst
.. raw:: html

   <form method='post' action='https://blogtrottr.com'>
     Your email: <input type='text' name='btr_email' />
     <input type='hidden' name='btr_url' value='https://herpaderp.party/feeds/strips.atom.xml' />
     <input type='hidden' name='schedule_type' value='6' />
     <input type='submit' value='Follow this feed via email' />
   </form>
   <br />
```
This produces the output that you see on the above page, and is a relatively clean solution to the problem I was presented with. What shocked me, however, was that after I had committed the code and pushed it to GitLab, the raw HTML I had inserted was visible not as code, but as the live button that the code itself defines.
All of the major web-based SCM platforms (GitHub, Bitbucket, GitLab, and Phabricator) support previewing the rendered text of a number of popular markup formats (such as ReStructuredText, Markdown, AsciiDoc, etc.). At this point it’s very near to being a required feature in the space. It’s this feature that enables the richly formatted README files that people have come to expect from the web frontends of their SCM solutions. However, with the ability to insert raw HTML into the output comes the potential capability to rewrite the website itself as it’s presented to the user. This was when my concern started to bubble to the surface.
While Pelican rendering my raw directive is totally acceptable (it’s my site with my content parsed by me), for GitLab to render this HTML is another matter entirely. As a general rule a website should never allow user-supplied HTML to run on the site, especially without sanitization. The reasons for this were well illustrated by the Samy worm, but this writeup may serve as a refresher on the dangers.
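The principle is easy to demonstrate with Python’s standard library (Python is used here purely for illustration; any templating stack has an equivalent). Escaping turns user-supplied markup into inert text before it reaches the browser:

```python
from html import escape

user_supplied = "<script>alert('pwned')</script>"

# Dangerous: splicing user input straight into a page lets it execute.
unsafe = f"<div>{user_supplied}</div>"

# Safer: escaping rewrites <, >, & and quotes into HTML entities,
# so the browser renders the payload as text instead of running it.
safe = f"<div>{escape(user_supplied)}</div>"
```

After escaping, the payload survives only as harmless text like `&lt;script&gt;…`, which is exactly what a renderer of untrusted markup should emit.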
One of the first steps in determining the seriousness of the issue was to verify that arbitrary JavaScript could be executed. The most straightforward method of doing this is to attempt to pop up a message box using the alert() call, like so:
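The original snippet isn’t preserved here, but a minimal version of such a test would look something like this (an approximate reconstruction, not the exact payload used):

```rst
.. raw:: html

   <script>alert('XSS');</script>
```

If the alert box appears when the README is viewed, arbitrary JavaScript is running in the context of the site, and everything that follows becomes possible.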
AJAX the great
AJAX is a web subsystem that powers the vast majority of modern websites. Fundamentally, AJAX is what allows pages to fetch and populate content without requiring the user to refresh the page. Your Facebook and Twitter feeds, Google Maps, practically every popular website makes extensive use of AJAX.
Anatomy of a POC
The POC exploit first makes an HTTP GET request for the user’s dashboard page and indexes the entire list of projects to which they have access. It then accesses the user access management page for each of those projects (more GET requests) and retrieves a security token from the user management forms on those pages. This token is required for the final step, and is designed to help mitigate CSRF. Unfortunately in this case our attack is not cross-site at all, and therefore we can simply ask for this token (just as the user’s normal application usage would).
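In the browser those steps are AJAX requests, but the token-scraping part can be sketched language-neutrally. Here is a hypothetical Python version; the attribute layout of the Rails-style form (a hidden input named `authenticity_token`) is an assumption about how the page looked at the time:

```python
import re

def extract_csrf_token(page_html):
    """Pull the authenticity token out of a fetched management page.

    Rails apps like GitLab embed the CSRF token in a hidden input
    named 'authenticity_token'; the attribute ordering assumed here
    (name before value) is illustrative.
    """
    m = re.search(
        r"name=[\"']authenticity_token[\"']\s+value=[\"']([^\"']+)[\"']",
        page_html,
    )
    return m.group(1) if m else None
```

A real in-page exploit would not even need the regex: since it runs in the victim’s session, it can read the token straight out of the live DOM.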
The final, and most critical, step of a POC is the actual exploit, the “what” that occurs after the “how”. In this case the final step is to make an HTTP POST request against the API endpoint for user management to add a new user to the project with full permissions.
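As a sketch of that final request (built but not sent): the endpoint path, field names, and access-level value below are illustrative stand-ins, not the exact GitLab API of the time.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_add_member_request(base_url, project, csrf_token, user_id):
    """Construct the POST that would grant a user full project access.

    All endpoint and parameter names here are hypothetical examples
    of the general shape of such a request.
    """
    body = urlencode({
        "user_ids": user_id,
        "access_level": 40,                 # stand-in for "Master" access
        "authenticity_token": csrf_token,   # token scraped in the prior step
    }).encode()
    return Request(
        f"{base_url}/{project}/project_members",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Because the token came from the victim’s own session, the server has no way to distinguish this request from a legitimate one.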
The simple attack…extended
The ability to execute drive-by requests on behalf of a user is dangerous on a fundamental level, but on a platform like GitLab the problem expands because it becomes possible to make the exploit self-replicating. In short, it can be extended into a worm.
Once the attacker has been granted master access to a victim’s projects, the exploit code can be added to the READMEs of those projects; as more users view them, the exploit code is added to still more projects. Depending on the viewing rate it would rapidly become resident in the majority of the projects on the platform. This entire process could be automated, of course. Additionally, it’s important to remember that access to private projects would be gained in this way, so all repositories that users might have considered “secure” would suddenly be exposed. Secret keys, credentials, and proprietary code would be available to the attacker.
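The propagation logic described above can be captured in a toy simulation; every name and the payload below are hypothetical, and the model deliberately ignores timing and authentication details:

```python
def simulate_worm(projects, readmes, patient_zero, views):
    """Toy model of the worm's spread.

    projects: {project: set of usernames with admin access}
    readmes:  {project: README text}
    views:    ordered sequence of (user, project) page views

    A user who views an infected README has the payload run with
    their session, so every project that user administers gets the
    payload added and the attacker added as a member.
    """
    PAYLOAD = "<script>/* exploit */</script>"  # stand-in payload
    readmes[patient_zero] += PAYLOAD
    for user, project in views:
        if PAYLOAD in readmes.get(project, ""):
            for name, members in projects.items():
                if user in members:  # viewer can administer this project
                    if PAYLOAD not in readmes[name]:
                        readmes[name] += PAYLOAD
                    members.add("attacker")
    return {name for name, text in readmes.items() if PAYLOAD in text}
```

Even with a handful of views, infection jumps across every project the viewers share, which is why the spread accelerates so quickly on a busy instance.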
It’s therefore clear that even this small parsing vulnerability can, in the right circumstances, become a very serious security issue. But how did it happen in the first place? How did GitLab come to be vulnerable to this attack?
The bug is always upstream
It turns out that GitLab’s parsing of ReStructuredText documents is performed by a Ruby gem called gitlab-markup. This gem was forked from an open-source gem created by GitHub, github-markup. GitLab uses this code to run the actual Python program (docutils) which performs the rendering.
Github-markup performs this rendering via a wrapper script called rest2html. Rest2html explicitly allows raw directives, under code added in commit 68557d2, which flipped this switch from disabled to enabled (the commit message rather ironically reads “enable raw html”). I’ve already opened issue 981 with that project, but have not yet received a response as to why this was not considered a risk. From my rudimentary testing, GitHub itself appears to ignore raw:: html entirely, so it’s possible this may no longer be the code that they run on their systems. Or perhaps they perform some additional sanitization later on.
What about everyone else?
Although GitHub may be protecting itself in a satisfactory manner, the vast number of downstream projects that use this code remain a concern. A full 708 repositories on GitHub alone are listed as dependent, but there may be many others not hosted on GitHub, or whose use of this upstream module is not caught by GitHub’s dependency graph.
For all of these applications it is vital that their authors consider carefully whether they are using this code in a safe manner. Is it rendering user-supplied content back to other users, and if so, is that output being sanitized before use? My recommendation would be to heed the warnings of the authors of docutils and disable raw entirely from running in any context on user-supplied data. It is, by the admission of its own documentation, a security risk. Heck, if you’re foolish enough to enable raw file or URL access, you can actually access files and URLs from the system on which the parser is running. Terrifying.
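For projects that call docutils directly, the relevant switches are real docutils settings (`raw_enabled` and `file_insertion_enabled`); a minimal sketch of rendering untrusted input with raw disabled:

```python
from docutils.core import publish_parts

UNTRUSTED = """
Hello.

.. raw:: html

   <script>alert('pwned')</script>
"""

# With raw_enabled off, the raw directive refuses to run and the
# payload never reaches the output as live HTML.
parts = publish_parts(
    source=UNTRUSTED,
    writer_name="html",
    settings_overrides={
        "raw_enabled": False,
        "file_insertion_enabled": False,
        "report_level": 5,  # suppress the resulting warning messages
    },
)
html_body = parts["body"]
```

The ordinary prose still renders normally; only the dangerous directive is neutralized, which is exactly the behavior you want when the source is user-supplied.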
GitLab is to be lauded for their incredibly professional and speedy response. They got back to me less than 24 hours after initial disclosure and had a provisional patch to me less than 24 hours after that. The vulnerability was publicly announced on 2017/01/10 with a patch that disables the raw:: html directive entirely.
It is a sad but inescapable truism that code will have bugs; what’s important is that we find them, fix them, and learn from them to keep ourselves safer over time. Speaking of that…
Lessons to learn
When inheriting code from an upstream source it is critical to understand that you also inherit that code’s perception of its own security model within its environment. github-markup may be secure within the GitHub and GitHub Enterprise software stack, but it is not guaranteed to be secure outside of it. The use of a library by a well-known and generally “considered secure” vendor does not mean that you can incorporate it into your product without careful evaluation.
As we develop software we must not only inspect and test the code (both that which we write and that which we inherit), but also the application after all of the code has been assembled and installed. While it’s obviously infeasible to validate every line of code in an operating system just to write an app that runs atop it, there is no substitute for aggressive testing of your application once it has been stood up. Qualified professionals (did you notice that we’re on the Taos blog?) can help you discover the chinks in your armor before somebody nefarious starts holding your users for ransom and damaging your reputation.
Thanks
- Taos: for giving me a good place to talk about this
- GitLab: for being so much more professional than most upon being informed of a flaw
- DC562 (My Long Beach Defcon Group): for helping me learn how to hack properly
- My wife: for putting up with my perpetual lack of sleep while I fix broken pieces of computer