Dealing with user-submitted content on a 200+ million user platform must really try the patience of Facebook's security researchers.
It would be great if Facebook had a security research group that openly published its findings, at least the ones that don't expose corporate secrets. They almost certainly have seen it "all."
The parse_url vulnerability is probably a good example of why you don't want to use blacklists for filtering out malicious input; you want to use whitelists, and then you want to reconstitute the thing you parsed into a form that can be parsed unambiguously.
parse_url(" javascript:alert('hello')") yields
Array
(
[path] => javascript:alert('hello')
)
which clearly does not have a URL scheme on any whitelist you might apply. Even if it had incorrectly claimed the scheme was "http", the reconstitution step would give you a URL like "http://localhost/%20javascript:alert('hello')", which would avoid the problem.
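The whitelist-then-reconstitute approach can be sketched in a few lines. This is a Python illustration of the idea (the thread is about PHP, but the technique is the same); `sanitize_url` and `ALLOWED_SCHEMES` are hypothetical names, not anyone's actual code:

```python
from urllib.parse import urlsplit, urlunsplit, quote

# Whitelist of acceptable schemes -- anything else is rejected outright.
ALLOWED_SCHEMES = {"http", "https"}

def sanitize_url(raw):
    """Parse a URL, whitelist its scheme, then rebuild it unambiguously.

    Illustrative sketch only: stripping whitespace up front defeats the
    leading-space trick, and re-encoding the path means the rebuilt URL
    parses the same way no matter what consumes it downstream.
    """
    parts = urlsplit(raw.strip())
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        raise ValueError("disallowed scheme: %r" % parts.scheme)
    # Percent-encode the path so the reconstituted URL is unambiguous.
    safe_path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme.lower(), parts.netloc, safe_path,
                       parts.query, parts.fragment))
```

With this shape of check, " javascript:alert(1)" fails the scheme test instead of slipping through as a bare path.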
2. Regarding video: don't allow JavaScript code to be submitted and presented anywhere. The code used to link to videos is unescaped javascript (with a leading space) followed by a video link.
Actually, I'd say that neither of those are really lessons here. Facebook already does #1 (and has for some time). #2 was already understood as part of their threat model (which is why FBJS exists), but a bug allowed an attacker to bypass their filtering.
If you HAD to draw a lesson here, I'd go with something more along the lines of:
Even if you protect your site properly against clickjacking and CSRF, an XSS vulnerability allows an attacker to bypass all of those protections.
Yes, but Facebook was already filtering out JavaScript: their filter just happened to be slightly broken ;)
"According to Facebook, it turned out that some older code was using PHP's built-in parse_url function to determine allowable URLs. For example, while parse_url("javascript:alert(1)") yields a scheme of "javascript" and a path of "alert(1)", adding whitespace gives a different result: parse_url(" javascript:alert(1)") does not return a scheme and has a path of "javascript:alert(1)"."
"This function parses a URL and returns an associative array containing any of the various components of the URL that are present.
This function is not meant to validate the given URL, it only breaks it up into the above listed parts. Partial URLs are also accepted, parse_url() tries its best to parse them correctly. "
It seems like parse_url is not designed for filtering out javascript: URLs. Since the payload showed up on the front end, it seems Facebook's filter initially accepted the embedded javascript in the link.
In essence, Facebook was checking the "scheme" return value and blocking any URLs where the scheme was "javascript". By adding a space, the scheme ended up blank and the URL slipped through. Facebook has never allowed FBML apps to use javascript: URLs in links.
1. Yeah, in PHP it's as simple as generating an MD5 hash, setting that as a session value, and then including it in a form and verifying it when it's POSTed.
2. HTMLPurifier is a good PHP library for cleaning text. But personally, I don't think HTML should be allowed on any site unless you're using a WYSIWYG editor or something.
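The token flow described in #1 is easy to sketch. A Python illustration (the commenter describes it in PHP terms): `session` stands in for server-side session storage like PHP's $_SESSION, and the function names are hypothetical. One deliberate departure from the comment: a CSPRNG token is used instead of an MD5 hash, and the comparison is constant-time:

```python
import hmac
import secrets

def issue_csrf_token(session):
    """Generate a per-session token and stash it server-side.

    The returned token gets embedded in a hidden <input> in the form.
    """
    token = secrets.token_hex(16)  # CSPRNG, rather than an MD5 of something
    session["csrf_token"] = token
    return token

def verify_csrf_token(session, submitted):
    """Check the POSTed token against the session copy, in constant time."""
    expected = session.get("csrf_token", "")
    return bool(expected) and hmac.compare_digest(expected, submitted)
```

As the sibling comment notes, though, none of this survives an XSS hole: injected script runs in the victim's session and can read the token right out of the form.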
They do, and have for some time (although I can't vouch for the particular algorithm used). fb_dtsg is the usual parameter name. Unfortunately, an XSS vulnerability by its very nature allows an attacker to subvert those protections.
What (simple) abstraction could browsers provide that would just thwart all these attacks? The main problem here seems to be the use of heuristics for identifying malicious content.
One of the main problems is mixing code and data. Say there were a new HTTP header that told the browser to disable inline scripts; would that help solve the problem?
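For what it's worth, a header along exactly those lines does exist: Content-Security-Policy. A policy like the following (a minimal sketch, not a production recommendation) tells the browser to refuse inline scripts entirely and only execute scripts loaded from the site's own origin, which neutralizes injected `javascript:` URLs and inline payloads:

```
Content-Security-Policy: default-src 'self'; script-src 'self'
```

The trade-off is that the site itself must stop using inline scripts, which is why adoption takes real engineering effort.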