Monday, October 11, 2010

Safely Handling Rich HTML in Python Apps

Here is a solid approach to preventing XSS and HTML injection in your Python application while still allowing rich HTML from untrusted sources.

I had a great chat yesterday with James Socol, author of Bleach
(http://pypi.python.org/pypi/bleach). Many of you are likely already
aware of Bleach, but I wanted to get some more information out to everyone.

What is Bleach for?
Bleach can be used to safely allow an application to accept rich HTML
content from an untrusted source (a user, a third party, etc.) and render
this content within the page. Without Bleach, this would be a ripe area
for XSS and HTML injection.

How does Bleach work?
Bleach works from a minimal whitelist of HTML tags defined in the
Bleach configuration. Any other tags found in the data are HTML
entity encoded, so they are displayed as text instead of being rendered
as markup. As a result, only the whitelisted tags are rendered. As long as
the whitelist is intelligently constructed (which it is by default), the
rendered content cannot perform malicious actions.
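
To make the whitelist-and-encode idea concrete, here is a minimal sketch
built only on the Python standard library. It is not Bleach's
implementation, and it is not a substitute for Bleach in production. The
names ALLOWED_TAGS, WhitelistSanitizer, and sanitize are hypothetical, the
tag list is made up for illustration, and as a simplifying assumption the
sketch drops every attribute from the tags it keeps.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration; not Bleach's actual default list.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p"}


class WhitelistSanitizer(HTMLParser):
    """Toy sanitizer: keep whitelisted tags, entity-encode everything else."""

    def __init__(self):
        # convert_charrefs=False so existing entities arrive through the
        # handle_entityref/handle_charref callbacks and pass through as-is.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            # Re-emit a bare tag; attributes such as onclick are dropped.
            self.parts.append(f"<{tag}>")
        else:
            # Encode the raw tag text so the browser displays it as text.
            self.parts.append(escape(self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>; treat it like a start tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        closing = f"</{tag}>"
        self.parts.append(closing if tag in ALLOWED_TAGS else escape(closing))

    def handle_data(self, data):
        self.parts.append(escape(data))

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(escape(f"<!--{data}-->"))


def sanitize(untrusted_html):
    """Return untrusted_html with only whitelisted tags left as markup."""
    parser = WhitelistSanitizer()
    parser.feed(untrusted_html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


print(sanitize('<b onclick="x()">bold</b><script>alert(1)</script>'))
# <b>bold</b>&lt;script&gt;alert(1)&lt;/script&gt;
```

The bold tag survives (minus its onclick attribute) while the script tag
is shown to the reader as literal text. With Bleach itself, the entry
point for this job is bleach.clean; check its documentation for the
current default tags and options.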

When should Bleach be used?
When you want to allow rich HTML content within the body of a page and
this content is coming from an untrusted source (e.g., a user or a third-party site).

When should Bleach not be used?
If you have no intention of rendering any HTML supplied by the user,
then Bleach is the wrong approach. In those cases, just stick with the
default output encoding provided by Django or Jinja.
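
For comparison, plain output encoding treats every character of the
untrusted value as text, so no tag of any kind is rendered. The sketch
below uses html.escape from the Python standard library as a stand-in for
what a template engine's escaping does; it is not Django's or Jinja's own
code, and render_comment is a hypothetical helper.

```python
from html import escape


def render_comment(untrusted_text):
    # Hypothetical helper: encode the whole value, then wrap it in markup
    # that the application itself controls.
    return "<p>" + escape(untrusted_text) + "</p>"


print(render_comment('<script>alert("xss")</script>'))
# <p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>
```

In a real application you would normally rely on the template engine's
escaping rather than calling an escape function by hand at every output
point.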



-Michael Coates