XSS Validation vs. Encoding

Posted on September 9, 2011

I came across an excellent post by Chris Schmidt on this topic, which can be found at http://yet-another-dev.blogspot.com/2011/09/xss-validation-vs-encoding.html. He does an excellent job of describing the problem and making his point, and I recommend you read his post. I would like to take a moment to add some thoughts on this topic that I have been thinking about for a while. This is always a big debate, and it is important that we share our thoughts. My overall goal is to help prevent this type of vulnerability, and I believe there are multiple ways to do that.

First, let me say that I believe input validation and output encoding are both very important for the security of a system. For resolving cross-site scripting (XSS) issues, though, my answer is always output encoding, applied right before the data is sent to the client. The number one thing you have to know when dealing with XSS is the context in which the data will be used. The context determines the type of encoding you need to apply, and you don’t know the context until you are actually using the data. When I pull data from a database, I don’t yet know what context that data will be used in. It will get passed back to the UI layer and used there somehow, so it wouldn’t make sense to encode at that point. I need to encode in the UI, when I place the data in the location where it will finally end up. At that point I know exactly which context applies, whether it is an HTML attribute, an HTML element, a URL, or JavaScript. This is the best chance to apply the proper solution.
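
To make the point about context concrete, here is a minimal sketch in Python, using only the standard library. The function names are my own, chosen for illustration, and a real application would normally rely on its template engine or a dedicated encoding library rather than hand-rolled helpers. The point is only that the same value needs a different encoding in each context, and that you can pick the right one only at the point of output.

```python
import html
import json
from urllib.parse import quote


def encode_for_element(value: str) -> str:
    """Encode for use as text inside an HTML element (escapes &, <, >)."""
    return html.escape(value, quote=False)


def encode_for_attribute(value: str) -> str:
    """Encode for use inside a quoted HTML attribute value.

    In addition to &, <, > this also escapes double and single quotes.
    """
    return html.escape(value, quote=True)


def encode_for_url_component(value: str) -> str:
    """Percent-encode for use as a single URL query parameter value."""
    return quote(value, safe="")


def encode_for_js_string(value: str) -> str:
    """Produce a double-quoted JavaScript string literal for a <script> block.

    json.dumps handles quotes, backslashes, and control characters; the
    extra replacements keep the value from closing the script element.
    This is a sketch, not a complete JavaScript encoder.
    """
    literal = json.dumps(value)
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


# The same data, encoded differently depending on where it ends up.
name = "O'Brien <admin>"
element_html = "<p>" + encode_for_element(name) + "</p>"
attribute_html = "<input value='" + encode_for_attribute(name) + "'>"
link_html = "<a href='/search?q=" + encode_for_url_component(name) + "'>search</a>"
script_html = "<script>var name = " + encode_for_js_string(name) + ";</script>"
```

Notice that none of these helpers could have been applied sensibly when the value was read from the database, because at that point nothing says which of the four outputs it is headed for.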

So what about input validation? I think it is important, but it is not my recommended solution for XSS, and here are a few reasons why.

  1. Relies on strict requirements – I have worked as a developer in many shops, and it is rare to get really good requirements that actually define the input rules for every data item. Without these strict requirements, how does the developer know which characters to allow and which to deny? Think of an application that allows HTML markup or requires some of the special characters that make XSS possible.
  2. Data context – XSS can exist in many different contexts. It can be in an HTML element, an attribute, or even JavaScript. At validation time, it is difficult to validate against all of these contexts. What if we have a name field that allows the single quote character, and a developer later decides to use that value in an attribute wrapped in single quotes? The problem is not that we allow the single quote; the problem is that we are not properly handling the context when the value is output to the client.
  3. Character sets – There are a lot of character sets, and it is difficult to check all of them while doing input validation. Take, for example, Microsoft’s ValidateRequest feature. Its simple blacklist attempts to filter out the less-than symbol (<) followed by a character. Unfortunately, if you submit a different encoding of that character, you can bypass the input validation, and if you don’t have output encoding in place you could be susceptible to XSS.
  4. Trust in your data source – How much trust can you put in your data source, whether it is a file, a web service, or a database? With mobile applications becoming very popular, and systems mixing content from different sources and sharing data sources, can you be sure that a different platform is doing the same input validation you are? I have seen this at a previous job, where the mobile developers didn’t validate the same way the web platform did because the input wasn’t a problem on the mobile device. When that data reaches the web platform, it is pulled from the database and returned to the client, and now it is XSS.
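
The single quote scenario from the second point can be sketched in a few lines of Python. The validation pattern and function names below are hypothetical, made up for this illustration. The name passes a perfectly reasonable validation rule, yet the output is still broken unless it is encoded for the attribute context.

```python
import html
import re

# Hypothetical validation rule: letters, spaces, hyphens, and apostrophes,
# since real names such as O'Brien need the single quote.
NAME_PATTERN = re.compile(r"[A-Za-z' -]{1,50}")


def is_valid_name(name: str) -> bool:
    """Input validation: accept only names that match the allowed characters."""
    return NAME_PATTERN.fullmatch(name) is not None


def render_without_encoding(name: str) -> str:
    """Place the raw value into a single-quoted attribute (the mistake)."""
    return "<input type='text' value='" + name + "'>"


def render_with_encoding(name: str) -> str:
    """Encode for the attribute context right before output."""
    return "<input type='text' value='" + html.escape(name, quote=True) + "'>"


name = "O'Brien"
assert is_valid_name(name)

# The apostrophe ends the attribute early; the browser sees value='O'
# followed by stray markup.
broken = render_without_encoding(name)

# Encoded for the attribute context, the apostrophe stays inside the value.
safe = render_with_encoding(name)
```

The validation did its job and the name is legitimate. The markup breaks only because the value was written into an attribute without attribute encoding, and that same break-out is what an attacker builds on. It also does not matter whether the value came from this form, a mobile client, or another system sharing the database, because the encoding step does not depend on where the data came from.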

There are many ways to approach this issue, but when you really boil the problem down, it is not an input problem; it is an output problem. It is important to use both input validation and output encoding, but the weight here is on output encoding.
