Input validation in Java

java.util.Scanner is a simple text parser.  It breaks its input into tokens (separated by whitespace by default) and converts each token to the type you request.  It can parse primitive types (boolean, byte, double, float, int, long, short) and strings that match regular expressions.  This class is also available on Android.

Example:

import java.util.Scanner;

Scanner s = new Scanner("CAFE false");
System.out.println(s.nextInt(16));   // reads "CAFE" as a base-16 int
System.out.println(s.nextBoolean()); // reads "false"

Prints

51966
false
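
For validation, it usually helps to check that the next token has the expected type before reading it, because nextInt() throws an exception on a token it cannot parse. The following is a minimal sketch of that idea; the class name ScannerValidation and the helper parseFirstInt are hypothetical names chosen for this illustration.

```java
import java.util.OptionalInt;
import java.util.Scanner;

class ScannerValidation {
    // Hypothetical helper: returns the first token as an int in the given
    // radix, or an empty result if the token is missing or not a valid int.
    static OptionalInt parseFirstInt(String input, int radix) {
        try (Scanner s = new Scanner(input)) {
            if (s.hasNextInt(radix)) {
                return OptionalInt.of(s.nextInt(radix));
            }
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseFirstInt("CAFE false", 16)); // OptionalInt[51966]
        System.out.println(parseFirstInt("CAFE false", 10)); // OptionalInt.empty
    }
}
```

Returning an empty OptionalInt rather than throwing is one possible design; it lets the caller decide how to report invalid input.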

You can also work with regular expressions through methods on String.  s.matches("regex") returns true only if the entire string matches the expression.  s.split("regex") returns an array of the substrings that lie between matches of "regex" (the characters matching "regex" are not included).

Example:

String s = "The food is in the barn.";
boolean b;
b = s.matches("foo.*bar");     // false: the whole string must match
b = s.matches("The.*barn\\."); // true; \\. matches a literal period
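
The split() and matches() methods described above can be combined to validate delimited input field by field. This is a small sketch under assumed requirements (comma-separated fields, and a numeric field of one to five digits); the class and method names are hypothetical.

```java
import java.util.Arrays;

class SplitExample {
    // Splits a comma-separated line. Optional whitespace around each comma
    // is part of the delimiter, so it is dropped from the results.
    static String[] fields(String line) {
        return line.split("\\s*,\\s*");
    }

    // Accepts only strings made of one to five decimal digits.
    static boolean isShortNumber(String field) {
        return field.matches("\\d{1,5}");
    }

    public static void main(String[] args) {
        String[] parts = fields("80, 443 ,http");
        System.out.println(Arrays.toString(parts)); // [80, 443, http]
        for (String part : parts) {
            System.out.println(part + " -> " + isShortNumber(part));
        }
    }
}
```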

You can also work with regular expressions using the java.util.regex package.  You call Pattern.compile() to compile the regular expression into a Pattern, then call the pattern's matcher() method on the input.  You use the returned Matcher to test for matches and perform other related operations.

You should be aware of the worst-case complexity of your expressions. Some expressions can take exponential time on certain inputs because of backtracking, which can lead to a denial-of-service (DoS) vulnerability.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "The food is in the barn.";
Pattern p = Pattern.compile("foo.*bar");
Matcher m = p.matcher(s);
boolean b;
b = m.matches(); // false: the whole string must match
b = Pattern.compile("The.*barn\\.").matcher(s).matches(); // true
b = Pattern.matches("The.*barn\\.", s); // true
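
To illustrate the warning about worst-case complexity, the sketch below shows two common mitigations: rejecting overly long input before it reaches the regex engine, and avoiding nested quantifiers. The length limit of 64 and the names BoundedMatch, MAX_LENGTH, and isValid are assumptions made for this example, not recommendations for any particular application.

```java
import java.util.regex.Pattern;

class BoundedMatch {
    // Assumed limit for this sketch; choose one that suits the field.
    static final int MAX_LENGTH = 64;

    // A nested quantifier such as "(a+)+b" can backtrack exponentially on
    // input like "aaaa...c". "a+b" accepts the same strings without nesting.
    static final Pattern SAFE = Pattern.compile("a+b");

    static boolean isValid(String input) {
        if (input == null || input.length() > MAX_LENGTH) {
            return false; // rejected before the regex engine sees it
        }
        return SAFE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("aaab")); // true
        System.out.println(isValid("aaac")); // false
    }
}
```

Compiling the Pattern once and reusing it also avoids recompiling the expression on every call.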

Input validation: Finding input locations

For Java (and other languages), a static analysis system that performs data flow analysis (e.g., IBM’s AppScan Source) can identify input locations. In particular, any static analysis system that performs taint tracking (also called taint analysis) has to locate the sources where untrusted data enters the program, so it can report them. For a free, academic example, see "Andromeda: Accurate and Scalable Security Analysis of Web Applications" by Omer Tripp, Marco Pistoia, Patrick Cousot, Radhia Cousot, and Salvatore Guarnieri.

Another way to find input locations is through a source code review or white-box testing. Any source code that uses classes from java.io very likely performs some kind of I/O. Similarly, network I/O occurs through classes in java.net.
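
As a first pass for such a review, you can search the source for imports of these packages. The following is a rough sketch only; the class IoImportFinder and its method are hypothetical, and a text search like this misses code that uses fully qualified class names or inherited streams such as System.in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class IoImportFinder {
    // Matches import statements for classes in java.io or java.net.
    private static final Pattern IO_IMPORT =
        Pattern.compile("import\\s+(static\\s+)?java\\.(io|net)\\..*");

    // Returns the trimmed import lines that refer to java.io or java.net.
    static List<String> findIoImports(List<String> sourceLines) {
        List<String> hits = new ArrayList<>();
        for (String line : sourceLines) {
            String trimmed = line.trim();
            if (IO_IMPORT.matcher(trimmed).matches()) {
                hits.add(trimmed);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
            "import java.io.BufferedReader;",
            "import java.util.List;",
            "import java.net.Socket;");
        for (String hit : findIoImports(source)) {
            System.out.println(hit);
        }
    }
}
```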

You can also use a dynamic analysis system or a debugger to watch for calls to classes that do I/O. One example of a dynamic analysis system is Chord from Georgia Tech, although using it for this purpose would require some work on your part.