Ideally, you or others in the development group should know every input location. Because of hidden inputs, however, you might not know all of them. This section therefore discusses ways to find them (and then test their input validation). Depending on the type of program and the OS environment, different approaches and tools might be useful. When you find an input location, document it and record which tests you performed, to help avoid duplication of effort.
GNU/Linux systems
All I/O must go through a system call. The GNU/Linux command strace shows every system call a program makes as it runs, so you can identify all files opened, network connections attempted, and so on. There are other ways of finding this information, such as linking with a library that records all I/O, or using a debugger and setting a breakpoint at every call that opens a file or network socket, but strace is the easiest.
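A trace like this can be captured from a small script. The following is a minimal sketch, assuming strace is installed and on the PATH; the helper name build_strace_command and the log file name trace.log are illustrative choices, not part of any tool. The options used (-f to follow child processes, -e trace=... to restrict which calls are logged, -o to write to a file) are standard strace options, but check your local man page, since the accepted class names can differ between versions.

```python
import subprocess
import sys


def build_strace_command(program_args, log_path="trace.log"):
    """Return an strace argument list that logs file and network system calls.

    program_args is the command to trace, as a list of strings.
    log_path is a hypothetical default name for the output file.
    """
    return [
        "strace",
        "-f",                        # also trace child processes and threads
        "-e", "trace=file,network",  # log only file- and network-related calls
        "-o", log_path,              # write the trace to a file instead of stderr
    ] + list(program_args)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python trace_inputs.py ./my_program --some-flag
    command = build_strace_command(sys.argv[1:])
    print("Running:", " ".join(command))
    # Run the traced program; its exit status is not checked here.
    subprocess.run(command, check=False)
```

Restricting the trace to file and network calls keeps the log small enough to read by hand. Drop the -e option if you also want to see every other system call, such as reads from standard input.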
As an example, I ran strace on the web browser Opera. Here is part of the output:
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 49
connect(49, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("91.203.99.55")}, 16) = -1 EINPROGRESS (Operation now in progress)
...
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 48
connect(48, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("213.236.208.94")}, 16) = -1 EINPROGRESS (Operation now in progress)
...
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 55
connect(55, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("199.7.71.190")}, 16) = -1 EINPROGRESS (Operation now in progress)
In that output, you can see the browser connecting to the following IP addresses: 91.203.99.55 (port 80), 213.236.208.94 (port 443), and 199.7.71.190 (port 80). To learn more about the communication, I captured the traffic with Wireshark.
From this, we now know:
- The HTTP protocol parser must be tested (unsurprising for a web browser, but in another program this might be important).
- The application uses a certificate revocation list, so we need to test the parsing of it.
Additionally, in the full strace output, we can see that this application also opens and reads from several files, performs DNS lookups, etc. In other words, a complex real-life program has many, many input sources.
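With this many input sources, reading a full trace by hand gets tedious. The sketch below shows one way to summarize a saved strace log into a list of remote endpoints and opened files. It assumes the log uses the same textual format as the excerpt above; the function name summarize_trace and both regular expressions are illustrative, and they cover only IPv4 connect() calls and open()/openat() calls that take a literal path.

```python
import re

# Matches IPv4 connect() lines in the format shown in the excerpt above.
CONNECT_RE = re.compile(
    r'connect\(\d+, \{sa_family=AF_INET, sin_port=htons\((\d+)\), '
    r'sin_addr=inet_addr\("([0-9.]+)"\)\}'
)

# Matches open("path", ...) and openat(AT_FDCWD, "path", ...).
OPEN_RE = re.compile(r'\b(?:open|openat)\((?:AT_FDCWD, )?"([^"]*)"')


def summarize_trace(lines):
    """Return (endpoints, files) found in an iterable of strace output lines.

    endpoints is a sorted list of (ip_address, port) tuples.
    files is a sorted list of paths passed to open() or openat().
    """
    endpoints = set()
    files = set()
    for line in lines:
        match = CONNECT_RE.search(line)
        if match:
            endpoints.add((match.group(2), int(match.group(1))))
            continue
        match = OPEN_RE.search(line)
        if match:
            files.add(match.group(1))
    return sorted(endpoints), sorted(files)


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        # Usage: python summarize_trace.py trace.log
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            found_endpoints, found_files = summarize_trace(log)
        for ip_address, port in found_endpoints:
            print(f"network input: {ip_address}:{port}")
        for path in found_files:
            print(f"file input: {path}")
```

Each entry in the resulting lists is a candidate input location to document and test. Calls that fail (for example, an open() on a file that does not exist) are listed too, which is deliberate: a path the program tries to read is a potential input even when the file is absent on the test machine.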