PROTOS Protocol Genome Project
Computer programs are a part of our daily lives. The programs we use process information from various sources using a plethora of encodings and protocols. The input processing routines are among the most exposed areas of a program, which is why they should be especially reliable. This is rarely the case. Our previous experiments with protocol implementations have shown that well-designed structural mutations are an efficient way to expose errors in networked programs. A similar approach can be applied to any type of content by using automatic structure inference techniques. It is our belief that automatic structure inference combined with domain-specific reasoning capabilities will enable, among other things, efficient and creative automated program robustness testing tools.
- The PROTOS Protocol Genome Project was initiated in January 2004. The original motivation was to enable testing of implementations of arbitrary, possibly unknown, protocols by inferring a model of the packet structure of a protocol from examples, and using the model to generate test material for implementations of the protocol. The guiding principle of the project is to build an automatic program robustness testing system based on model-assisted fuzzing. The structure of valid inputs for a program is first analyzed automatically to learn what kind of data the program is known to process. This knowledge is then used to build similar data with various kinds of mutations. The program is then subjected to these mutated inputs in a controlled environment, and the cases that cause the program to fail are collected. We currently treat each program as a stateless protocol. The main areas of the project are structure inference and instrumentation. Our current structure inference tools are based on two of our previous prototypes. Although the tools and testing framework are preliminary versions, we have already found a large number of interesting program failures.
- We are currently trying to make some of the techniques from the earlier phase of the project easily accessible to vendors. Even fairly simple versions of the model inference idea allow one to easily build tools for finding weaknesses in many currently used programs. The current situation, where the tools are mostly private or commercial, is not beneficial for a wide class of vendors or independent software developers. Many vendors have sophisticated in-house white-box tools for testing, and crackers have a selection of ad hoc fuzzers, which are often hard to use and tailored to a particular piece of software or type of data. One of our current activities is to write a tool called Radamsa, which should make automatic robustness testing an easier drop-in addition to product development.
- To evaluate the effectiveness of the robustness testing tools in practice, we periodically run tests to see how well they perform against real-world software. The current tests are mainly aimed at important, commonly used open source programs. We are trying to come up with a lightweight, mostly automated way to give detailed and repeatable reports about issues in programs, while also raising awareness of how easily developers could have found the issues themselves using the same tools. Hopefully this will cause at least a few bugs to be patched before they are targeted by worms and crackers.
- Other activities include learning and implementing more sophisticated techniques and applying them to different domains. Enabling testing of networked programs with Radamsa and writing a good black-box plagiarism detection program are also among our short-term goals.
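The workflow described above (analyze valid inputs, build mutated variants, run the program on them in a controlled environment, and collect the failing cases) can be sketched in a few lines of Python. This is only an illustrative sketch, not Radamsa or any of the project's actual tools: the token-splitting "model", the four mutations, and the names `infer_tokens`, `mutate`, and `fuzz` are assumptions made for this example. It treats the target as a stateless program that reads standard input, and it counts a timeout or termination by a signal (as reported on POSIX systems) as a failure.

```python
import random
import re
import subprocess


def infer_tokens(sample: bytes) -> list[bytes]:
    """Very crude structure inference (an assumption of this sketch):
    split a valid sample into alternating runs of alphanumeric bytes
    and delimiter bytes. Joining the tokens restores the sample."""
    return re.findall(rb"[A-Za-z0-9]+|[^A-Za-z0-9]+", sample)


def mutate(tokens: list[bytes], rng: random.Random) -> bytes:
    """Apply one structural mutation at token level and return the test case."""
    tokens = list(tokens)  # work on a copy, keep the model intact
    if not tokens:
        return b""
    i = rng.randrange(len(tokens))
    op = rng.choice(("drop", "repeat", "swap", "grow"))
    if op == "drop":        # remove a field or delimiter
        del tokens[i]
    elif op == "repeat":    # duplicate a token many times
        tokens[i:i + 1] = [tokens[i]] * rng.randint(2, 64)
    elif op == "swap":      # exchange two tokens (may be a no-op if j == i)
        j = rng.randrange(len(tokens))
        tokens[i], tokens[j] = tokens[j], tokens[i]
    else:                   # "grow": blow up a single token's length
        tokens[i] = tokens[i] * rng.randint(2, 1024)
    return b"".join(tokens)


def fuzz(target_cmd, samples, rounds=100, seed=0, timeout=5.0):
    """Feed mutated inputs to target_cmd on stdin and collect failing cases.

    Returns a list of (reason, test_case) pairs. A negative return code
    means the process was killed by a signal on POSIX systems.
    """
    rng = random.Random(seed)  # fixed seed keeps runs repeatable
    models = [infer_tokens(s) for s in samples]
    failures = []
    for _ in range(rounds):
        case = mutate(rng.choice(models), rng)
        try:
            proc = subprocess.run(
                target_cmd,
                input=case,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            failures.append(("timeout", case))
            continue
        if proc.returncode < 0:
            failures.append((f"signal {-proc.returncode}", case))
    return failures
```

With a hypothetical target command `./parser` that reads its input from standard input, a run could look like `fuzz(["./parser"], [open("valid.sample", "rb").read()], rounds=1000)`; both the command and the file name are placeholders. Seeding the random generator keeps failing cases reproducible, which matters when the goal is repeatable reports.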
* Radamsa -- a test-case generator for robustness testing
* Platypus -- a black-box plagiarism detector
Joachim Viide, Aki Helin, Marko Laakso, Pekka Pietikäinen, Mika Seppänen, Kimmo Halunen, Rauli Puuperä, and Juha Röning. Experiences with Model Inference Assisted Fuzzing. WOOT '08. University of Oulu, Finland.
A. Helin, J. Viide, M. Laakso, J. Röning. Model Inference Guided Random Testing of Programs with Complex Input Domains. 2006.
Rauli Puuperä. Domain model based black box fuzzing using regular languages. Master's thesis, 2010.
- If you are interested in this kind of work, or know of a related domain in which similar ideas could be useful, feel free to contact us. The preferred way of contacting the project personnel is through our collective OUSPG mailing list. Please remove SPAMLESS before delivery.