Towards Learning Input Structure

Speaker: Caroline Lemieux
Time: Feb 25, 15:00 - 16:00 (CEST)
Place: virtual
Talk title: Towards Learning Input Structure

Download Talk slides (PDF, 2.8 MB)

Abstract: For programs that expect highly structured inputs, fuzzing techniques that are input-structure-agnostic (e.g., AFL, libFuzzer) have a hard time generating inputs that exercise more than just the parsing code of these programs. If input structures for a program under test could be learned automatically, input-structure-aware fuzzing methods could be better leveraged. This talk covers two works that provide stepping stones towards input structure learning. First, we will cover Arvada, which, given a set of example inputs and an oracle of valid inputs, aims to learn a context-free grammar that maximally generalizes those example inputs. Then, we will discuss RLCheck, which, given an over-approximate generator and an oracle of valid inputs (the program under test), uses reinforcement learning to try to "tune" the generator to produce more valid inputs. The talk will highlight both the innovations and the points of improvement for both of these works.
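
To make the second setting in the abstract concrete, the following is a minimal toy sketch of the general idea: an over-approximate generator, a validity oracle, and a feedback loop that makes choices seen in valid inputs more likely. It is not RLCheck's actual algorithm. The bracket language, the depth-based state, the +1.0 reward, and all function names are assumptions made purely for illustration.

```python
import random
from collections import defaultdict

CHOICES = ["(", ")", "END"]
MAX_LEN = 8  # assumed cap on generated input length


def is_valid(s):
    """Oracle (stand-in for the program under test): non-empty balanced brackets."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0 and len(s) > 0


def generate(weights, rng):
    """Over-approximate generator: can emit any bracket string up to MAX_LEN.

    Each choice is sampled in proportion to its learned weight for the
    current state. Returns the string and the (state, choice) trace.
    """
    out, trace, depth = [], [], 0
    while len(out) < MAX_LEN:
        state = depth  # hypothetical state abstraction: opens minus closes
        w = [weights[(state, c)] for c in CHOICES]
        choice = rng.choices(CHOICES, weights=w)[0]
        trace.append((state, choice))
        if choice == "END":
            break
        out.append(choice)
        depth += 1 if choice == "(" else -1
    return "".join(out), trace


def valid_rate(weights, rng, n):
    """Fraction of n generated inputs that the oracle accepts."""
    return sum(is_valid(generate(weights, rng)[0]) for _ in range(n)) / n


def train(episodes, seed=0):
    """Reinforce every (state, choice) pair that appeared in a valid input."""
    rng = random.Random(seed)
    weights = defaultdict(lambda: 1.0)  # all choices start equally likely
    for _ in range(episodes):
        s, trace = generate(weights, rng)
        if is_valid(s):
            for key in trace:
                weights[key] += 1.0  # assumed reward of +1 per valid input
    return weights
```

Comparing `valid_rate(defaultdict(lambda: 1.0), rng, 2000)` with `valid_rate(train(3000), rng, 2000)` for some `rng = random.Random(...)` gives a rough sense of how much the oracle feedback shifts the generator towards valid inputs in this toy setting.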

Additional links:
Arvada (Generating input Grammars with ML)
RLCheck (Fuzzing with RL)