FlowMiner

Abstract

FlowMiner is a tool for automatically mining expressive, fine-grained data-flow summaries from Java library bytecode. FlowMiner captures enough information to enable context-, type-, field-, object-, and flow-sensitive partial program analysis of applications using the library. FlowMiner’s summaries are compact: flow details of a library that are not critical for future partial program analysis of applications are elided into simple edges between the elements that are accuracy-critical. Hence, summaries extracted by FlowMiner are an order of magnitude smaller than the original library. We present (i) novel algorithms to extract expressive, fine-grained, compact summary data-flows from a Java library, (ii) a graph summarization paradigm that uses a multi-attributed directed graph as the mathematical abstraction to represent summaries, (iii) an open-source implementation (the FlowMiner tool) of the above that saves summaries in a portable format usable by existing analysis tools, and (iv) experiments with recent versions of Android showing that FlowMiner significantly advances the accuracy of state-of-the-art tooling.
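To make the idea of eliding non-critical flow details concrete, here is a minimal Java sketch. It is not FlowMiner's algorithm or API: the class name `FlowElisionSketch`, the method `elide`, and the node names are all hypothetical, and the graph is reduced to plain string identifiers. It assumes a summary edge from one key node to another is kept whenever the original graph contains a path between them that passes only through non-key nodes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch only: NOT FlowMiner's actual algorithm or API.
class FlowElisionSketch {

    // For every key node, collect the key nodes reachable through paths whose
    // interior nodes are all non-key. Those paths collapse into single edges.
    static Map<String, Set<String>> elide(Map<String, List<String>> flows,
                                          Set<String> keyNodes) {
        Map<String, Set<String>> summary = new TreeMap<>();
        for (String source : keyNodes) {
            Set<String> targets = new TreeSet<>();
            Set<String> visited = new HashSet<>();
            Deque<String> work =
                new ArrayDeque<>(flows.getOrDefault(source, List.of()));
            while (!work.isEmpty()) {
                String node = work.pop();
                if (!visited.add(node)) {
                    continue; // already explored from this source
                }
                if (keyNodes.contains(node)) {
                    targets.add(node); // a key node ends the elided path
                } else {
                    // non-key node: look through it to its successors
                    work.addAll(flows.getOrDefault(node, List.of()));
                }
            }
            summary.put(source, targets);
        }
        return summary;
    }

    public static void main(String[] args) {
        // Hypothetical library flow: temporaries sit between the key elements.
        Map<String, List<String>> flows = Map.of(
            "param", List.of("tmp1", "tmp4"),
            "tmp1", List.of("tmp2"),
            "tmp2", List.of("field"),
            "field", List.of("tmp3"),
            "tmp3", List.of("ret"));
        Set<String> keyNodes = Set.of("param", "field", "ret");
        System.out.println(elide(flows, keyNodes));
    }
}
```

In this toy input the five temporaries and their edges disappear from the result, while the flows from `param` to `field` and from `field` to `ret` survive as single edges. Which elements count as key is decided by the tool itself and is not modeled here.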

Contributions

  • A static analysis technique to automatically generate fine-grained, expressive summary specifications given the source or bytecode of any Java library.
    • Our algorithms identify and retain the key artifacts of program semantics needed to support context-, object-, flow-, field-, and type-sensitive data-flow analyses that later use our summaries.
    • Our summaries use a rich, multi-attributed graph as the mathematical abstraction to encode fine-grained summaries, rather than coarse binary relations between the inputs and outputs of library APIs.
    • The generated summaries are compact and significantly smaller than the original library, as non-key features in the flows of the original library are elided into key paths.
  • An open-source reference implementation of our algorithms that extracts summaries given the source or bytecode of a library and exports them to a portable, tool-agnostic format.
  • An experimental evaluation of FlowMiner demonstrating that its summaries of recent versions of Android are much smaller than the original programs, yet more expressive and accurate than those of other state-of-the-art summary techniques.
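The contrast between a multi-attributed graph and a coarse input/output relation can be illustrated with a small Java sketch. Everything here is an assumption made for illustration: the class `SummaryGraphSketch`, the attribute names `kind` and `flow`, and the hypothetical library class `Box` with `setValue`/`getValue` and `setLabel`/`getLabel` are not taken from FlowMiner or its export format. The point is only that keeping field nodes in the summary lets a client analysis tell the two fields apart.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: NOT FlowMiner's data model or export format.
class SummaryGraphSketch {
    // node id -> attributes of that node (for example its kind)
    final Map<String, Map<String, String>> nodes = new LinkedHashMap<>();
    // source id -> (target id -> attributes of that edge)
    final Map<String, Map<String, Map<String, String>>> edges = new LinkedHashMap<>();

    void node(String id, String kind) {
        nodes.put(id, Map.of("kind", kind));
    }

    void edge(String from, String to, String flow) {
        edges.computeIfAbsent(from, k -> new LinkedHashMap<>())
             .put(to, Map.of("flow", flow));
    }

    // Depth-first reachability over the summary edges.
    boolean reaches(String source, String sink) {
        Set<String> visited = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(source);
        while (!work.isEmpty()) {
            String current = work.pop();
            if (current.equals(sink)) {
                return true;
            }
            if (!visited.add(current)) {
                continue;
            }
            work.addAll(edges.getOrDefault(current, Map.of()).keySet());
        }
        return false;
    }

    // Summary of a hypothetical class Box with two independent fields.
    static SummaryGraphSketch boxSummary() {
        SummaryGraphSketch g = new SummaryGraphSketch();
        g.node("Box.setValue#v", "parameter");
        g.node("Box.value", "field");
        g.node("Box.getValue#return", "return");
        g.node("Box.setLabel#s", "parameter");
        g.node("Box.label", "field");
        g.node("Box.getLabel#return", "return");
        g.edge("Box.setValue#v", "Box.value", "field-write");
        g.edge("Box.value", "Box.getValue#return", "field-read");
        g.edge("Box.setLabel#s", "Box.label", "field-write");
        g.edge("Box.label", "Box.getLabel#return", "field-read");
        return g;
    }

    public static void main(String[] args) {
        SummaryGraphSketch g = boxSummary();
        System.out.println(g.reaches("Box.setValue#v", "Box.getValue#return"));
        System.out.println(g.reaches("Box.setValue#v", "Box.getLabel#return"));
    }
}
```

A coarse relation of the form "any setter argument taints the receiver, and the receiver taints any getter result" would report a flow from `setValue` to `getLabel`. With the field nodes retained, the second query finds no path, which is the kind of field sensitivity the summaries are meant to preserve.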

 

Experimental Results

[Figure: FlowMiner experimental results]

Resources

Paper (PDF): FlowMiner-ICISS2015.pdf

Slides (PDF): FlowMiner-ICISS2015-slides.pdf

Tool: http://powerofpi.github.io/FlowMiner/

Source Code: https://github.com/powerofpi/FlowMiner

Bibtex:

@incollection{deering2015flowminer,
year={2015},
booktitle={Proceedings of the International Conference on Information Systems Security},
volume={9478},
series={Lecture Notes in Computer Science},
editor={Jajodia, Sushil and Mazumdar, Chandan},
title={FlowMiner: Automatic Summarization of Library Data-Flow for Malware Analysis},
publisher={Springer International Publishing},
author={Tom Deering and Ganesh Ram Santhanam and Suresh Kothari},
language={English}
}