Real-time analysis of dependency networks
At the Software Analytics Lab (SAL), we are developing techniques to construct
precise and fine-grained dependency networks of package repositories such as
crates.io using methods from program analysis. Typically, dependency networks
are built from the dependency descriptors in package metadata files such as
Cargo.toml. This yields an imprecise representation, because it does not
account for how, and how much of, each dependency a package actually uses in
its source code. Recently, we
have developed a systematic approach to creating call-based dependency
networks (CDNs) by inferring the dependency use at the function call level of
packages. Such a representation makes it possible, for the first time, to
perform analyses such as precise security vulnerability tracking, software
license tracking and data-driven API evolution studies on a dependency network.
Our first evaluation of building a CDN for crates.io has shown promising
results, and we are now looking for interested master's students to explore
new avenues with this work!
To keep analyses of a CDN up to date and actionable, we need to process package repository events in real time. Many package repositories offer a live feed of the changes made to the repository, such as npm’s registry follower. The objective of this project is to tap into such feeds and create a real-time pipeline that fetches each new release, builds its call graph and integrates it into the CDN. While adding a new release to a CDN may sound trivial, aspects such as the dependency resolution of existing nodes need to be handled. The project also includes native support for retrospective analysis of how a CDN evolves.
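To illustrate why integrating a release is not trivial, the following is a minimal sketch in Python, not the lab's implementation. All names (Release, CallDependencyNetwork, ingest, snapshot) are hypothetical. It assumes that feed events arrive in timestamp order, that version constraints are simple half-open ranges, that resolution picks the highest matching version, and that the call graph of each release has already been extracted into a list of (caller, dependency, callee) triples. Each new release triggers a re-resolution of the existing releases that depend on its package, and resolutions are kept in an append-only log so that the network can be reconstructed as of any past point in time.

```python
from dataclasses import dataclass, field

Version = tuple[int, int, int]


@dataclass
class Release:
    package: str
    version: Version
    published: int  # timestamp taken from the feed event (assumed ordered)
    # dependency name -> (lowest allowed version, first disallowed version)
    requires: dict[str, tuple[Version, Version]] = field(default_factory=dict)
    # (calling function, dependency name, called function), assumed to come
    # from a call graph generator that is outside the scope of this sketch
    calls: list[tuple[str, str, str]] = field(default_factory=list)


class CallDependencyNetwork:
    def __init__(self):
        self.releases: dict[str, list[Release]] = {}
        # Append-only log of (time, (package, version), dependency, resolved version).
        self.log: list[tuple[int, tuple[str, Version], str, Version]] = []
        # Most recent resolution per (dependent release, dependency name).
        self.current: dict[tuple[tuple[str, Version], str], Version] = {}

    def _resolve(self, release, dep):
        """Pick the highest known version of `dep` inside the declared range."""
        low, high = release.requires[dep]
        known = self.releases.get(dep, [])
        return max((r.version for r in known if low <= r.version < high), default=None)

    def ingest(self, new):
        """Add a release, then (re-)resolve every dependency it may affect."""
        self.releases.setdefault(new.package, []).append(new)
        # The new release's own dependencies ...
        affected = [(new, dep) for dep in new.requires]
        # ... and existing releases that may now resolve to the new release.
        for existing in self.releases.values():
            for r in existing:
                if r is not new and new.package in r.requires:
                    affected.append((r, new.package))
        for r, dep in affected:
            version = self._resolve(r, dep)
            key = ((r.package, r.version), dep)
            if version is not None and self.current.get(key) != version:
                self.current[key] = version
                self.log.append((new.published, key[0], dep, version))

    def snapshot(self, at):
        """Return the function-level call edges as they were at time `at`."""
        resolved = {}
        for time, dependent, dep, version in self.log:
            if time <= at:
                resolved[(dependent, dep)] = version  # later entries win
        edges = set()
        for existing in self.releases.values():
            for r in existing:
                for caller, dep, callee in r.calls:
                    version = resolved.get(((r.package, r.version), dep))
                    if version is not None:
                        edges.add(((r.package, r.version, caller), (dep, version, callee)))
        return edges
```

For example, if a package app 1.0.0 calls lib's parse function and accepts any lib version from 1.0.0 up to, but not including, 2.0.0, then ingesting lib 1.1.0 after app moves the existing call edge from lib 1.0.0 to lib 1.1.0, while a snapshot taken at an earlier timestamp still shows the edge into lib 1.0.0. A real pipeline would additionally need the ecosystem's actual resolution rules (for crates.io, those of Cargo) and a strategy for out-of-order or yanked releases, which this sketch leaves out.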
 J. Hejderup, A. van Deursen, and G. Gousios, “Software Ecosystem Call Graph for Dependency Management,” in Proceedings of the 40th International Conference on Software Engineering: New Ideas and Emerging Results, New York, NY, USA, 2018, pp. 101–104.