Software Reliability for Concurrent and Distributed Systems

Today’s software is evolving toward more concurrency and decentralization. With the increasing use of mobile devices and cloud services, the applications we use today are deployed to geo-replicated distributed systems that are easily accessible from everywhere. However, the increased complexity of these systems makes it more difficult to reason about their possible behaviors and to produce correct software.

It is challenging to implement distributed systems correctly since their behavior is more complicated than that of classical sequential programs. Nondeterminism in the delivery order of concurrent messages, network failures, or node crashes may result in subtle executions that expose bugs. It is difficult for programmers to consider all possible executions during system design and implementation. Ensuring the reliability of distributed systems therefore requires different techniques from those designed for sequential software.
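As a small illustration of how message delivery order alone can produce divergent executions, the following sketch enumerates every order in which two replicas might receive the same concurrent writes. The replica model (a register that keeps the last value delivered), the function names, and the client messages are all hypothetical and chosen only for this example; they do not describe any particular system or tool.

```python
from itertools import permutations


def apply_in_order(messages):
    """Apply writes to a single replica; the last delivered write wins."""
    value = None
    for _sender, new_value in messages:
        value = new_value
    return value


def divergent_deliveries(messages):
    """Return every pair of delivery orders that leaves two replicas in different states."""
    orders = list(permutations(messages))
    divergent = []
    for order_a in orders:
        for order_b in orders:
            if apply_in_order(order_a) != apply_in_order(order_b):
                divergent.append((order_a, order_b))
    return divergent


# Two clients write concurrently, so the network may deliver the writes in either order.
concurrent_writes = [("client1", "x=1"), ("client2", "x=2")]
bad = divergent_deliveries(concurrent_writes)
total = len(list(permutations(concurrent_writes))) ** 2
print(f"{len(bad)} of {total} delivery combinations leave the replicas inconsistent")
```

Even with two messages and two replicas, half of the delivery combinations end in disagreement. The number of orders grows factorially with the number of concurrent messages, which is why exhaustive manual reasoning quickly becomes infeasible.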

This research line aims to build program analysis, testing, and debugging methods for concurrent programs and distributed systems. Our research interests span a broad spectrum of concurrent programs: multi-threaded, asynchronous, event-driven, and distributed systems.

We aim to build software analysis and testing methods for systems including (but not limited to):

Decentralized consensus systems and blockchains
Distributed systems with weak consistency and weak isolation
Distributed systems with microservice architecture
Shared-memory multicore programs

Recent Awards:

“Distinguished paper award” for our paper “Randomized Testing of Byzantine Fault Tolerant Algorithms” at OOPSLA’23.
“Stellar Academic Research Grant” for the research proposal “Feedback-guided fault-injection testing of blockchain systems” from the Stellar Development Foundation.
Ripple Bug Bounty Program Award for the bug that our recent work discovered in the XRP Ledger of Ripple. Levin Winter’s contribution to the bug fix is acknowledged in the release notes of XRP Ledger version 1.10.0.
“Amazon Research Award” for the research proposal “Coverage-directed randomized testing of distributed systems” in Fall 2022.

Related MSc Courses:

CS4405 - Analysis of Concurrent and Distributed Programs (course page)
IN4315 - Software Architecture (course page)

Contact us if you are interested in working on software testing, program analysis, concurrent programming, distributed systems, or blockchains.
