My research interests lie in the general area of “Computer Systems”. More specifically, I am interested in software techniques for tackling the challenges brought on by distribution, scale and failures. Here is a partial list of the topics I have worked on during my career, in roughly chronological order. For each topic, I have linked a few publications that are most representative of my work in that area.
Operating Systems. During my PhD work at UC Berkeley, I was one of the architects of “BSD Unix”, which was a major factor in the rapid growth of the Internet through its built-in TCP/IP stack and which has influenced numerous other modern operating systems, including Linux, Mac OS X and iOS.
Performance Evaluation and Modeling. My work on BSD Unix included the implementation and evaluation of its virtual memory subsystem.
Distributed Computing. I have worked on several problems in distributed computing including global state detection and atomic commitment.
Byzantine Agreement. My students and I have studied the role of communication network topology in this fundamental problem and have contributed both lower-bound results and algorithms.
Parallel Computing on Networks of Workstations. My work on the Paralex system for turning a collection of workstations on a local-area network into a “supercomputer” defined a brand new line of research that later became known as “Networks of Workstations (NOW)”.
Group Communication Systems. My work on group communication in partitionable systems introduced the notion of an “overlay network” that has become a fundamental abstraction for building distributed systems.
Peer-to-Peer Systems. My group has contributed to peer-to-peer computing through paradigms, algorithms, frameworks (Anthill) and a widely used open-source simulation package (PeerSim). This work kindled my interest in complex systems and bio-inspired computing.
Autonomic Computing and Self-Management. I was an early advocate of new “grassroots” and “data-driven” approaches to autonomic computing that achieve self-* properties through emergence and machine learning rather than self-awareness and explicit programming.
Gossip-Based Techniques in Distributed Systems and Overlay Networks. My group has developed gossip-based algorithms for efficiently solving several important problems including aggregation, topology management, shape formation and loose synchronization in very large distributed systems.
Biology and Nature-Inspired Computing. As part of our work on the BISON Project, we developed a library of “design patterns” for distributed computing that draw inspiration from biological or natural processes.
Game Theoretic Techniques in Peer-to-Peer Systems. I was among the first to apply ideas drawn from evolutionary game theory to problems of cooperation and selfishness in peer-to-peer computing.
Cloud Computing. I have proposed architectures for peer-to-peer cloud computing and algorithms for server consolidation that build on our experience with peer-to-peer systems and gossip-based primitives in very large distributed systems. The resulting architectures not only lower the cost of entry to cloud computing but also reduce energy consumption.
High-Performance Computing. We have been applying machine learning techniques to build predictive models for failures, fault classification, job dispatching and power consumption in data centers and HPC systems.