Research and development

I am a software engineer at Human Longevity, Inc., a genomics and cell therapy-based diagnostic and therapeutic company. Using advances in genomic sequencing, the human microbiome, proteomics, informatics, computing, and cell therapy technologies, HLI is building the s most comprehensive database on human genotypes and phenotypes to tackle the diseases associated with aging-related human biological decline. Before joining HLI, I re-engineered a human mutation mapping pipeline SaaS system at Illumina, Inc., a developer and manufacturer of life science tools and integrated systems for the analysis of genetic variation and biological function.

From 2003 until mid-2013, I was a Research Staff Member at the IBM Almaden Research Center, where I most recently led the development of the IBM Neuro Synaptic Core Simulator (NSCS) as part of the IBM SyNAPSE project. NSCS models a reconfigurable cortical hardware circuit capable of capturing the various cognitive abilities of the brain, and is intended to evaluate the expected behavior of neuronal algorithms, such as image processing algorithms, when deployed on hardware implementations. Evaluations performed with NSCS demonstrated the potential and power of neuronal algorithms in advance of hardware implementations, thus enabling efficient research and development within this new problem-solving domain.

Prior to NSCS development, I was the research technical leader for the IBM Virtual Mission Bus (VMB) project. The VMB was a middleware system for supporting distributed, adaptive, hard real-time applications for a dynamic cluster of satellites, under the aegis of the DARPA System F6 program. I led a combined research and development team that designed and implemented the VMB, and that produced a successful technology demonstration of the VMB.

In general, my technical interests include: high-performance system simulations, lightweight distributed consistency control, secure group membership protocols, and algorithms for automatic resource reservation and management. Projects I am working on (or have worked on), in descending chronological order, include:

Trait prediction using whole-genome sequencing data
The IBM SyNAPSE Cognitive Computing project for DARPA SyNAPSE
The Virtual Mission Bus middleware system, as part of the Pleiades architecture for DARPA System F6
End-to-end performance management for large distributed storage
Self-managing heterogeneous storage systems
Decentralized recovery for survivable storage systems (Doctoral thesis research at Carnegie Mellon University)
Exclusive caching in hierarchical storage systems

You will need Adobe Reader to read the Portable Document Format (PDF) files.

Current projects

Trait prediction using whole-genome sequencing data

Prediction of human physical traits and demographic information from genomic data challenges privacy and data deidentification in personalized medicine. To explore the current capabilities of phenotype-based genomic identification, we applied whole-genome sequencing, detailed phenotyping, and statistical modeling to predict biometric traits in a cohort of 1,061 participants of diverse ancestry. Individually, for a large fraction of the traits, their predictive accuracy beyond ancestry and demographic information is limited. However, we have developed a maximum entropy algorithm that integrates multiple predictions to determine which genomic samples and phenotype measurements originate from the same person. Using this algorithm, we have reidentified an average of >8 of 10 held-out individuals in an ethnically mixed cohort and an average of 5 of either 10 African Americans or 10 Europeans. This work challenges current conceptions of personal privacy and may have far-reaching ethical and legal implications.

Publications include:

Christoph Lippert et al. Identification of individuals by trait prediction using whole-genome sequencing data. In Proceedings of the National Academy of Sciences (PNAS), September 2017 (early edition)

Prevous work

The IBM SyNAPSE Cognitive Computing project for DARPA SyNAPSE

Cognitive Computing is an emerging field with a goal to develop a coherent, unified, universal mechanism to engineer the mind. Cognitive computing seeks to implement a unified computational theory of the mind, taking advantage of the ability of the brain to integrate ambiguous sensory information, form spatiotemporal associations and abstract concepts, and make decisions and initiate sophisticated coordinated actions.

Our approach to cognitive computing is to develop dedicated hardware systems for implementing a canonical cortical circuit that can achieve tremendous gains in power and space efficiency when compared to traditional von Neumann circuits. Such efficiency is crucial when scaling such circuits to the size of a mammalian cortex. Our cortical circuit is a reconfigurable network of spiking neurons that is composed of neuron processing elements connected through synapse memory elements—both akin to the basic building blocks of the brain.

To validate and verify the configuration of our hardware, we have developed a simulator that can reproduce hardware functional behavior when testing circuits at the size of a mammalian cortex. Such a simulator also doubles as a research tool for developing and testing new cognitive computing algorithms for implementation on the hardware.

Publications include:

Arnon Amir, Pallab Datta, William P. Risk, Andrew S. Cassidy, Jeffrey A. Kusnitz, Steven K. Esser, Alexander Andreopoulos, Theodore M. Wong, Myron Flickner, Rodrigo Alvarez, Emmett McQuinn, Ben Shaw, Norm Pass, and Dharmendra S. Modha. Cognitive Computing Programming Paradigm: A Corelet Language for Composing Networks of Neurosynaptic Cores. In Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN 2013), August 2013
Andrew S. Cassidy, Paul Merolla, John V. Arthur, Steve K. Esser, Bryan Jackson, Rodrigo Alvarez-Icaza, Pallab Datta, Jun Sawada, Theodore M. Wong, Vitaly Feldman, Arnon Amir, Daniel Ben-Dayan Rubin, Filipp Akopyan, Emmett McQuinn, William P. Risk and Dharmendra S. Modha. Cognitive Computing Building Block: A Versatile and Efficient Digital Neuron Model for Neurosynaptic Cores. In Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN 2013), August 2013
Steve K. Esser, Alexander Andreopoulos, Rathinakumar Appuswamy, Pallab Datta, Davis Barch, Arnon Amir, John Arthur, Andrew Cassidy, Myron Flickner, Paul Merolla, Shyamal Chandra, Nicola Basilico, Stefano Carpin, Tom Zimmerman, Frank Zee, Rodrigo Alvarez-Icaza, Jeffrey A. Kusnitz, Theodore M. Wong, William P. Risk, Emmett McQuinn, Tapan K. Nayak, Raghavendra Singh, and Dharmendra S. Modha. Cognitive Computing Systems: Algorithms and Applications for Networks of Neurosynaptic Cores. In Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN 2013), August 2013
Theodore M. Wong, Robert Preissl, Pallab Datta, Myron Flickner, Raghavendra Singh, Steven K. Esser, Emmett McQuinn, Rathinakumar Appuswamy, William P. Risk, Horst D. Simon, Dharmendra S. Modha. 10¹⁴. IBM Technical Paper RJ10502, November 2012

Acrobat PDF (221 KB)
Robert Preissl, Theodore M. Wong, Pallab Data, Raghav Singh, Steven Esser, William Risk, Horst Simon, Myron Flickner, and Dharmendra Modha. Compass: A Scalable Simulator for an Architecture for Cognitive Computing. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC 2012), November 2012

For more information, please visit:

IBM – SyNAPSE: a cognitive computing project from IBM Research

The Virtual Mission Bus middleware system, as part of the Pleiades architecture for DARPA System F6

Distributed, adaptive, hard real-time applications, such as process control or guidance systems, have requirements that go beyond those of traditional real-time systems: accommodation of a dynamic set of applications, autonomous adaptation as application requirements and system resources change, and security between applications from different organizations. Developers need a middleware with features that support developing and running these applications, especially as commercial and defense systems become more network-centric. The Virtual Mission Bus (VMB) middleware, targeted at both distributed IT systems and real-time systems, provides the essential basic services to support these applications and the tools for building more complex services, all while keeping the middleware kernel minimal enough for embedded system use. We successfully used the VMB to prototype a distributed spacecraft cluster system.

Publications include:

Theodore M. Wong, Richard A. Golding, Harvey M. Ruback, Wilfred Plouffe, and Scott A. Brandt. The Virtual Mission Bus. IBM Technical Paper RJ10472, September 2010

Acrobat PDF (257 KB)
David M. LoBosco, Glen E. Cameron, Richard A. Golding, and Theodore M. Wong. The Pleiades fractionated space system architecture and the future of national security space. In Proceedings of the AIAA SPACE 2008 Conference, September 2008

Acrobat PDF (992 KB)

End-to-end performance management for large distributed storage

Storage systems for large and distributed clusters of compute servers are themselves large and distributed. Their complexity and scale makes it hard to manage these systems, and in particular they make it hard to ensure that applications using them get good, predictable performance. At the same time, shared access to the system from multiple applications, users, and competition from internal system activities leads to a need for predictable performance.

The storage quality-of-service project at the UCSC Storage Systems Research Center investigates mechanisms for improving storage system performance in large distributed storage systems through mechanisms that integrate the performance aspects of the path that I/O operations take through the system, from the application interface on the compute server, through the network, to the storage servers. We focus on five parts of the I/O path in a distributed storage system: I/O scheduling at the storage server, storage server cache management, client-to-server network flow control, client-to-server connection management, and client cache management.

Publications include:

Tim Kaldewey, Theodore M. Wong, Richard Golding, Anna Povzner, Scott Brandt, and Carlos Maltzahn. Virtualizing disk performance. In Proceedings of the 14th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2008), April 2008 (Best student paper)

Acrobat PDF (256 KB)
Anna Povzner, Tim Kaldewey, Scott Brandt, Richard Golding, Theodore M. Wong, and Carlos Maltzahn. Efficient guaranteed disk request scheduling with Fahrrad. In Proceedings of the ACM SIGOPS/EuroSys European Conference on Computer Systems 2008 (EuroSys 2008), April 2008

Acrobat PDF (400 KB)
David O. Bigelow, Suresh Iyer, Tim Kaldewey, Roberto C. Pineiro, Anna Povzner, Scott A. Brandt, Richard A. Golding, Theodore M. Wong, and Carlos Maltzahn. End-to-end performance management for scalable distributed storage. In Proceedings of the Petascale Data Storage Workshop, November 2007

Acrobat PDF (130 KB)
Theodore M. Wong, Richard A. Golding, Caixue Lin, and Ralph A. Becker-Szendy. Zygaria: Storage performance as a managed resource. In Proceedings of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2006), April 2006

Acrobat PDF (424 KB)

Patents include:

Ralph A. Becker-Szendy, Richard A. Golding, Caixue Lin, Theodore M. Wong, and Omer A. Zaki. System and method for managing storage system performance as a resource. United States Patent 7,962,563 (granted 14 June 2011)

Self-managing heterogeneous storage systems

The growth in the amount of data being stored and manipulated for commercial, scientific, and intelligence applications is worsening the manageability and reliability of data storage systems. The expansion of such large-scale storage systems into petabyte capacities puts pressure on cost, leading to systems built out of many cheap but relatively unreliable commodity storage servers. These systems are expensive and difficult to manage—current figures show that management and operation costs are often several times purchase cost—partly because of the number of components to configure and monitor, and partly because system management actions often have unexpected, system-wide side effects. Also, these systems are vulnerable to attack because they have many entry points, and because there are no mechanisms to contain the effects either of attacks or of subsystem failures.

Kybos is a distributed storage system that addresses these issues. It will provide manageable, available, reliable, and secure storage for large data collections, including data that is distributed over multiple geographical sites. Kybos is self-managing, which reduces the cost of administration by eliminating complex management operations and simplifying the model by which administrators configure and monitor the system. Kybos stores data redundantly across multiple commodity storage servers, so that the failure of any one server does not compromise data. Finally, Kybos is built as a loosely coupled federation of servers, so that the compromise or failure of some servers will not impede remaining servers from continuing to take collective action toward system goals.

Our primary application is the self-management of federated (but potentially unreliable) clusters of storage servers, but we anticipate that the algorithms we have developed (and will implement) will have broad applicability to the general class of problems involving the coordination of independent autonomous agents with a collective set of mission goals.

Publications include:

Richard A. Golding and Theodore M. Wong. Walking toward moving goalposts: agile management for evolving systems. In Proceedings of the First Workshop on Hot Topics in Autonomic Computing (HotAC I), June 2006

Acrobat PDF (136 KB)
Winfried W. Wilcke et al. IBM Intelligent Bricks project—Petabytes and beyond. IBM Journal of Research and Development, 50(2/3), pp. 181–198, March–May 2006
Theodore M. Wong, Richard A. Golding, Joseph S. Glider, Elizabeth Borowsky, Ralph A. Becker-Szendy, Claudio Fleiner, Deepak R. Kenchammana-Hosekote, and Omer A. Zaki. Kybos: Self-management for distributed brick-based storage. IBM Technical Paper RJ10356, August 2005

Acrobat PDF (208 KB)

Patents include:

Richard A. Golding, Theodore M. Wong, and Omer A. Zaki. Computer program and method for managing resources in a distributed storage system. United States Patent 7,694,082 (granted 6 April 2010)

Decentralized recovery for survivable storage systems

Modern society has produced a wealth of data to preserve for the long term. Some data we keep for cultural benefit, in order to make it available to future generations, while other data we keep because of legal imperatives. One way to preserve such data is to store it using survivable storage systems. Survivable storage is distinct from reliable storage in that it tolerates confidentiality failures in which unauthorized users compromise component storage servers, as well as crash failures of servers. Thus, a survivable storage system can guarantee both the availability and the confidentiality of stored data.

Research into survivable storage systems investigates the use of m-of-n threshold sharing schemes to distribute data to servers, in which each server receives a share of the data. Any m shares can be used to reconstruct the data, but any m - 1 shares reveal no information about the data. The central thesis of this dissertation is that to truly preserve data for the long term, a system that uses threshold schemes must incorporate recovery protocols able to overcome server failures, adapt to changing availability or confidentiality requirements, and operate in a decentralized manner.

To support the thesis, I present the design and experimental performance analysis of a verifiable secret redistribution protocol for threshold sharing schemes. The protocol redistributes shares of data from old to new, possibly disjoint, sets of servers, such that new shares generated by redistribution cannot be combined with old shares to reconstruct the original data. The protocol is decentralized, and does not require intermediate reconstruction of the data; thus, it does not introduce a central point of failure or risk the exposure of the data during execution. The protocol incorporates a verification capability that enables new servers to confirm that their shares can be used to reconstruct the original data.

Publications include:

Theodore M. Wong. Decentralized recovery for survivable storage systems. PhD dissertation (Technical Report CMU-CS-04-119), School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, May 2004

Acrobat PDF (714 KB) PostScript (1642 KB)
Theodore M. Wong, Chenxi Wang, and Jeannette M. Wing. Verifiable secret redistribution for archive systems. In Proceedings of the First International IEEE Security in Storage Workshop (SISW 2002), December 2002

Acrobat PDF (424 KB)
Theodore M. Wong and Jeannette M. Wing. Verifiable secret redistribution. Technical Report CMU-CS-01-155, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, October 2001

Acrobat PDF (176 KB) PostScript (204 KB)

Thesis committee:

Jeannette Wing
Greg Ganger
Chenxi Wang
Michael Reiter, Dept. of Electrical and Computer Engineering, Carnegie Mellon University

Exclusive caching in hierarchical storage systems

[I began this research project while interning with the Storage Systems Program at Hewlett-Packard Labs.]

Modern high-end disk arrays often have several gigabytes of cache RAM. Unfortunately, most array caches use management policies which duplicate the same data blocks at both the client and array levels of the cache hierarchy: they are inclusive. Thus, the aggregate cache behaves as if it was only as big as the larger of the client and array caches, instead of as large as the sum of the two. Inclusiveness is wasteful: cache RAM is expensive.

We explore the benefits of a simple scheme to achieve exclusive caching, in which a data block is cached at either a client or the disk array, but not both. Exclusiveness helps to create the effect of a single, large unified cache. We introduce a DEMOTE operation to transfer data ejected from the client to the array, and explore its effectiveness with simulation studies. We quantify the benefits and overheads of demotions across both synthetic and real-life workloads. The results show that we can obtain useful (sometimes substantial) speedups.

During our investigation, we also developed some new cache-insertion algorithms that show promise for multi-client systems, and report on some of their properties.

Publications include:

Theodore M. Wong and John Wilkes. My cache or yours? Making storage more exclusive. In Proceedings of the USENIX Annual Technical Conference, June 2002, pp. 161–175

Acrobat PDF (264 KB)

Patents include:

John Wilkes and Theodore M. Wong. Exclusive caching in computer systems. United States Patent 6,851,024 (granted 1 February 2005)
John Wilkes and Theodore M. Wong. Adaptive data insertion for caching. United States Patent 6,728,837 (granted 27 April 2004)

Last modified: 05 March 2022 21:03 EST