SHARD: Cloud-Computing for a Scalable Semantic Web

September 16, 2010
2:50 pm - 4:00 pm
Halligan 111
Speaker: Kurt Rohloff, BBN
Host: Rob Jacob

Abstract

This talk provides an overview of cloud computing in general and focuses on SHARD, a cloud computing technology I designed and implemented that provides high-performance, highly robust, and highly scalable graph data storage and manipulation capabilities. SHARD addresses one of the major limitations of the Semantic Web by enabling the hosting and querying of very large graphs of data using the SPARQL language. The Semantic Web is supposed to provide a WWW-scale information sharing model and platform. Unfortunately, Semantic Web data processing technologies have, until now, been deployed on a single machine, or a small number of machines, at a time. These previous methodologies create severe data processing and coordination bottlenecks and contradict the fundamentally Web-scale Semantic Web vision. These performance bottlenecks are probably among the reasons there has not been a broader uptake of Semantic Web technologies. SHARD addresses these limitations with a distributed computing architecture built on the map-reduce formalism to enable order-of-magnitude performance improvements in Semantic Web data processing using commercial off-the-shelf (COTS) hardware. I present experimental results from deploying SHARD on low-cost Amazon clouds and measuring performance using standard graph querying benchmarks.
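
As a rough illustration of how a graph query can be phrased in the map-reduce formalism, the sketch below joins two SPARQL-style clauses over a handful of triples. It is not SHARD's implementation: it runs in a single process and in memory, and the data, the clause list, and every function name (match, map_phase, shuffle, reduce_phase, run_query) are hypothetical, chosen only for this example.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "globex"),
    ("carol", "worksFor", "acme"),
    ("acme", "locatedIn", "Boston"),
    ("globex", "locatedIn", "Austin"),
]

# Assumed query, in SPARQL-like form:
#   ?person worksFor ?org . ?org locatedIn "Boston"
CLAUSES = [
    ("?person", "worksFor", "?org"),
    ("?org", "locatedIn", "Boston"),
]
JOIN_VAR = "?org"  # variable shared by both clauses


def match(clause, triple):
    """Return variable bindings if the triple matches the clause, else None."""
    bindings = {}
    for pattern, value in zip(clause, triple):
        if pattern.startswith("?"):
            # A variable must bind to the same value everywhere in the clause.
            if bindings.setdefault(pattern, value) != value:
                return None
        elif pattern != value:
            return None
    return bindings


def map_phase(triples):
    """Emit (join key, (clause index, bindings)) for every matching triple."""
    for triple in triples:
        for index, clause in enumerate(CLAUSES):
            bindings = match(clause, triple)
            if bindings is not None:
                yield bindings[JOIN_VAR], (index, bindings)


def shuffle(pairs):
    """Group mapped values by key, as a map-reduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """For each key, combine bindings from clause 0 with those from clause 1."""
    results = []
    for values in groups.values():
        left = [b for i, b in values if i == 0]
        right = [b for i, b in values if i == 1]
        for l in left:
            for r in right:
                results.append({**l, **r})
    return results


def run_query(triples):
    return reduce_phase(shuffle(map_phase(triples)))


if __name__ == "__main__":
    for row in run_query(TRIPLES):
        print(row["?person"], row["?org"])
```

Because the map step looks at one triple at a time and the reduce step looks at one join key at a time, both steps could in principle be spread across many machines, which is the property the map-reduce approach relies on.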