The Parallel and Data Intensive Applications group collaborates with Notre Dame faculty and staff and provides technical support, advising, and consulting services in areas including hardware and software benchmarking.
To cluster ever-larger data sets into groups of objects that have preferentially close values on different, possibly overlapping, subsets of attributes, a parallel version of the Clustering Objects on Subsets of Attributes (COSA) algorithm has been developed using the message-passing paradigm. The speedup and scaleup of the algorithm approach their optimal values as the number of objects increases. On typical test data sets, nearly linear speedup is observed (for example, a speedup of 28.59 on 32 processors), along with essentially linear scaleup in both the size of the data set and the number of clusters desired. An open-source implementation of the parallel COSA algorithm has also been made publicly available.
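The reported speedup can be related to parallel efficiency using the standard textbook definitions: speedup S(p) = T(1) / T(p) and efficiency E(p) = S(p) / p. The following Python sketch assumes those definitions (the original does not spell them out) and uses only the reported figure of 28.59 on 32 processors. The helper names `speedup`, `efficiency`, and `scaleup_efficiency` are hypothetical, and the scaleup helper assumes the problem size grows in proportion to the number of processors, so ideal (linear) scaleup keeps run time constant.

```python
# Standard parallel performance metrics (textbook definitions, assumed here).

def speedup(t_serial: float, t_parallel: float) -> float:
    """S(p) = T(1) / T(p): how many times faster p processors are than one."""
    return t_serial / t_parallel


def efficiency(s: float, p: int) -> float:
    """E(p) = S(p) / p: fraction of ideal linear speedup achieved."""
    return s / p


def scaleup_efficiency(t_base: float, t_scaled: float) -> float:
    """Weak-scaling efficiency: problem size and processor count grow by the
    same factor; a value of 1.0 means run time stayed constant (linear scaleup)."""
    return t_base / t_scaled


if __name__ == "__main__":
    p = 32
    s = 28.59  # speedup reported in the text for 32 processors
    print(f"efficiency on {p} processors: {efficiency(s, p):.3f}")  # 0.893
```

Under these definitions, the reported run retains roughly 89% of ideal linear speedup on 32 processors.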
A Ceph deployment consists of at least one metadata server, one monitor server, and one storage server. For redundancy and increased performance, additional storage servers and metadata servers can be added dynamically as storage needs grow. The system automatically rebalances (or replicates) the metadata and storage load to take advantage of new nodes or changes in usage patterns.
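Why adding a storage node need not reshuffle all existing data can be illustrated with a deterministic, hash-based placement function. The Python sketch below uses rendezvous (highest-random-weight) hashing as a simplified stand-in; it is not Ceph's actual CRUSH placement algorithm, and the node names such as `osd-3` and the helper functions are hypothetical. Each object is stored on the node with the highest hash score for that (object, node) pair, so when a node is added, the only objects that move are those for which the new node now scores highest.

```python
import hashlib
from collections import Counter


def score(obj: str, node: str) -> int:
    """Deterministic pseudo-random weight for an (object, node) pair."""
    digest = hashlib.sha256(f"{obj}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj: str, nodes: list[str]) -> str:
    """Rendezvous hashing: the node with the highest score stores the object."""
    return max(nodes, key=lambda n: score(obj, n))


def placement(objects: list[str], nodes: list[str]) -> dict[str, str]:
    """Map every object to its storage node (hypothetical illustration)."""
    return {o: place(o, nodes) for o in objects}


if __name__ == "__main__":
    objects = [f"obj-{i}" for i in range(10_000)]
    before = placement(objects, ["osd-0", "osd-1", "osd-2"])
    after = placement(objects, ["osd-0", "osd-1", "osd-2", "osd-3"])

    # Objects that changed location after the fourth node joined.
    moved = [o for o in objects if before[o] != after[o]]

    # Expect roughly a quarter of the objects to move, all onto the new node.
    print(len(moved), all(after[o] == "osd-3" for o in moved))
    print(Counter(after.values()))
```

The design choice being illustrated is that placement is computed rather than looked up in a central table, so any client can locate data and only a proportional share of objects migrates when capacity grows.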