Streaming Data from HDD to GPUs for Sustained Peak Performance
 
  
  
     In genome-wide association studies (GWAS), one has to solve long sequences of generalized least-squares problems. This task has two limiting factors: execution time, often in the range of days or weeks, and data management, with data sets on the order of terabytes. We present an algorithm that overcomes both limitations. By pipelining the computation and using a sophisticated transfer strategy, we stream data from hard disk to main memory to GPUs and achieve sustained peak performance; with respect to a highly optimized CPU implementation, our algorithm shows a speedup of 2.6x. Moreover, the approach lends itself to multiple GPUs and attains almost perfect scalability. When using 4 GPUs, we observe speedups of 9x over the aforementioned CPU implementation and 488x over a widespread biology library.
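
     The pipelining idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm or transfer strategy; it only shows the general technique of overlapping data loading with computation through a bounded buffer. The names `stream_pipeline`, `blocks`, `compute`, and `depth` are hypothetical, the disk is represented by a plain Python iterable, and the GPU stage by an ordinary function.

```python
import queue
import threading


def stream_pipeline(blocks, compute, depth=2):
    """Overlap loading of the next block with computation on the current one.

    Assumptions (illustrative only): `blocks` is any iterable that yields
    data blocks, standing in for reads from disk; `compute` stands in for
    the GPU stage; `depth` bounds how many loaded blocks may wait in memory.
    """
    buffer = queue.Queue(maxsize=depth)  # bounded: loader waits when full
    done = object()                      # sentinel marking end of stream

    def loader():
        try:
            for block in blocks:
                buffer.put(block)        # blocks while `depth` items are pending
        finally:
            buffer.put(done)             # always signal the consumer to stop

    thread = threading.Thread(target=loader, daemon=True)
    thread.start()

    results = []
    while True:
        block = buffer.get()
        if block is done:
            break
        # The loader thread can fetch the next block while this call runs.
        results.append(compute(block))
    thread.join()
    return results
```

     For example, `stream_pipeline(([i, i + 1] for i in range(3)), sum)` returns `[1, 3, 5]`. With `depth=2` the sketch behaves like double buffering: at most two blocks wait in memory while a third is being processed, which keeps memory use bounded regardless of the total data size.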
@inproceedings{Beyer2013GWAS,
  author    = {Lucas Beyer and Paolo Bientinesi},
  title     = {Streaming Data from HDD to GPUs for Sustained Peak Performance},
  booktitle = {Euro-Par},
  publisher = {Springer},
  series    = {Lecture Notes in Computer Science},
  volume    = {8097},
  pages     = {788--799},
  year      = {2013},
  isbn      = {3642400477},
  ee        = {http://arxiv.org/abs/1302.4332},
}
 
        