PARA 2008 Tutorials

Workshop on State-of-the-Art in Scientific and Parallel Computing

May 13-16, NTNU, Trondheim, Norway

TUTORIALS -- MAY 13, 2008

  • Tutorial on Rapid, Sustainable Development for Complex HPC Software
  • Full day

    Organizer: Benjamin Allan (Sandia National Laboratories, USA)

    Additional Lecturers:

    • Tony Drummond (Lawrence Berkeley National Laboratory, USA)
    • Jaideep Ray (Sandia National Laboratories, USA)
    • Damian Rouson (Sandia National Laboratories, USA)
    • Stefan Muszala (Tech-X Corporation, USA)
    Please check out this webpage for more information about this tutorial.
  • Tutorial on New Data Structures for Dense Linear Algebra
  • Half-day -- morning

    Fred Gustavson (*) and Jerzy Wasniewski (**)

    • (*) IBM T.J. Watson Research Center, New York, USA, and Univ. of Umeå, Sweden (Adjunct)
    • (**) Technical University of Denmark
  • Tutorial on GPU Programming for Scientific Computing
  • Half-day -- morning

    Philipp Slusallek(*)

    • Prof. Dr.-Ing. Computer Graphics Lab, Saarland University, Saarbrücken, Germany
    • Director DFKI (German Research Center for Artificial Intelligence)
    • Recently, Visiting Professor at NVIDIA Research
    Modern GPUs provide a level of massively parallel computation that was once the preserve of a few high-end supercomputers such as the MasPar and the Connection Machine. NVIDIA's Tesla architecture for GPU computing provides fully programmable, massively multithreaded hardware with up to 128 scalar processor cores on a single chip, capable of delivering hundreds of billions of operations per second. Researchers across many scientific and engineering disciplines are using this platform to accelerate a wide variety of computations by up to two orders of magnitude.

    In this tutorial, we will provide an overview of the hardware architecture and explore the transition it represents in massively parallel computing: from the domain of expensive supercomputers to that of a commodity "manycore" chip, one that already sits next to the CPU in most PCs.

    We will also introduce CUDA, a scalable parallel programming model and software environment. By providing a small set of readily understood extensions to the C/C++ languages, CUDA allows programmers to focus on writing efficient parallel algorithms without the burden of learning a multitude of new programming constructs.

    Finally, as the GPU is the only commodity "manycore" chip widely available today, we will examine its importance as a research platform for exploring key issues in parallel programming and architecture.

  • Tutorial on FPGA Programming for Scientific Computing
  • Half-day -- afternoon

    Olaf O. Storaasli (*), Sven Holmquist (**), Magnus Peterson (**) and Sandro Stucki (***)

    • (*) Oak Ridge National Lab, USA
    • (**) Synective, Sweden
    • (***) Mitrionics, Sweden
    Accelerator technology has become a hot topic over the last several years. By delivering application speed-ups of 10-100 times compared to ordinary CPUs, these devices are now attractive as "desktop supercomputers" and can drastically reduce node counts for larger supercomputer systems. FPGA-based accelerators are versatile and offer significant acceleration over a wide range of applications. Today, adding FPGA accelerators to servers is straightforward, off-the-shelf technology offered by many vendors, with an extensive ecosystem of supporting hardware and software.

    The tutorial, given by leading developers and users, describes the basics of FPGA hardware and software, how FPGAs accelerate HPC applications, how to program them using both high- and low-level languages, and which hardware and software solutions are available. We present key examples from several fields (bio-tech, genome sequencing, molecular dynamics, geoscience, weather/climate prediction and modeling, financial analysis, and double-precision solution of matrix equations) and take a close-up look at the dramatic effect acceleration can have on HPC systems, including reduced cost, power, size and cooling requirements. In addition to live demos, there is an opportunity before, during and after the Workshop to get "hands-on" experience on one of the most advanced FPGA-based systems planned for use in future Cray XT5h supercomputers.

  • Tutorial on Optimization Techniques for Scientific Codes
  • Half-day -- afternoon

    Anne C. Elster and Rune E. Jensen

    • (*) Oak Ridge National Lab, USA
    Any application programmer who wants to take advantage of HPC hardware should optimize their code well before parallelizing it. In this tutorial we will first discuss available optimized libraries and basic optimization techniques. Calling an autotuned library such as FFTW or ATLAS can give a severalfold speed-up over non-optimized handwritten code.

    In the second half, we will show how Rune went about beating an autotunable Basic Subroutine Library using several more intricate optimization techniques in C.
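
    To give a flavor of the kind of serial optimization this abstract refers to, here is a minimal sketch of one well-known technique, loop blocking (tiling) for cache reuse, applied to a dense matrix multiply. It is not taken from the tutorial material and is not the method used by the lecturers. The function names `matmul_naive` and `matmul_blocked` and the tile size `TILE` are assumptions chosen for illustration; a good tile size depends on the cache of the target machine.

```c
#include <stddef.h>

/* Hypothetical tile edge length; the best value is machine dependent. */
#define TILE 32

/* Reference version: C = A * B for n x n row-major matrices.
 * The inner loop walks B with stride n, which is cache unfriendly. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

static size_t min_size(size_t x, size_t y)
{
    return x < y ? x : y;
}

/* Blocked version: the same arithmetic, reordered so that each
 * TILE x TILE piece of A, B and C is reused while it is in cache.
 * The innermost loop runs over j, so B and C are read with stride 1. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t ii = 0; ii < n; ii += TILE) {
        size_t i_end = min_size(ii + TILE, n);
        for (size_t kk = 0; kk < n; kk += TILE) {
            size_t k_end = min_size(kk + TILE, n);
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t j_end = min_size(jj + TILE, n);
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

    Both functions compute the same product; only the order of memory accesses differs. Whether the blocked version is actually faster, and by how much, has to be measured on the target machine, and a tuned library routine will usually beat either.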


  • Contact Information:

    Please mail comments re. this page to:

    E-mail: para08-at-idi.ntnu.no (replace -at- with @)

    It was last updated on Feb 20, 2008. Comments welcome.