BRASS Research Group

SCORE: Stream Computations Organized for Reconfigurable Execution

Invited talk by Eylon Caspi, Randy Huang, Yury Markovskiy, Joseph Yeh, John Wawrzynek, and André DeHon.
Presented at the 2002 System on Chip Seminar (SoC 2002), November 20-21, 2002, Tampere, Finland.

A key challenge for a system with programmable cores, such as processors and reconfigurable fabrics, is programming its software. System-level programming typically involves sequencing computational tasks on the cores and mapping intermediate results to memory buffers, based on architectural details such as the number of cores, memory size, and interconnect bandwidth. This mapping is typically done manually by a programmer, which raises several difficulties: (1) it is laborious, (2) it binds the software to a particular architecture, undermining portability and scalability, and (3) it makes architectural exploration for SoC design difficult, since the software must be remapped by hand for each architecture variant.

The SCORE effort (Stream Computations Organized for Reconfigurable Execution) addresses these issues by introducing a streaming communication discipline at both the hardware (core to core) and software (process to process) levels. Software is modeled as a network of stream-connected processes, where a stream is a unidirectional, buffered FIFO channel and a process is a software context controlling a core; this model is a Kahn process network. Bus communication requires all communicating processes to be resident at once, whereas stream communication does not, because data sent to a non-resident process waits in the stream buffer. Hence a process network can express a large number of processes and still operate correctly when only a subset of them is resident in hardware. Capturing the communication structure of all processes in one network enables automatic, efficient scheduling of processes onto cores and automatic mapping of communication buffers into memory. This not only relieves the programmer of the mapping burden, but also allows software to be remapped to a different architecture, one with a different number of programmable cores and a different bus architecture. Application performance will scale with the architecture, with no modification to the software or cores. This scheme enables performance estimation during architectural exploration of the SoC, as well as forward compatibility with future architectures.
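The idea in the paragraph above, that a stream-connected process network produces the same result regardless of how many processes are resident at a time, can be illustrated with a small simulation. This is only a sketch under simplifying assumptions and not the SCORE scheduler: the names `Stream`, `Process`, `execute`, and `run_pipeline` are hypothetical, buffers are unbounded, each process has exactly one input and one output stream, and a "time slice" is just a fixed token budget per resident process.

```python
from collections import deque


class Stream:
    """Unidirectional FIFO channel (unbounded here, an assumption for simplicity)."""

    def __init__(self):
        self.fifo = deque()


class Process:
    """Hypothetical process: applies fn to each input token and emits the result."""

    def __init__(self, name, fn, src, dst):
        self.name, self.fn, self.src, self.dst = name, fn, src, dst

    def runnable(self):
        # A process can make progress only if its input stream holds tokens.
        return len(self.src.fifo) > 0

    def run(self, max_tokens):
        # While resident on a core, consume at most max_tokens input tokens.
        consumed = 0
        while self.src.fifo and consumed < max_tokens:
            self.dst.fifo.append(self.fn(self.src.fifo.popleft()))
            consumed += 1


def execute(processes, cores, slice_tokens=2):
    """Time-multiplex processes onto `cores` cores; return the number of slices used."""
    slices = 0
    while True:
        ready = [p for p in processes if p.runnable()]
        if not ready:
            return slices
        # Only `cores` processes are resident in each slice; the rest wait,
        # and their pending input simply stays buffered in the streams.
        for p in ready[:cores]:
            p.run(slice_tokens)
        slices += 1


def run_pipeline(data, cores):
    """Build a three-stage pipeline computing (2x + 1)^2 and run it on `cores` cores."""
    s0, s1, s2, s3 = Stream(), Stream(), Stream(), Stream()
    s0.fifo.extend(data)
    procs = [
        Process("scale", lambda x: x * 2, s0, s1),
        Process("offset", lambda x: x + 1, s1, s2),
        Process("square", lambda x: x * x, s2, s3),
    ]
    slices = execute(procs, cores)
    return list(s3.fifo), slices


if __name__ == "__main__":
    for cores in (1, 3):
        result, slices = run_pipeline([1, 2, 3, 4], cores)
        print(cores, "core(s):", result, "in", slices, "slices")
```

In this sketch the program text is identical for one core and for three; only the `cores` argument changes. The output stream holds the same values in both cases, while the larger configuration finishes in fewer slices, which loosely mirrors the claim that performance scales with the architecture without modifying the software.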

In this talk I will present key concepts of the streaming abstraction and its application to a chip architecture with reconfigurable fabric clusters, memories, and a processor. I will discuss techniques for automatic scheduling and will demonstrate their use in mapping a collection of media applications to architecture variants of different sizes. This talk represents a collective effort from the BRASS (Berkeley Architectures, Systems, and Software) group at U.C. Berkeley.