garp: Berkeley Reconfigurable Architectures, Systems, and Software

Garp: Combining a Processor with a Reconfigurable Computing Array

Today's general-purpose processors are highly optimized for executing complex sequences of instructions for a popular set of basic operations. Almost by definition, these processors are good at executing average application workloads. Nevertheless, many algorithms contain critical ``kernels'' whose performance significantly impacts total application performance, and which are unlikely to be perfectly implemented by any processor. Reconfigurable hardware may be better at supporting many such kernels for two reasons: First, reconfigurable hardware is better at implementing functions that happen not to map well to the ``standard'' set of operations. And second, control flow can be hard-coded within the reconfigurable array logic, sidestepping instruction bandwidth bottlenecks and thus providing more potential to exploit parallelism.

The focus of the Garp research is the integration of a reconfigurable computing unit with an ordinary RISC processor to form a single combined processor chip. The goal of the research is to demonstrate a tentative but viable architecture that delivers a speedup for at least some applications.

Thumbnail sketch of the first Garp implementation:

RISC core is a single-issue MIPS-II.
Gate array is organized as 32 rows by 23 columns of 2-bit logic blocks. A 24th column of control blocks manages communication outside the array.
Each logic block takes as many as four 2-bit inputs and produces up to two 2-bit outputs.
Each row can be configured to perform any 4-input logical function, a 3-input addition/difference, or a variable shift, on up to 46 bits of data.
Each logic block includes four bits of data state, totaling 92 bits per row.
Partial configuration of the array is possible in row increments.
Reconfiguration time from external memory is 12 external bus cycles per row plus some amount of startup time.
A transparent integrated configuration cache holds the equivalent of 128 total rows of configurations (distributed as 4 cached configuration rows for each physical row).
Reconfiguration time from the integrated cache is 4 cycles (independent of the number of rows).
Up to 128 bits per cycle memory bandwidth to/from any 4 rows in the array.
Up to 64 bits per cycle from the MIPS core register file to any 2 rows in the array, and up to 32 bits per cycle from any array row back to the MIPS core register file.
Reconfigurable array can perform data cache or memory accesses independent of the MIPS core.
Control synchronization between the MIPS core and the reconfigurable array is flexible and efficient.
Virtual memory, supervisor mode, and protected execution of multiple processes are supported.
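
Several of the figures in the sketch above follow from the array dimensions by simple arithmetic. The short Python sketch below reproduces them as a sanity check. The constant and function names are illustrative only and do not come from any Garp tool or specification. The startup overhead for external reconfiguration is not quantified on this page, so it appears as a caller-supplied parameter with a placeholder default of 0.

```python
# Figures taken from the thumbnail sketch above. All names here are
# illustrative assumptions, not part of any Garp tool or specification.
ROWS = 32                     # physical rows in the array
LOGIC_COLUMNS = 23            # logic blocks per row (excluding the control column)
BLOCK_WIDTH_BITS = 2          # each logic block operates on 2-bit values
STATE_BITS_PER_BLOCK = 4      # data state held in each logic block
CACHED_CONFIGS_PER_ROW = 4    # cached configuration rows per physical row
EXTERNAL_CYCLES_PER_ROW = 12  # external bus cycles to load one row
CACHE_RECONFIG_CYCLES = 4     # cycles to load from the integrated cache


def data_bits_per_row():
    """Width of the data a single row can operate on."""
    return LOGIC_COLUMNS * BLOCK_WIDTH_BITS


def state_bits_per_row():
    """Total data state held in one row."""
    return LOGIC_COLUMNS * STATE_BITS_PER_BLOCK


def cache_capacity_rows():
    """Total configuration rows the integrated cache can hold."""
    return ROWS * CACHED_CONFIGS_PER_ROW


def external_reconfig_cycles(rows, startup_cycles=0):
    """External bus cycles to load `rows` rows from external memory.

    `startup_cycles` is an assumption: the page only says "some amount
    of startup time", so 0 is a placeholder, not a measured value.
    """
    if not 0 <= rows <= ROWS:
        raise ValueError("rows must be between 0 and %d" % ROWS)
    return startup_cycles + EXTERNAL_CYCLES_PER_ROW * rows


def cached_reconfig_cycles(rows):
    """Cycles to load from the integrated cache (independent of row count)."""
    if not 0 <= rows <= ROWS:
        raise ValueError("rows must be between 0 and %d" % ROWS)
    return CACHE_RECONFIG_CYCLES


if __name__ == "__main__":
    print("data bits per row:", data_bits_per_row())
    print("state bits per row:", state_bits_per_row())
    print("cache capacity (rows):", cache_capacity_rows())
    print("full-array external load (bus cycles, excluding startup):",
          external_reconfig_cycles(ROWS))
    print("cached load (cycles):", cached_reconfig_cycles(ROWS))
```

The external figure is counted in external bus cycles, while the page does not say which clock the cached figure refers to, so the two results are not necessarily directly comparable.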

Additional sources of information

``Garp: A MIPS Processor with a Reconfigurable Coprocessor,'' by John R. Hauser and John Wawrzynek, published in Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM '97, April 16-18, 1997).

``The Garp Architecture,'' (complete specification) by John R. Hauser.

``Augmenting a Microprocessor with Reconfigurable Hardware,'' by John Reid Hauser, Ph.D. thesis, December 2000.

Related project

Automatic C Compilation for the Garp Chip: Featuring predicated and speculative execution; automatic fully-pipelined loop execution (even with multiple and/or data-dependent exits); automatic utilization of streaming memory queues, plus redundant memory access elimination, utilizing SUIF dependence library; full support of arbitrary pointer accesses from the coprocessor; and intelligent hyperblock formation using profiling information.
