Garp: Combining a Processor with a Reconfigurable Computing Array
Today's general-purpose processors are highly optimized for executing
complex sequences of instructions for a popular set of basic operations.
Almost by definition, these processors are good at executing average
application workloads. Nevertheless, many algorithms contain critical
``kernels'' whose performance significantly impacts total application
performance, and which are unlikely to be perfectly implemented by any
general-purpose processor.
Reconfigurable hardware may be better at supporting many such kernels for
two reasons.
First, reconfigurable hardware is better at implementing functions that
happen not to map well to the ``standard'' set of operations.
And second, control flow can be hard-coded within the reconfigurable array
logic, sidestepping instruction bandwidth bottlenecks and thus providing
more potential to exploit parallelism.
The focus of the Garp research is the integration of a reconfigurable
computing unit with an ordinary RISC processor to form a single combined
system.
The goal of the research is to demonstrate a tentative but viable architecture
that gives a speedup for at least some applications.
Thumbnail sketch of the first Garp implementation:
RISC core is a single-issue MIPS-II.
Gate array is organized as 32 rows by 23 columns of 2-bit logic blocks.
A 24th column of control blocks manages communication outside the array.
Each logic block takes as many as four 2-bit inputs and produces up to two
2-bit outputs.
Each row can be configured to perform any 4-input logical function, a
3-input addition/difference, or a variable shift, on up to 46 bits of data.
Each logic block includes four bits of data state, totaling 92 bits per
row.
Partial configuration of the array is possible in row increments.
Reconfiguration time from external memory is 12 external bus cycles per row
plus some amount of startup time.
A transparent integrated configuration cache holds the equivalent of 128
total rows of configurations (distributed as 4 cached configuration rows for
each physical row).
Reconfiguration time from the integrated cache is 4 cycles (independent of
the number of rows).
Up to 128 bits per cycle memory bandwidth to/from any 4 rows in the array.
Up to 64 bits per cycle from the MIPS core register file to any 2 rows in
the array, and up to 32 bits per cycle from any array row back to the MIPS
core register file.
Reconfigurable array can perform data cache or memory accesses independent
of the MIPS core.
Control synchronization between the MIPS core and the
reconfigurable array is flexible and efficient.
Virtual memory, supervisor mode,
and protected execution of multiple processes are supported.
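The figures in the sketch above are related by simple arithmetic, which the following Python snippet checks. It is only an illustration: the constant and function names are invented for this example, and because the startup time for loading from external memory is not quantified above, it is left as an assumed parameter (`startup_cycles`).

```python
# Illustrative arithmetic only. All names are hypothetical; the numbers
# are taken from the thumbnail sketch above.
ROWS = 32                     # physical rows in the array
LOGIC_COLUMNS = 23            # columns of 2-bit logic blocks (24th column is control)
BITS_PER_BLOCK = 2            # data width of one logic block
STATE_BITS_PER_BLOCK = 4      # bits of data state in each logic block
CACHED_CONFIGS_PER_ROW = 4    # cached configuration rows per physical row
EXTERNAL_CYCLES_PER_ROW = 12  # external bus cycles to load one row's configuration
CACHE_LOAD_CYCLES = 4         # cycles to load from the cache, for any number of rows


def row_data_width():
    """Bits of data one row can operate on."""
    return LOGIC_COLUMNS * BITS_PER_BLOCK


def row_state_bits():
    """Bits of data state held in one row."""
    return LOGIC_COLUMNS * STATE_BITS_PER_BLOCK


def cached_rows_total():
    """Total configuration rows the integrated cache can hold."""
    return ROWS * CACHED_CONFIGS_PER_ROW


def external_reconfig_cycles(rows, startup_cycles=0):
    """External bus cycles to configure `rows` rows from external memory.

    `startup_cycles` is an assumed placeholder: the source only says
    "some amount of startup time".
    """
    if not 0 <= rows <= ROWS:
        raise ValueError("rows must be between 0 and %d" % ROWS)
    return startup_cycles + EXTERNAL_CYCLES_PER_ROW * rows


def cache_reconfig_cycles(rows):
    """Cycles to configure from the integrated cache (independent of rows)."""
    if not 0 <= rows <= ROWS:
        raise ValueError("rows must be between 0 and %d" % ROWS)
    return CACHE_LOAD_CYCLES
```

Under these figures, configuring all 32 rows from external memory costs at least 384 external bus cycles before any startup time is counted, against 4 cycles from the integrated cache.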
Additional sources of information
``Garp: A MIPS Processor with a Reconfigurable Coprocessor,''
by John R. Hauser and John Wawrzynek,
Proceedings of the IEEE Symposium on Field-Programmable Custom Computing
Machines
(FCCM '97, April 16-18, 1997).
``The Garp Architecture,''
(complete specification) by John R. Hauser.
``Augmenting a Microprocessor with Reconfigurable Hardware,''
by John Reid Hauser, Ph.D. thesis, December 2000.
C Compilation for the Garp Chip:
Featuring predicated and speculative execution;
automatic fully-pipelined loop execution
(even with multiple and/or data-dependent exits);
automatic utilization of streaming memory queues,
plus redundant memory access elimination,
utilizing the SUIF dependence library;
full support of arbitrary pointer accesses
from the coprocessor; and intelligent
hyperblock formation using profiling information.
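To illustrate what predicated and speculative execution mean in the feature list above, here is a small Python sketch of if-conversion. A branch inside a loop body is replaced by computing both arms and then selecting one result with a predicate, which is the shape that maps naturally onto hardware where both arms exist side by side as logic. This is a generic textbook illustration with invented function names, not the Garp compiler's actual transformation or output.

```python
def clamp_sum_branchy(values, limit):
    """Sum the values, counting anything above `limit` as `limit` (branching form)."""
    total = 0
    for v in values:
        if v > limit:
            total += limit
        else:
            total += v
    return total


def clamp_sum_predicated(values, limit):
    """Same computation after if-conversion: no branch in the loop body."""
    total = 0
    for v in values:
        p = v > limit             # predicate that the branch would have tested
        then_val = total + limit  # both arms are computed speculatively;
        else_val = total + v      # this is safe because neither has side effects
        total = then_val if p else else_val  # select, like a hardware multiplexer
    return total
```

Both functions return the same result for any input; only the control structure differs. Speculation is safe here because neither arm can fault or write memory, a condition a real compiler would have to check.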