mapper: Berkeley Reconfigurable Architectures, Systems, and Software

Rapid Datapath-Oriented FPGA Mapping

Our group is investigating the use of reconfigurable coprocessors based on FPGA (field-programmable gate array) technology to accelerate computational tasks (see the page describing the Garp Chip). The computation kernels to be executed on the FPGA coprocessor are either automatically extracted from a high-level language such as C (see Automatic C Compilation for Garp) or are manually specified using a hardware description language or schematic capture.

One serious problem when synthesizing logic for an FPGA coprocessor is the time required to partition, place, and route the part of the computation performed on FPGA resources. Vendors' tools are optimized to handle random logic and can take hours to complete. Notably, they do not take proper advantage of the structure inherent in regular datapath operations. Instead, they take one of two approaches:

1. Flatten the entire datapath to primitive gates, then perform logic optimization and technology mapping. Often the regularity of the input is not exploited, and thus the same optimization is repeated for each bit slice of the datapath. The subsequent place and route task often initially randomizes the layout, losing any remaining regularity. Not surprisingly, this approach is very expensive computationally.
2. Map each individual operation to a predefined "hard macro" function unit. This is quick and preserves regularity that simplifies layout and routing, but with no optimization performed across macro boundaries, computation resources are often underutilized [1].

Our approach is based on the same linear-time tree covering algorithm used for instruction selection in compilers. It merges multiple operations into optimized modules while preserving the preferred regular bit-slice layout of the resulting datapath. This approach is very fast; a typical kernel extracted during C compilation, approximately 50k-100k gate equivalents, is synthesized in a second or less.
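To make the tree covering idea concrete, here is a minimal Python sketch of bottom-up covering by dynamic programming, in the style of compiler instruction selectors. It is an illustration only, not the Garp mapper: the operation names, the pattern set, and the unit costs (one array row per module) are all assumptions made for this example, and placement is ignored entirely.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One operation in the expression tree of a datapath kernel."""
    op: str                 # hypothetical op names: "input", "add", "shl", "and"
    kids: tuple = ()
    best_cost: int = -1     # filled in by cover()
    best_rule: str = ""     # name of the cheapest pattern rooted at this node


# Hypothetical module patterns: (name, shape, cost).
# A shape is (op, child_shape, ...); None matches any subtree.
PATTERNS = [
    ("input",   ("input",),                 0),
    ("add",     ("add", None, None),        1),
    ("shl",     ("shl", None),              1),
    ("and",     ("and", None, None),        1),
    ("add_shl", ("add", ("shl", None), None), 1),  # shift merged into the adder
]


def match(shape, node, bound):
    """Return True if shape matches at node; collect subtrees matched by None."""
    if shape is None:
        bound.append(node)
        return True
    if shape[0] != node.op or len(shape) - 1 != len(node.kids):
        return False
    return all(match(s, k, bound) for s, k in zip(shape[1:], node.kids))


def cover(node):
    """Label every node with its cheapest covering pattern, children first."""
    for kid in node.kids:
        cover(kid)
    best = None
    for name, shape, cost in PATTERNS:
        bound = []
        if match(shape, node, bound):
            # Cost of this pattern plus the best covers of the subtrees below it.
            total = cost + sum(sub.best_cost for sub in bound)
            if best is None or total < best[0]:
                best = (total, name)
    if best is None:
        raise ValueError(f"no pattern covers operation {node.op!r}")
    node.best_cost, node.best_rule = best
    return node.best_cost
```

Because each node is visited once and tries a fixed number of bounded-size patterns, the running time grows linearly with the size of the tree. For the tree add(shl(x), and(y, z)), the merged "add_shl" pattern covers the adder and shifter together, so the whole tree costs two modules instead of three.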

Because of the importance of routing delays in FPGAs, the algorithms perform module selection and placement in one integrated step.

Other characteristics of the Garp chip's reconfigurable array add interesting twists to the mapping problem. For example, the algorithms must consider the clock period to be fixed, since the reconfigurable array is clocked synchronously with the microprocessor core. Also, general memory accesses can be performed directly from the gate array.

This research is described in more detail in a paper that appeared at FPGA'98.

References

[1] Andreas Koch, "Module Compaction in FPGA-based Regular Datapaths", 33rd Design Automation Conference, Las Vegas, NV, USA, June 1996.

For more information, contact Tim Callahan, timothyc at cs.berkeley.edu.

RANDOM DETAILS:
Ideally, each intermediate mapping step would retain all points that are not "clearly suboptimal" (a point is clearly suboptimal if it is equal or worse in both delay and area than another equivalent mapping point). Each production would then consider all combinations of the points of its children. But because of the complex interactions between mapping and placement, it is not clear whether even this approach could claim an exact solution. So for simplicity and speed, only one "best" solution is kept for each unique grammar point: either best delay, with area as a tie-breaker, or vice versa.
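The difference between the two policies can be sketched in a few lines of Python. This is a hedged illustration, not code from the mapper: the `Point` type and the function names are hypothetical, and delay and area are treated as plain numbers in arbitrary units.

```python
from typing import NamedTuple


class Point(NamedTuple):
    """One candidate mapping for a grammar point (hypothetical units)."""
    delay: int
    area: int


def pareto_front(points):
    """Keep only points that are not equal or worse in both delay and area."""
    front = []
    # Sorted by delay, then area: every earlier point is at least as fast,
    # so a point survives only if it is strictly smaller than the last survivor.
    for p in sorted(set(points)):
        if not front or p.area < front[-1].area:
            front.append(p)
    return front


def best_delay(points):
    """Single-solution policy: lowest delay, with area as the tie-breaker."""
    return min(points, key=lambda p: (p.delay, p.area))


def best_area(points):
    """Single-solution policy: lowest area, with delay as the tie-breaker."""
    return min(points, key=lambda p: (p.area, p.delay))
```

Keeping the full front means each production must combine every surviving point of one child with every surviving point of the others, which multiplies the work at each step. Keeping a single point per grammar point, as described above, keeps the work per node constant.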
