I’m currently taking a few months off between jobs, so I decided to
indulge in some recreational computer engineering. After making some good
progress on a set of macOS device drivers for AMD’s (formerly Xilinx’s) PCI
Express DMA IP, I turned my attention to my FPGA hardware. My
NiteFury has started refusing to bring up more than one lane
of PCIe at a time, and I needed a replacement with more bandwidth to do
performance testing under load.
Unfortunately, there aren’t many high-performance FPGA boards with hobbyist-level pricing on the market today. The NiteFury and ButterStick used to sell for under $200, but they’re both out of production. The ZuBoard 1CG is only $159, but the programmable logic side is kind of anemic, and the only high-speed transceivers it exposes are connected directly to the ARM core and limited to 5 Gbps. To get a proper PCI Express connector on your board, you need to upgrade to the AuBoard 15P for $699, or the AXKU5 for $1043 plus a 55% tariff. And both of those use the expensive FMC connector for expansion cards instead of the cheaper, more compact SYZYGY connector.
“Surely we can do better,” I thought. The AXKU5’s FPGA (the biggest and fastest one you can use with the gratis version of Vivado) costs less than $100 online. How hard and expensive could high-speed board design possibly be, even in KiCad?
Oh, sparks, you sweet summer child.
But, after a couple months of on-and-off work, I have a board design! I sent it off to the lowest bidder this week for fabrication, and the first two prototypes should come back to me later this summer. Total cost to me: about $450-$500 per unit, including shipping and tariffs, but conspicuously not including the cost of my time, since I have my “hobbyist” hat on instead of my “professional” hat.
Anyway, here it is!
The hardware design files are available on my SourceHut, licensed under the permissive version of the CERN Open Hardware License.
Once I have boards in hand, I’ll write up my bringup experiences here. If all goes well, I’ll write some gateware demos and see about getting it picked up by CrowdSupply or GroupGets.
The Good
- Full-height, half-length PCI Express form factor (111mm x 168mm)
- AMD Kintex UltraScale+ XCKU5P-2FFVB676I (474k logic cells)
- Up to 40W FPGA power draw with recommended Radian ST12LE fan/heatsink
- PCI Express Gen3 x8 connector (Gen4 capable with custom IP)
- 260-pin SODIMM slot for up to 16GiB DDR4-2400 or 32GiB DDR4-2133
- zQSFP+ (QSFP28) cage for up to 100 Gigabit Ethernet
- SYZYGY TXR4 port with 4 28Gbps lanes and 14 high-density I/O
- SYZYGY STD port with 32 high-performance I/O as 16 length-matched diff pairs
- SYZYGY STD port with 32 high-density I/O
- 128 Mbit QSPI Flash with programming header
- Standard 14-pin Xilinx JTAG header
- Economical board design: 8 layers, no via-in-pad, single-sided assembly
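For a rough sense of how those interfaces stack up against each other, here is a back-of-the-envelope sketch of their theoretical peak bandwidths. These are line rates from the published standards (PCIe Gen3 at 8 GT/s per lane with 128b/130b encoding, a 64-bit DDR4 data bus), not measurements from this board, and the function names are made up for illustration.

```python
def pcie_gbytes_per_s(gt_per_s, lanes, enc_num=128, enc_den=130):
    """Peak PCIe payload rate in GB/s, before packet/protocol overhead.

    Defaults assume the 128b/130b line encoding used by Gen3 and Gen4.
    """
    return gt_per_s * lanes * enc_num / enc_den / 8


def ddr4_gbytes_per_s(mt_per_s, bus_bits=64):
    """Peak DDR4 rate in GB/s, assuming a standard 64-bit (non-ECC) data bus."""
    return mt_per_s * bus_bits / 8 / 1000


def serial_gbytes_per_s(gbit_per_s, lanes=1):
    """Raw serial line rate in GB/s, ignoring any line encoding."""
    return gbit_per_s * lanes / 8


if __name__ == "__main__":
    print(f"PCIe Gen3 x8 : {pcie_gbytes_per_s(8, 8):6.2f} GB/s")
    print(f"PCIe Gen4 x8 : {pcie_gbytes_per_s(16, 8):6.2f} GB/s")
    print(f"DDR4-2400    : {ddr4_gbytes_per_s(2400):6.2f} GB/s")
    print(f"DDR4-2133    : {ddr4_gbytes_per_s(2133):6.2f} GB/s")
    print(f"100GbE       : {serial_gbytes_per_s(100):6.2f} GB/s")
    print(f"TXR4 4x28G   : {serial_gbytes_per_s(28, 4):6.2f} GB/s")
```

By these numbers, a single DDR4-2400 SODIMM has more peak bandwidth than either the PCIe Gen3 x8 link or the 100G Ethernet cage, at least on paper.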
The Bad
- SI/PI? YOLO! No field solver and only the most basic impedance calculator
- Long (100-150 mm) high-speed traces to the SYZYGY TXR4 port
- No real provision for operating without a host PC
The Ugly
Basically, this project was only feasible because I was unemployed and bored enough to spend several weeks manually routing traces. Well, and also because AMD publishes a detailed PCB design guide with conservative rules for DDR4 routing. I’ve been careful enough (I think) about following those rules that I can almost believe that this board won’t fall over and cry when I ask it to go fast. I’ll be happy to get 15Gbps from the transceivers without an off-board redriver, ecstatic if I can pull the full rated 28Gbps.
Top 5 pros of using KiCad
- Not paying thousands of dollars to Cadence, Siemens, Autodesk, or Altium
- Getting to put the little open-hardware gear logo on your board
- Documented file formats that you can use awk and sed on
- Python bindings that are only sort-of deprecated
- Schematic capture is mostly good enough
Top 5 cons of using KiCad
- “Free” only if you value engineering labor at zero
- Interactive routing that keeps offsetting your tracks by 1 micron
- No ability to set separate DRCs for BGA escape
- Trace delays and impedances don’t update live
- As mentioned above, SI/PI? YOLO!
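Since the board file is plain-text S-expressions, the 1-micron complaint is at least easy to audit with a script. Here is a minimal sketch, assuming track segments are stored as `(segment (start X Y) (end X Y) ...)` with coordinates in millimetres, as in recent `.kicad_pcb` files. The 5-micron grid, the function name, and the sample text are all hypothetical, and this is not part of the actual project.

```python
import re
from decimal import Decimal

# Matches the start/end coordinates of a track segment. \s+ also spans
# newlines, so this tolerates both one-line and multi-line formatting.
SEGMENT_RE = re.compile(
    r"\(segment\s+\(start\s+(-?[\d.]+)\s+(-?[\d.]+)\)"
    r"\s+\(end\s+(-?[\d.]+)\s+(-?[\d.]+)\)"
)


def off_grid_segments(pcb_text, grid_mm="0.005"):
    """Return (x1, y1, x2, y2) string tuples for segments with an off-grid endpoint.

    Decimal is used instead of float so that a 0.001 mm offset is
    detected exactly rather than lost in rounding error.
    """
    grid = Decimal(grid_mm)
    hits = []
    for match in SEGMENT_RE.finditer(pcb_text):
        coords = [Decimal(c) for c in match.groups()]
        if any(c % grid != 0 for c in coords):
            hits.append(tuple(str(c) for c in coords))
    return hits


# Hypothetical sample: the second segment has drifted by 1 micron in Y.
SAMPLE = """
(segment (start 100.000 50.000) (end 110.000 50.000) (width 0.1) (layer "F.Cu") (net 1))
(segment (start 110.000 50.001) (end 120.000 50.001) (width 0.1) (layer "F.Cu") (net 1))
"""

if __name__ == "__main__":
    for seg in off_grid_segments(SAMPLE):
        print("off-grid segment:", seg)
```

To run it against a real board, read the file's text with `open(path).read()` and pass that in. Tracks that are intentionally off-grid, such as BGA escapes and length-tuning meanders, will show up too, so treat the output as a list of things to look at rather than a list of errors.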