There are a bunch of pages describing how to write a First Word Fall Through (FWFT) FIFO; see, for example, http://www.billauer.co.il/reg_fifo.html
But none of them deals with how to write an FWFT FIFO on top of a FIFO that uses Xilinx's output registers. Why is this different?
In a simple one-clock read_enable->data_out scheme, you can pre-buffer the first data word, and then use the first read acknowledge to trigger the next read enable. This gives you back-to-back results out of the FWFT FIFO.
In a two-clock read_enable->data_out scheme, you must wait an additional clock for your data, which means that an FWFT implementation must issue two read enables without waiting for read acknowledges. You run the risk of overwriting the first read data with the second. This makes it a bit tricky to convert such a FIFO to FWFT.
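To make the overwrite risk concrete, here is a minimal cycle-level sketch in Python (a behavioral model, not HDL and not my actual implementation). It assumes a read port with two stages, a RAM output latch followed by an output register, where the register simply copies the latch on every clock because nothing gates it. The function name and the single-character data words are made up for illustration.

```python
def read_without_clock_enables(words, rd_en_pattern):
    """Two-stage read port whose output register updates on every clock.

    Returns the value visible on the output register after each clock edge.
    """
    addr = 0
    latch = None  # first stage: RAM output latch
    reg = None    # second stage: output register
    seen = []
    for rd_en in rd_en_pattern:
        # Both updates model the same clock edge, so the register
        # samples the latch value from before this edge.
        reg = latch
        if rd_en:
            latch = words[addr]
            addr += 1
        seen.append(reg)
    return seen


# Two read enables issued back to back to pre-fetch, then nothing.
trace = read_without_clock_enables(["A", "B", "C"], [1, 1, 0, 0, 0])
print(trace)  # [None, 'A', 'B', 'B', 'B']
```

Word A is on the output for exactly one clock before B replaces it. If the consumer is not ready in that one clock, A is gone, which is the hazard described above.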
And here's my solution:
Use clock enables.
Sounds simple, and it is. Using the clock enables allows you to read data out of the FIFO, and then control the FIFO logic to stop reading. Why not just use the read_enables?
1. You can't stop a read enable mid-fetch without clock enables, and you must stop the second read enable from overwriting the output data of the first.
2. Timing. My FWFT logic still makes timing at 250+ MHz on a Virtex-6 and a Virtex-7. (My Vivado project uses default synthesis and implementation options.)
This is not as trivial as it sounds. You must carefully control both stages of clock enables. My own FIFO implementation can optionally use arbitrarily wide and deep arrangements of RAMB18/36E1 blocks to build its FIFO memory. Since I support this, I also make sure to support the full range of clock enables going into the RAMB primitives. Xilinx was very intelligent in the RAMB design, and because of their foresight, we get to control each stage of output separately. As long as you can control both stages of data properly, you can implement an FWFT FIFO with almost trivial ease.
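One way to picture the two-stage control is the cycle-level Python sketch below. It is a behavioral model under my own assumptions, not the HDL itself: the class name, the valid flags, and the exact enable equations are illustrative, and the two enables only loosely correspond to the block RAM's read-side enable and its output register clock enable. The write side is left out; the memory is assumed to be pre-filled.

```python
class FwftFifoModel:
    """Cycle-level model: memory -> RAM output latch -> output register."""

    def __init__(self, words):
        self.mem = list(words)  # words already written (must not be None)
        self.rd_ptr = 0
        self.latch, self.latch_valid = None, False
        self.reg, self.reg_valid = None, False

    def clock(self, rd_en):
        """Advance one clock. Returns the word popped this cycle, or None."""
        pop = bool(rd_en) and self.reg_valid
        popped = self.reg if pop else None
        # Stage 2 enable: load the register when it is empty or being read.
        reg_ce = self.latch_valid and (pop or not self.reg_valid)
        # Stage 1 enable: fetch when the latch is empty or is being drained.
        mem_has_data = self.rd_ptr < len(self.mem)
        latch_ce = mem_has_data and (reg_ce or not self.latch_valid)

        # The register is updated first so it sees the pre-edge latch value.
        if reg_ce:
            self.reg, self.reg_valid = self.latch, True
        elif pop:
            self.reg_valid = False
        if latch_ce:
            self.latch, self.latch_valid = self.mem[self.rd_ptr], True
            self.rd_ptr += 1
        elif reg_ce:
            self.latch_valid = False
        return popped


def drain(words, rd_en_pattern):
    """Run the model against a consumer rd_en pattern; collect popped words."""
    fifo = FwftFifoModel(words)
    out = []
    for rd_en in rd_en_pattern:
        word = fifo.clock(rd_en)
        if word is not None:
            out.append(word)
    return out


# Consumer stalls for four clocks, reads three, stalls two, then reads the rest.
pattern = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1]
print(drain(range(6), pattern))  # [0, 1, 2, 3, 4, 5]
```

In this model the first two clocks perform two fetches with no acknowledge, after which both enables drop and the two stages hold their data until the consumer reads. Once the consumer starts reading, both enables fire every clock, so words come out back to back and nothing is overwritten.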