Once again I find myself tirelessly reading and paging through Xilinx documentation to understand how to properly implement a DSP48E block. Before this, I simply wrote my code and let the tools figure out what to do. This time I wanted to instantiate the block myself, hoping to get some added value out of it. I can happily report that I've done it, and lowered the FF (Flip-Flop) and LUT (Look-Up Table) usage by a significant amount! Here are a few tips that might help you get started:
On Virtex 5 chips the DSP48E blocks are arranged in columns of DSP48E tiles. A tile is 2 DSP48E slices stacked vertically, and a slice is a single DSP48E block. The V5 syntax for location constraints (LOC) is DSP48_XcYr, where c is the column and r is the row. Different Virtex 5 devices have different numbers of DSP48E columns. DSP48E sites are numbered independently of the regular SLICE columns and rows. The bottom-left DSP48E is DSP48_X0Y0, and the top-right DSP48E on the SX95T is DSP48_X9Y63. That is 10 columns × 64 rows = 640 DSP48E slices (in 320 DSP48E tiles).
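To make the numbering concrete, here is a small Python sketch of the site grid. It assumes every column has the same number of rows (a full rectangle, as on the SX95T) and that a tile is the pair of slices at rows 2k (lower) and 2k+1 (upper) in the same column; the helper names are made up for illustration.

```python
# Sketch of the Virtex-5 DSP48E site grid described above.
# Assumptions: sites are named DSP48_X<col>Y<row>, the grid is a full
# rectangle, and tile k in a column holds rows 2k (lower) and 2k+1 (upper).


def site_name(col: int, row: int) -> str:
    """Return the LOC site name for the DSP48E slice at (col, row)."""
    return f"DSP48_X{col}Y{row}"


def tile_of(col: int, row: int) -> tuple[int, int]:
    """Return (column, tile index) of the tile that contains this slice."""
    return col, row // 2


def grid_size(top_right_col: int, top_right_row: int) -> tuple[int, int]:
    """Slice and tile counts, given the top-right site of a full grid."""
    slices = (top_right_col + 1) * (top_right_row + 1)
    return slices, slices // 2


if __name__ == "__main__":
    slices, tiles = grid_size(9, 63)  # SX95T: top-right site is DSP48_X9Y63
    print(site_name(0, 0), site_name(9, 63), slices, tiles)
    # DSP48_X0Y0 DSP48_X9Y63 640 320
```

Since row numbers start at 0, the top-right indices are one less than the column and row counts, which is why grid_size adds 1 to each before multiplying.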
A DSP48E has a LOT of functionality. Refer to ug193.pdf from Xilinx (the Virtex-5 XtremeDSP Design Considerations user guide) for detailed descriptions.
The DSP48E's embedded registers, and its ability to change its operation on a clock-by-clock basis, save lots of fabric FFs and LUTs. Much of the functionality that would typically be pulled out of the DSP48E into the fabric can stay inside the block by using its registers and its different modes of operation.
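As an illustration of changing the operation clock by clock, the Python model below treats one slice as a multiply-accumulator whose OPMODE switches between P = A*B (0x05) and P = P + A*B (0x25). It is a simplified behavioral sketch, not a cycle-accurate model: it assumes ALUMODE = 0000 (add), ignores the pipeline registers and their latency, and the class and function names are hypothetical.

```python
# Simplified behavioral model of one DSP48E slice used as a multiply-accumulator.
# Assumptions: only OPMODE 0x05 (P = A*B) and 0x25 (P = P + A*B) are modeled,
# ALUMODE is 0000 (add), and pipeline registers/latency are ignored.

P_BITS = 48
P_MASK = (1 << P_BITS) - 1

OPMODE_MULT = 0b0000101  # X = Y = multiplier product, Z = 0
OPMODE_MACC = 0b0100101  # X = Y = multiplier product, Z = P (feedback)


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


class Dsp48eMacc:
    def __init__(self) -> None:
        self.p = 0  # 48-bit P register, kept as an unsigned bit pattern

    def clock(self, a: int, b: int, opmode: int) -> int:
        """Advance one clock (25-bit A, 18-bit B) and return P as signed."""
        product = to_signed(a, 25) * to_signed(b, 18)
        z = self.p if opmode == OPMODE_MACC else 0
        self.p = (z + product) & P_MASK
        return to_signed(self.p, P_BITS)


def dot_products(vectors):
    """Stream several (a, b) vector pairs through ONE slice. The first sample
    of each pair uses OPMODE_MULT, which restarts the sum without any
    fabric logic to clear the accumulator."""
    dsp = Dsp48eMacc()
    results = []
    for a_vec, b_vec in vectors:
        p = 0
        for i, (a, b) in enumerate(zip(a_vec, b_vec)):
            p = dsp.clock(a, b, OPMODE_MULT if i == 0 else OPMODE_MACC)
        results.append(p)
    return results


print(dot_products([([1, 2, 3], [4, 5, 6]), ([-2, 7], [3, 3])]))  # [32, 15]
```

The accumulator "reset" is just a different OPMODE on the first sample of each vector, so no fabric FFs or LUTs are spent on clearing or muxing the feedback path.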
Another very nice feature is PCIN/PCOUT. The lower DSP48E in a tile can pass its output to the upper DSP48E in the same tile, without going out to the fabric, for a joint calculation. That part of the calculation therefore never has to be built in the fabric.
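The cascade can be sketched the same way: the lower slice computes a0*b0 and hands its full 48-bit P to the upper slice over PCOUT, and the upper slice adds a1*b1 to PCIN. This Python sketch assumes OPMODE 0x05 (P = A*B) in the lower slice and 0x15 (Z = PCIN, so P = PCIN + A*B) in the upper slice, and ignores registers and latency; the function names are illustrative only.

```python
# Behavioral sketch of two cascaded DSP48E slices computing a0*b0 + a1*b1.
# Assumptions: lower slice OPMODE 0x05 (P = A*B), upper slice OPMODE 0x15
# (P = PCIN + A*B), ALUMODE = add, pipeline registers and latency ignored.

P_MASK = (1 << 48) - 1


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def lower_slice(a: int, b: int) -> int:
    """OPMODE 0x05: P = A*B. PCOUT carries the full 48-bit P pattern."""
    return (to_signed(a, 25) * to_signed(b, 18)) & P_MASK


def upper_slice(pcin: int, a: int, b: int) -> int:
    """OPMODE 0x15: P = PCIN + A*B, summed inside the DSP column."""
    return (pcin + to_signed(a, 25) * to_signed(b, 18)) & P_MASK


def cascaded_dot2(a0: int, b0: int, a1: int, b1: int) -> int:
    pcout = lower_slice(a0, b0)  # dedicated PCOUT->PCIN route, no fabric
    return to_signed(upper_slice(pcout, a1, b1), 48)


print(cascaded_dot2(100, -3, 7, 8))  # -300 + 56 = -244
```

Because the 48-bit value goes straight from one slice to the next, the adder for the second product is the upper slice's own ALU rather than fabric LUTs and carry chains.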
A few caveats:
PCIN/PCOUT must be connected with a wire bus of the FULL 48-bit width. The tools give an error if you try to connect only part of the bus. That is perfectly logical, but a more descriptive error message and explanation would be nice. I expect the same applies to the other dedicated silicon buses between DSP48E blocks, for the same reasons.
These buses run directly between 2 adjacent DSP48E blocks, so a PCOUT can only be connected to the PCIN of one other DSP48E. Once PCIN and PCOUT are connected, the tools try to place the two blocks so that the connection is valid. If they cannot find a single tile that holds both DSP48E blocks in the correct order, Map fails. You can force the location of DSP48E blocks with the LOC constraint, or their relative location with the RLOC constraint. U_SET is useful with RLOC when you want the constraint to be relative only to a specific group of DSP48E blocks.
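If you want to catch these caveats before a long implementation run, a small script can check a planned cascade and write the LOC constraints for it. This is a hypothetical Python helper, not a Xilinx tool: the instance names are invented, it only encodes the two rules above (full 48-bit bus, PCOUT source directly below the PCIN sink in the same column), and it pins the pair into one tile in the standard UCF form INST "name" LOC = site;. RLOC and U_SET are not modeled.

```python
# Hypothetical pre-flight check for a PCOUT->PCIN cascade, plus UCF LOC lines
# that pin the pair into one tile. Instance names are illustrative only.

from dataclasses import dataclass

PCOUT_WIDTH = 48


@dataclass(frozen=True)
class DspSite:
    col: int
    row: int

    def name(self) -> str:
        return f"DSP48_X{self.col}Y{self.row}"


def check_cascade(bus_width: int, src: DspSite, dst: DspSite) -> list[str]:
    """Return a list of problems; an empty list means the pair looks legal."""
    problems = []
    if bus_width != PCOUT_WIDTH:
        problems.append(f"PCOUT->PCIN bus is {bus_width} bits, must be {PCOUT_WIDTH}")
    if src.col != dst.col or dst.row != src.row + 1:
        problems.append(f"{src.name()} is not directly below {dst.name()}")
    return problems


def ucf_locs(lower_inst: str, upper_inst: str, col: int, tile: int) -> list[str]:
    """Pin a cascaded pair into tile `tile` of column `col` (lower at row 2*tile)."""
    lower, upper = DspSite(col, 2 * tile), DspSite(col, 2 * tile + 1)
    problems = check_cascade(PCOUT_WIDTH, lower, upper)
    if problems:
        raise ValueError("; ".join(problems))
    return [f'INST "{lower_inst}" LOC = {lower.name()};',
            f'INST "{upper_inst}" LOC = {upper.name()};']


for line in ucf_locs("u_mac/u_dsp_lo", "u_mac/u_dsp_hi", col=2, tile=5):
    print(line)
# INST "u_mac/u_dsp_lo" LOC = DSP48_X2Y10;
# INST "u_mac/u_dsp_hi" LOC = DSP48_X2Y11;

print(check_cascade(32, DspSite(0, 0), DspSite(0, 2)))  # two problems reported
```

Pinning both halves with LOC is the bluntest option; the placer then has no freedom, but you know before Map whether the pair fits where you asked.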
Thumbs up to Xilinx for some excellent DSP blocks in the Virtex 5!
PS - Be aware of 2 errors in the Virtex 5 HDL Documentation:
The port is not CEMULTCARRY-IN but rather CEMULTCARRYIN.
The string value is not "NO_PAT_DET" but rather "NO_PATDET". This error is currently only reported at the Map stage, so it is caught only after the long Synthesis and Translate steps.
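Since the wrong string only surfaces at Map, a plain text search of your HDL before synthesis can save a run. The Python sketch below is a hypothetical helper, not part of the Xilinx tools: it simply looks for the two misspellings from the documentation (case-insensitively, since VHDL identifiers are case-insensitive) and reports the correct names.

```python
# Hypothetical pre-synthesis lint for the two documentation errata above.
# It does a plain, case-insensitive text search of HDL source for the
# misspelled names, so the problem shows up before Synthesis/Translate/Map.

import sys

ERRATA = {
    "CEMULTCARRY-IN": "CEMULTCARRYIN",
    "NO_PAT_DET": "NO_PATDET",
}


def lint_hdl(text: str) -> list[str]:
    """Return one message per occurrence of a documented misspelling."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for wrong, right in ERRATA.items():
            if wrong in line.upper():
                findings.append(f"line {lineno}: '{wrong}' should be '{right}'")
    return findings


if __name__ == "__main__":
    # Usage: python dsp48e_lint.py file1.vhd file2.v ...
    for path in sys.argv[1:]:
        with open(path) as f:
            for msg in lint_hdl(f.read()):
                print(f"{path}: {msg}")
```

For example, lint_hdl('USE_PATTERN_DETECT => "NO_PAT_DET",') reports the string on line 1, while the correct "NO_PATDET" and CEMULTCARRYIN pass untouched.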
I've had Xilinx create 2 CRs to fix the documentation errors and the error reporting issue relating to this.