Saturday, July 31, 2010

DDR3 Fly-by-topology and Write-leveling

The key is that the DQ/DQS signals are connected directly to each rank (or set of chips). So while the control signals (clock, command, address, etc.) are routed in fly-by topology, passing each chip in turn, the data and strobes are connected point to point. This is why write-leveling is important: it measures, for each chip, the skew between when the clock (and other control signals) arrives and when the data strobe arrives, so the controller can compensate for it.

Also this is a nice explanation of rank and other stuff:

Thursday, July 29, 2010

SystemVerilog - Interfaces vs Classes

So I've been designing Verilog for a while and I recently started playing with SystemVerilog. As a programmer with C++ experience, I'm very familiar with OOP (Object Oriented Programming). But this didn't give me a complete understanding of SystemVerilog's paradigm. Here are the key differences in ideas between Interfaces and Classes.

Interfaces are like pointers to the actual signals. Interfaces live in the same realm as modules. Just as modules must be instantiated globally and not within a procedural block, so too must interfaces be instantiated globally. Say you have a UART module (which contains the low level UART RX (receiver) and TX (transmitter)), and you have 3 different sets of UART wires (for 3 different UART ports). You may also have a UARTTest class which takes a UART and runs some tests on it. How do you build this?

Keep in mind that you don't want to always have to pass signals along. So in theory I want to create a UARTTest instance and have that instance store inside of it the signals required to control an individual UART.

You can't pass module instances around... This is where Interfaces come in. Here's an attempt without interfaces:

reg uart_tx [2:0]; //3 separate uart tx lines for 3 separate uarts
reg uart_rx [2:0]; //3 separate uart rx lines for 3 separate uarts
//not bothering to declare the rest of the signals... or anything else

//each instance needs its own name
UART uart_inst0(.tx(uart_tx[0]), .rx(uart_rx[0]), ...);
UART uart_inst1(.tx(uart_tx[1]), .rx(uart_rx[1]), ...);
UART uart_inst2(.tx(uart_tx[2]), .rx(uart_rx[2]), ...);

initial begin
    UARTTest t0;
    UARTTest t1;
    UARTTest t2;
    t0 = new;
    t0.runTest(uart_tx[0], uart_rx[0], ...);
    t1 = new;
    t1.runTest(uart_tx[1], uart_rx[1], ...);
    t2 = new;
    t2.runTest(uart_tx[2], uart_rx[2], ...);
end

The reason I have to pass each signal above is that I have no way to store the signals in the class. The main problem here is the verbosity of such calls. (I'm pretty sure what I've written above will work (with ref arguments, of course)... but even that's kind of unclear... I'll have to test this later.) Anyhow, the nicer and more proper way to do this is:

Create an interface with an rx and tx signal, and then instantiate them.

//signals are not declared at the top level, they're stuck in the interface.
uart_interface uart_interf_inst [2:0] ();

//each instance needs its own name
UART uart_inst0(.tx(uart_interf_inst[0].tx), .rx(uart_interf_inst[0].rx), ...);
UART uart_inst1(.tx(uart_interf_inst[1].tx), .rx(uart_interf_inst[1].rx), ...);
UART uart_inst2(.tx(uart_interf_inst[2].tx), .rx(uart_interf_inst[2].rx), ...);

initial begin
    UARTTest t0;
    UARTTest t1;
    UARTTest t2;
    t0 = new(uart_interf_inst[0]);
    t1 = new(uart_interf_inst[1]);
    t2 = new(uart_interf_inst[2]);
end

Now the interface is stored in the class, and can be used. Since it's an interface, it actually controls real signals to the UART module. Remember to use the virtual keyword on the parameter to new, as in:

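function new(virtual uart_interface interf);
    this.interf = interf; //interf is a class member, also declared as: virtual uart_interface interf;
endfunction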

The virtual keyword on an interface implies something like a pointer to an interface instance. So it ends up being a pointer to the structure that contains all the pointers to the signals. Exactly what you need.
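
Putting the pieces together, here is a minimal sketch of the whole arrangement. It is not the real UART: the UART module below is a stand-in that just loops rx back to tx so the example elaborates, and the signal list, the body of runTest, and the top_tb module name are all assumptions for illustration.

```systemverilog
// Minimal sketch. The interface contents, the loopback UART stand-in and the
// runTest body are assumptions; only the overall structure follows the post.
interface uart_interface;
  logic tx;
  logic rx;
endinterface

// Stand-in for the real UART so the sketch elaborates: loops rx back to tx.
module UART(output logic tx, input logic rx);
  assign tx = rx;
endmodule

class UARTTest;
  // Handle to one interface instance, stored for later use.
  virtual uart_interface interf;

  function new(virtual uart_interface interf);
    this.interf = interf;
  endfunction

  // Drives rx through the stored handle and checks the loopback on tx.
  task runTest();
    interf.rx = 1'b0;
    #1;
    if (interf.tx !== 1'b0) $error("loopback of 0 failed");
    interf.rx = 1'b1;
    #1;
    if (interf.tx !== 1'b1) $error("loopback of 1 failed");
  endtask
endclass

module top_tb;
  // Interfaces are instantiated in the hierarchy, just like modules.
  uart_interface uart_interf_inst [2:0] ();

  UART uart_inst0(.tx(uart_interf_inst[0].tx), .rx(uart_interf_inst[0].rx));
  UART uart_inst1(.tx(uart_interf_inst[1].tx), .rx(uart_interf_inst[1].rx));
  UART uart_inst2(.tx(uart_interf_inst[2].tx), .rx(uart_interf_inst[2].rx));

  initial begin
    UARTTest t0, t1, t2;
    t0 = new(uart_interf_inst[0]);
    t1 = new(uart_interf_inst[1]);
    t2 = new(uart_interf_inst[2]);
    // No signals are passed here: each object already holds its interface.
    t0.runTest();
    t1.runTest();
    t2.runTest();
    $finish;
  end
endmodule
```

The point of the design is visible in the runTest calls: they take no arguments, because each UARTTest object carries its own handle to the signals it controls.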

Monday, July 26, 2010

Help!!! - DDR3 ODT (On Die Termination)???

Gosh, figuring this stuff out is difficult. Anyone know how this actually works? Why is Rtt Nom variable? When is it in use? If Dynamic ODT is on then Rtt Nom is really only in use when nothing is going across the wires. Rtt Wr is used during a write, and Rtt is disabled during a read. So what is Rtt Nom really used for? It can't be for when Dynamic ODT is off, because that still wouldn't explain why there are 2 settings. Is it needed to keep the wires clean and not bouncing when nothing is going across them? Perhaps, but then why are there 6 possible values? What am I missing????

I can't continue until I completely understand this stuff. Perhaps I need to take some analog classes, or read some analog for dummies books, and then this would become clear... Although I doubt it.

Saddened by my lack of understanding here.

It makes me so happy to learn something new, and that happened in spades tonight! Well at least I learned a bunch of analog stuff, and got a great explanation for Rtt Nom.

The key is that when there are multiple DDR3 chips hooked up, and you are only writing to one of them, Rtt Wr can be used for the one you are writing to, while the other one, which shares the same DQ/DQS/DM lines, needs to be terminated with Rtt Nom. Rtt Nom must be calibrated because it will affect the signal integrity of the data being written. This is the explanation. I can't claim complete understanding or comprehension since I don't have nearly enough analog knowledge or experience, but for my simplistic understanding, this suffices very well!

So much happier now :)

Friday, July 23, 2010

modelsim scripting - tcl

Spent some time scripting with Modelsim, and here are the lessons:

quietly / transcript off | on | quiet ...

Modelsim by default echoes all set commands to the console. This is b/c of Modelsim's transcript setting. You can either call 'quietly set ...' or just run 'transcript off' at the top. echo will stop working, but you shouldn't use that anyhow; use 'puts ...' in place of 'echo ...' for compatibility across operating systems.

onerror {resume} ... doesn't actually do everything you want it to...

When I use a script with nested procs and loops, and far down in the stack I call 'eval vlog ...', and the vlog fails, Modelsim will NOT continue from where it failed. That is pretty useless to me, as I don't need it to continue somewhere further on, but right there. So one file failed vlog? Continue compiling the rest! The solution is to 'catch' the error. Wrap 'eval vlog' in a catch. Something like 'catch {eval vlog ...}'. Note the braces: with square brackets, Tcl would run the command before catch ever sees it, and the error would still propagate. You can also use onerror to perform any error level logging and stuff, but this way it will still continue right where you were.

argv doesn't exist

As most people know, and as is written up on the web a lot, Modelsim hijacks argv. They give you argc and all the arguments appear as %1 ... %n where n is argc. argv is set to '-gui', and that's it.

Wednesday, July 21, 2010

Altera - Ease of use leader? Not on your life!

I've used Altera a few times in the last 10 years. Wrote a few FPGAs, used Quartus, SignalTap, and Megawizard. But I've never had the chance to really compare Xilinx and Altera. Now I've actually had the opportunity to do a side by side comparison of Xilinx's and Altera's PCI Express cores.

Here it is: I was comparing Xilinx and Altera's PCIe root complexes to see which one simulates faster. First challenge was to create the cores, and learn how to instantiate them. I used Virtex 6 and Stratix IV as the FPGAs to instantiate for. I used the vendor provided cores, and not the 3rd party ones. I started with Xilinx.

Xilinx's instantiation wizards are a bit ugly, but they create the core well enough. For Xilinx I started by checking if there were any compile time defines I needed, and it looked like I didn't need anything special. So I found the top level file for the core, and started to manually instantiate it in a basic top_tb.v file. I then started to compile just to see what I'd need to get the simulation moving. There were a bunch (if not all) of the files in one directory which I needed, and I also needed the secureip and unisims libraries. Once I got the simulation to load, I went in to my top_tb, and added in the reference clock as needed. I then saw the tx p/n ports start wiggling as expected. For Xilinx it was easy to see that I needed a parameter to put it into simulation mode - a mode which shortens certain processes in the physical level training algorithm. At this point, I connected my endpoint and saw the link come up very quickly.

Now on to Altera.

Altera's instantiation wizards are much nicer. They also create the core as expected. I began in the same way as with Xilinx, by going through and checking if I needed special compile time defines. Like Xilinx, there weren't any. What I did notice, which surprised me, was the use of defparams, but that's not a problem b/c it's inside their code and doesn't affect me. I then took the top level module and began instantiating it in a new top_tb.v. Altera's core required me to compile altera_mf, sgate, 220model, stratixiv_hssi_atoms, stratixiv_pcie_hip, and some others which I can't recall at the moment. I also had to compile a number of files that were in the core's tree. Once I had all the modules needed to load the simulation, I was able to run. Altera's core loads up with a large number of mismatched port connections. Altera's core didn't bother to connect many output ports in many places (presumably b/c they weren't using them). They also connected some ports with incorrect sizes. This fills the Modelsim console with a bunch of warnings. At this point I added the clock to top_tb.v, and set the lowest bit of test_in to 1 as appropriate for simulations. I expected to see the tx p/n pins start wiggling quickly. Instead, they started wiggling only after a very long time. I didn't bother to connect the endpoint.

Now for the important stuff:

Altera's simulations are slow as molasses!

Altera's PCIe core runs many times slower than Xilinx's or Lattice's. I have run all three and I was shocked. Truth is at first I thought Altera's core wasn't working until I realized that I had to give it more time. That is how slow it was. At least 2x slower if not closer to 4x slower than either of the other 2 vendors that I've worked with.

Altera's core is confusing and unintuitive!

I was trying to find the signal that says that the transaction layer is ready. Something like Lattice's dl_up, and Xilinx's trn_dl_up. There is no such thing in Altera's core. To see that Altera's core is up you must look at the ltssm status. The core provides multiple clocks that you need to set, many different resets, a test_in and test_out array, and in general lots of signals that are shockingly unclear.

Altera's port list is very old style, and separates the ports based on input and output!

The top level port list is old Verilog style. I don't have any problem with that of course. What is disconcerting is that the list has 2 sections, inputs and outputs. This is completely unhelpful. Xilinx's core has the ports separated by function: Common, TX, RX, CFG ... This is how I would have liked Altera to do it. Perhaps there's a reason for this, but it seems to follow the same paradigm of providing a very unfriendly RTL.

Altera's provided simulation is a mess!

I tried to understand the paradigm of Altera's PCIe core by looking at the simulation, and the only word I can use is appalled. The method they use to reset the simulation is some form of counter, after which they start things running. I was looking through this to find Altera's recommended method of seeing if the transaction layer is up. There was nothing to find.

I finally gave up on Altera, not because I couldn't get it working, but because it was not nearly worth the effort. Their core is a huge mess. I'm sure it works well enough in synthesis, but I wouldn't bother trying to run it in simulation. Even when using their Modelsim script to run simulation, you get the Modelsim warnings of unconnected ports. They should be embarrassed to release something like this. If only their IP team that provides these cores could learn from the GUI team. Very nice UI, horrible RTL.

I'm sure I could go on and on, but suffice it to say that for the advanced user who likes to read the code created, Xilinx is much better. Also for anyone who desires to simulate efficiently, Xilinx is much better. I have never been so shocked by a vendor before, especially one that is purported to be the most user friendly of the vendors. Both Lattice and Xilinx win against Altera.

As a side-note relating to Altera, it seems these types of issues have existed since long before their PCI Express core. I have spoken with some very long-time Altera users, and it is known that their FIFOs simulate horrifically slowly, and their old PCI core user-side was a mess too. Perhaps it's time for Altera to heed this call and take the necessary steps to remedy this situation.


Sunday, July 18, 2010

Modelsim, working with multiple libraries - bug - not finding work

In short,

if you have the work library and another library named test, with top and work_mod compiled into work and test_mod compiled into test, and top instantiates test_mod, which in turn instantiates work_mod, you will get an error at vsim time saying that Modelsim can't find work_mod. This is due to how Modelsim determines the work library when instantiating test_mod. The typical command you would use is:
vsim -L test - this produces an error
vsim -L work -L test - also an error
The only way to get around this issue for now is to use:
vsim -L ./work -L test

What is occurring from what I can tell is that Modelsim treats the work library as special, and in this way creates all sorts of buggy behavior. Another recommendation from Model Tech is to not use the work library and therefore you would have a command like:
vsim -L mywork -L test
which would work fine. This is a very untenable solution for all the obvious reasons, so for now I'm sticking with the first workaround.

Error from Modelsim:

# vsim -L test
# Loading
# Loading test.test_mod
# ** Error: (vsim-3033) test_module.v(2): Instantiation of 'work_mod' failed. The design unit was not found.
# Region: /top/test_mod_inst
# Searched libraries:
# C:\tmp\modelsim_test\test (<-This is actually Modelsim mixing up the work library)
# C:\tmp\modelsim_test\test
# Error loading design

Example files:

work.v:
module work();
    _library library_inst();
endmodule
module work_sub();
endmodule

library.v:
module _library();
    work_sub work_sub_inst();
endmodule

Commands:
vlib work
vlog work.v
vlib library
vlog -work library library.v
vsim -L library work

#vsim -L ./work -L library work - this will compile and run as noted above.

Friday, July 2, 2010

Simulating 2 Lattice PCIe cores...

I start off every post with some vocal exclamation, so gah!

(All this is from memory, so forgive any unintentional mistakes.)

Lattice IP Express allows you to create PCIe cores. For compilation, you are given 2 NGO files, and for simulation you are given an obfuscated behavioral model of PCIe and the sources for the pcs NGO. (Why do they compile those sources to an NGO? I guess it's to save the user from complications, and to save them from using the define files discussed next.) The simulation files are all controlled through files with various `defines in them. These `defines control the compilation of both the beh and pcs files. This is a major problem in Modelsim!

First off, you run into the issue that you can't have 2 cores with different simulation `defines running in the same design. For example I have a DUT (Device Under Test) with Wishbone and a BFM (Bus Functional Model) without it. This can't work b/c the same module names are used in both instances, even when you gave the cores different top level names during IP Express creation!

The next issue is that you can try to change the module names, but the behavioral model has a large number of tiny modules with obfuscated names, and there is no easy way to go over the file and rename them all. You can do it, but it would take a nice sed script or some other method, and it would be annoying... The other modules could of course be renamed much more easily. The next problem is that the defines of course have the same names, so you'd have to rename those too, or compile the files from a top level .v file so the defines stay local, at least with the -sv switch. All sorts of headaches involved.

The solution: create separate libraries for the simulation core and the DUT core, then use Modelsim's trick to get vsim to work properly... What is that trick, you ask? When work is the first search library in the vsim command, Modelsim begins its search for each instantiated module in the library that contains the module doing the instantiating.
This won't work:
vsim top_tb -L pcie_sim_work -L pcie_endpoint_work

This will work:
vsim top_tb -L work -L pcie_sim_work -L pcie_endpoint_work

(Note: Lattice requires pcsd_mti_work also... so add -L pcsd_mti_work after mapping the library)

Modelsim discusses this special case in their reference manual under "Handling sub-modules with common names":
"When you specify -L work first in the search library arguments you are directing vsim to search for the instantiated module or UDP in the library that contains the module that does the instantiation."
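
To make the lookup rule concrete, here is a minimal sketch of the situation. It assumes a hypothetical shared module name, pcie_sub, that exists in both libraries; the real Lattice module names are obfuscated and are not reproduced here. The file names, the bfm_core and dut_core wrappers, and the $display messages are likewise made up for illustration. Each section is meant to be a separate file, compiled into the library named in its comment (after creating that library with vlib).

```systemverilog
// ---- File bfm.v (assumed name): vlog -work pcie_sim_work bfm.v ----
module pcie_sub;  // hypothetical name, also defined in the other library
  initial $display("%m: pcie_sub built for the BFM");
endmodule

module bfm_core;
  pcie_sub sub_inst();  // intended to resolve inside pcie_sim_work
endmodule

// ---- File dut.v (assumed name): vlog -work pcie_endpoint_work dut.v ----
module pcie_sub;  // same name, different contents
  initial $display("%m: pcie_sub built for the DUT");
endmodule

module dut_core;
  pcie_sub sub_inst();  // intended to resolve inside pcie_endpoint_work
endmodule

// ---- File top_tb.v: vlog top_tb.v (compiled into work) ----
module top_tb;
  bfm_core bfm();
  dut_core dut();
endmodule
```

Going by the manual text quoted above, loading this with -L work listed first should let each sub_inst pick up the pcie_sub from the library of the module that instantiates it, so the two same-named modules no longer collide.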