https://secureyourpassword.com
http://code.google.com/p/secureyourpassword/
Passwords are notoriously insecure. Secure Your Password changes that.
How your password is exposed
1. Logging into a non-secure website from a public WiFi network
2. Every website you log into has access to your password
3. 3rd-party password security sites keep a copy of your password
What Secure Your Password does for you.
1. Never stores or receives your password; your password never leaves your computer
2. Creates a salt (random word) which obfuscates your password on a website by website basis
3. Stores the salt on a secure (https) server, and retrieves that salt whenever you want to access the website
4. Does all of this transparently after the initial creation of the password
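The post doesn't give the actual derivation, but the per-site scheme it describes (a secret derived locally from the master password plus a stored salt) can be sketched roughly like this. The function name, parameters, and iteration count here are hypothetical illustrations, not Secure Your Password's real code:

```python
import base64
import hashlib
import os

def site_password(master_password: str, site_salt: bytes) -> str:
    """Derive a per-site password locally; only site_salt is stored remotely."""
    dk = hashlib.pbkdf2_hmac("sha256", master_password.encode("utf-8"),
                             site_salt, 100_000)
    # Truncate to something most sites will accept as a password.
    return base64.urlsafe_b64encode(dk)[:16].decode("ascii")

# A fresh random salt per website: revealing the salt does not reveal
# the master password or the derived password.
salt_for_example_com = os.urandom(16)
```

The same master password with the same salt always yields the same site password, so only the salt ever needs to round-trip through the https server.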
Sunday, December 27, 2009
Sunday, December 20, 2009
libpst and readpst... the hacked way on vista / xp - fixed
So have you ever backed up your PSTs and then needed a good way to look at them? Well, I have. I backed up my PSTs, and without Outlook I was a bit stuck. After checking around I found that the only free option is libpst. libpst has an executable called readpst which can open the folders and parse your files. The problem is that libpst is written for Linux, and nobody packages a Windows-friendly readpst.
I looked around to see what should work, and I didn't want to pollute my newly installed Vista with all sorts of stuff. Here are the steps I took to make readpst compile without too many installations:
libpst:
libpst-0.6.45.tar.gz
MinGW:
Install MinGW (MinGW-5.1.6.exe), choose GCC only.
GnuWin32 tools:
Install GNU Regex for Win32 (regex-2.7-setup.exe).
Install GNU ICONV for Win32 (libiconv-1.9.2-1.exe).
Now you can compile readpst. From the libpst src directory, run:
"c:\MinGW\bin\mingw32-gcc.exe" -DHAVE_REGEX_H -I"c:\Program Files (x86)\GnuWin32\include" -include "C:\MinGW\include\inttypes.h" -DVERSION=\"0.6.45\" -include "C:\MinGW\include\errno.h" -include "C:\MinGW\include\sys\stat.h" -include "C:\MinGW\include\limits.h" -DHAVE_ICONV -DICONV_CONST=const -L"c:\Program Files (x86)\GnuWin32\lib" debug.c libpst.c vbuf.c -liconv libstrfunc.c timeconv.c lzfu.c readpst.c -lregex -o readpst.exe
Modify your PATH to include 'C:\Program Files (x86)\GnuWin32\bin\;'. This is how the required DLLs will be found. Thank you to everyone who pointed out that it was missing libiconv2.dll; since I use the GnuWin32 tools all the time, I naturally had this set in my path before I had ever heard of libpst.
If you are running a 32-bit OS you will have to change 'Program Files (x86)' to 'Program Files'; if you install 64-bit versions of the tools, adjust the paths accordingly.
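Since the only things that change between setups are a couple of install roots, it can be easier to generate the command line than to hand-edit it. Here's a sketch that rebuilds the command above from two path variables (the defaults assume the exact installs described in this post; the helper name is mine, and the libraries are moved to the end of the link line, which is the conventional order):

```python
import os

def readpst_gcc_cmd(mingw_root=r"c:\MinGW",
                    gnuwin32_root=r"c:\Program Files (x86)\GnuWin32",
                    version="0.6.45"):
    """Build the mingw32-gcc argument list for readpst; run it from libpst src."""
    mingw_inc = os.path.join(mingw_root, "include")
    cmd = [os.path.join(mingw_root, "bin", "mingw32-gcc.exe"),
           "-DHAVE_REGEX_H",
           "-I" + os.path.join(gnuwin32_root, "include"),
           '-DVERSION="%s"' % version,
           "-DHAVE_ICONV", "-DICONV_CONST=const",
           "-L" + os.path.join(gnuwin32_root, "lib")]
    # Force-include the MinGW headers the sources expect.
    for header in ["inttypes.h", "errno.h", os.path.join("sys", "stat.h"),
                   "limits.h"]:
        cmd += ["-include", os.path.join(mingw_inc, header)]
    cmd += ["debug.c", "libpst.c", "vbuf.c", "libstrfunc.c",
            "timeconv.c", "lzfu.c", "readpst.c",
            "-liconv", "-lregex", "-o", "readpst.exe"]
    return cmd
```

On a 32-bit OS you'd pass gnuwin32_root=r"c:\Program Files\GnuWin32" and nothing else changes; the list can be fed straight to subprocess.run.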
If anyone has some good pointers on an easier way to compile readpst I'd love to hear it, and no I'm not interested in a full MSYS or cygwin install!
Link to readpst x32 : https://www.dropbox.com/s/gfeyp1tn5yuv7rk/readpst.exe?dl=0
Saturday, December 19, 2009
Registry editing - powerful and very dangerous - made easier
I started playing around in the Windows Registry back on Windows 98, when registry editing was much easier. Sure, it didn't have the robustness of an XP or Vista registry, but there also wasn't all the hassle of permissions everywhere. With the onset of this permission madness, the steps it takes to delete a key can be daunting: take ownership, replace permissions, then take ownership and replace permissions of newly appearing subfolders one by one... It's a pain. The solution:
Registrar Registry Manager (http://www.resplendence.com/); there is a lite version that can be downloaded for free.
Permissions don't matter anymore! You can edit the registry as the real administrator of the system (like root), not just Windows' wimpy cut-down version of an administrator.
Enjoy,
Monday, November 23, 2009
godaddy.com linux hosting and converting url to lowercase...
So I had an interesting experience today with my website nkcorner.com. I noticed that when running my website on my Windows system the URL is case insensitive. On GoDaddy's server the URL is case sensitive since it runs on Linux.
I started digging around for how to fix this. GoDaddy doesn't load the mod_speling module or define a RewriteMap for tolower (as far as I know). This means that I can't use the CheckSpelling On fix or the RewriteMap tolower function.
Before I continue - Please correct me if I'm wrong because what I'm about to show is a really awkward workaround...
So I had to write 26 lines of letter by letter conversion. This is what it looks like in my .htaccess file:
RewriteCond %{REQUEST_URI} !([A-Z]+)
RewriteRule .* - [S=27]
RewriteRule ^(.*)A(.*)$ http://%{HTTP_HOST}/$1a$2 [R=301,L]
RewriteRule ^(.*)B(.*)$ http://%{HTTP_HOST}/$1b$2 [R=301,L]
RewriteRule ^(.*)C(.*)$ http://%{HTTP_HOST}/$1c$2 [R=301,L]
RewriteRule ^(.*)D(.*)$ http://%{HTTP_HOST}/$1d$2 [R=301,L]
RewriteRule ^(.*)E(.*)$ http://%{HTTP_HOST}/$1e$2 [R=301,L]
RewriteRule ^(.*)F(.*)$ http://%{HTTP_HOST}/$1f$2 [R=301,L]
RewriteRule ^(.*)G(.*)$ http://%{HTTP_HOST}/$1g$2 [R=301,L]
RewriteRule ^(.*)H(.*)$ http://%{HTTP_HOST}/$1h$2 [R=301,L]
RewriteRule ^(.*)I(.*)$ http://%{HTTP_HOST}/$1i$2 [R=301,L]
RewriteRule ^(.*)J(.*)$ http://%{HTTP_HOST}/$1j$2 [R=301,L]
RewriteRule ^(.*)K(.*)$ http://%{HTTP_HOST}/$1k$2 [R=301,L]
RewriteRule ^(.*)L(.*)$ http://%{HTTP_HOST}/$1l$2 [R=301,L]
RewriteRule ^(.*)M(.*)$ http://%{HTTP_HOST}/$1m$2 [R=301,L]
RewriteRule ^(.*)N(.*)$ http://%{HTTP_HOST}/$1n$2 [R=301,L]
RewriteRule ^(.*)O(.*)$ http://%{HTTP_HOST}/$1o$2 [R=301,L]
RewriteRule ^(.*)P(.*)$ http://%{HTTP_HOST}/$1p$2 [R=301,L]
RewriteRule ^(.*)Q(.*)$ http://%{HTTP_HOST}/$1q$2 [R=301,L]
RewriteRule ^(.*)R(.*)$ http://%{HTTP_HOST}/$1r$2 [R=301,L]
RewriteRule ^(.*)S(.*)$ http://%{HTTP_HOST}/$1s$2 [R=301,L]
RewriteRule ^(.*)T(.*)$ http://%{HTTP_HOST}/$1t$2 [R=301,L]
RewriteRule ^(.*)U(.*)$ http://%{HTTP_HOST}/$1u$2 [R=301,L]
RewriteRule ^(.*)V(.*)$ http://%{HTTP_HOST}/$1v$2 [R=301,L]
RewriteRule ^(.*)W(.*)$ http://%{HTTP_HOST}/$1w$2 [R=301,L]
RewriteRule ^(.*)X(.*)$ http://%{HTTP_HOST}/$1x$2 [R=301,L]
RewriteRule ^(.*)Y(.*)$ http://%{HTTP_HOST}/$1y$2 [R=301,L]
RewriteRule ^(.*)Z(.*)$ http://%{HTTP_HOST}/$1z$2 [R=301,L]
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1 [R=301,L]
I wish I could get this to work without the redirects, but so far I haven't found any way to stop mod_rewrite from re-appending the sub-directories. I.e., if you try this with a path like http://nkcorner.com/RacingLive/RacingLive.php, it keeps putting the %PATHINFO (RacingLive.php) back onto the modified URL, which is maddening. If you figure it out, let me know.
The first 2 lines say: if there are no capital letters, skip the next 27 rules. This is the only way to group rules or write if/else-like logic with rewrites, because you can't attach multiple RewriteRule lines to a single RewriteCond line.
The last line handles Windows, where URLs are case insensitive: the uppercase letters are detected on the RewriteCond line but arrive already lowercased on the RewriteRule lines, so the letter-by-letter replacements never fire there (they only occur on Linux). That means that if I want the URL itself changed on Windows, I have to rewrite it manually with the last line, which I do for every URL that had uppercase letters once it passes the letter-by-letter replacements.
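To see why multiple capitals cost multiple round trips, here's a condensed Python approximation of the rule block above: each pass lowercases exactly one capital (the greedy (.*) grabs the last one) and issues a 301, and the browser keeps following redirects until the RewriteCond finds no capitals left. This is a model for illustration only, not Apache's behavior byte for byte:

```python
import re

def one_pass(path: str):
    """One .htaccess pass: lowercase a single capital, as the A-Z rules do."""
    m = re.match(r"^(.*)([A-Z])(.*)$", path)   # greedy: the LAST capital matches
    if m is None:
        return None                            # RewriteCond fails: no capitals
    return m.group(1) + m.group(2).lower() + m.group(3)

def follow_redirects(path: str):
    """Follow the redirect chain; returns (final path, number of redirects)."""
    hops = 0
    while (nxt := one_pass(path)) is not None:
        path, hops = nxt, hops + 1
    return path, hops
```

So /RacingLive/RacingLive.php costs four separate 301s, one per capital letter, before the final lowercase URL is served.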
Perhaps someone knows an easier way to do this? Perhaps a single line for replacing A-Z with a-z without the mod_spelling or tolower functions?
Anyhow this is my solution and I hope it helps someone else with the same problem.
PS, I see someone else thought of this before me:
http://www.webmasterworld.com/forum92/5308.htm.
Give credit where credit is due.
Note: Beware using this option as the stats folder provided and updated by godaddy.com uses mostly uppercase files, and this will stop working with this solution. I haven't yet thought of how to fix this but I'll probably start using .htaccess per folder and per file as a solution.
Friday, September 18, 2009
Install ISE on x64 as x32 - works around lots of x64 install issues
ISE has lots of installation issues when installing on Vista x64. To work around some of them I decided to force a 32-bit-only installation. The trick is to run the xsetup.exe directly from the bin/nt folder instead of the default xsetup.exe in the root of the installation. Of course, to update the installation you will then have to use the 11.3 32-bit update, and run bin/nt/xilinxupdate.exe to install the service pack.
From preliminary tests I see this fixes ISIM and Chipscope issues. For ISIM it now installs the required files under the hdp folder.
Thursday, September 17, 2009
ISE 11.3 released
I downloaded and installed the ISE 11.3 update, which brings an unknown bundle of updates to ISE. Unknown, because they still haven't updated the ARs (Answer Records) to detail the changes / fixes / issues. I can say for sure that they haven't fixed the external editor issue (http://ionipti.blogspot.com/2009/05/xilinx-ise-using-external-editor-not.html). I do see that they have updates for many of their IP cores.
Seems to me that the major updates are related to Virtex 6 and Spartan 6 support. It would be nice to see the release notes whenever Xilinx makes them available.
Struggling now with ISIM issues on Vista x64.... (http://forums.xilinx.com/xlnx/board/message?message.uid=46879)
Friday, September 11, 2009
ISE 11.3 coming out this month
I'm excited about ISE 11.3 simply because it allows a full installation to fall back to WebPack functionality when not connected to a license server or when a license is unavailable. It would be nice if Xilinx would post target release dates for updates, but of course they don't. I found on one of their AR (Answer Record) pages that the release date is September 2009. Not an exact date, but nice to know.
AR record: http://www.xilinx.com/support/answers/32744.htm
Monday, September 7, 2009
Xilinx MIG - doesn't meet timing
In attempting to compile a design at 200 MHz or above I ran into many timing issues with the MIG. Xilinx attributes this to my board's DDR2 pinout, where the pins are spread far apart, and I definitely agree with their analysis of why the MIG fails to meet timing. What I don't understand is why Xilinx didn't add a parameter for trading extra latency for a higher achievable frequency. That is a trivial matter when dealing with a DDR2 controller.
Here's hoping MIG 3.2 will have such a parameter as this will be helpful for those occasions where you didn't place all the DDR2 pins where Xilinx believes they should be.
I guess I should delve into the MIG and add this parameter myself.... argh!
Tuesday, September 1, 2009
Xilinx Divider IP / Translate -- SOLUTION!!!
So I've spent a while studying this particular issue in the hope that I could puzzle out a solution, and after a few hours I finally did. The issue is that the NGC that coregen creates is not flattened. The easiest way to fix this is to open the coregen-created NGC in PlanAhead and let PlanAhead "Export Netlist". Take the exported .edf file and do a find-and-replace to switch all "/" to "." (without quotes, of course). This flattens the netlist, and the Translate step will then behave nicely.
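The find-and-replace step is trivial to script; a throwaway helper along these lines (the helper name and file handling are just an example) avoids doing it by hand in an editor:

```python
from pathlib import Path

def flatten_netlist(edf_path: str) -> None:
    """Replace the hierarchy separator '/' with '.' throughout the .edf file."""
    p = Path(edf_path)
    p.write_text(p.read_text().replace("/", "."))
```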
PS. This isn't a recommended solution for flattening other netlists, but it happens to be OK for the coregen dividers from what I can see. I didn't write a parser to verify legality or any such thing, so please don't complain if this doesn't work for you. And don't use this method for flattening other netlists!
I know there is at least one other person suffering with 12 hour translates and I hope he gets this information... I will pass this onto XilinxSupport and my FAEs and ask that they pass it on to you.
Perhaps this can save the next guy hours of wasted time
Xilinx Divider IP and the Translate step
I have been building a highly optimized processing FPGA which requires, amongst many other things, a divider module. I used Core Generator to create the necessary divider. I have now noticed that the Translate step suddenly takes upwards of 2 hours to complete (with numerous dividers, of course)... After lots of research and testcases, it appears that Xilinx's divider module is very troublesome for their own Translate tool (ngdbuild). This is extremely aggravating, as Translate is now by far the longest step in the whole compilation process.
I recently held a meeting with representatives of Synopsys regarding Synplify. I was happily surprised to see that they not only manage to decrypt the divider module, but also write the divider directly into their output EDIF. The whole Synplify synthesis step took 30 minutes. This EDIF can then be Translated in a matter of minutes and, of course, mapped and routed. Amazing! Xilinx's own flow spends over 2 hours in the Translate step, not counting XST time, whereas Synplify does the complete synthesis and bypasses the heavy Translate step in 30 minutes!
I am not impressed by Xilinx.
Perhaps someone knows of a Xilinx solution and would like to enlighten me.
To try this out on your own you need to create the largest divider you can and implement it a few times in a simple testcase. Easiest way is to connect the divider's ports directly to the I/O ports of the top module. The Translate step will be the longest step by far in the compilation process.
Wednesday, August 12, 2009
on the way to optimized - a state machine step
So I've had to write a very optimized processing block. This seemed pretty daunting so I chose to ignore performance and begin with a very structured hand-written state machine. Got the process working and now onto the optimization.
Taking a well-written state machine and making it fully pipelined is very easy! I can't give a wizard-like step-by-step process for this, but some simple ideas come to mind. If the process can be optimized, then your states must not be mutually exclusive; i.e., some of the processing steps can run in parallel with other steps. To create such a situation, take code out of your state machine and put it in separate flag-enabled blocks. Use these flags to control which steps run when. Sometimes you'll find that many of the steps can run without the need for flags, and all that matters is some process start flag or end flag.
I have written a very simple example module to show how to do this. read_enable takes one clock to bring data in. The hardware_manipulation block takes 2 clocks to process and send data out.
When you look at the state machine you will see these steps:
0. request data (read_enable <= 1)
1. wait one clock while data is being retrieved
2. put received data into hardware_manipulation block (data_in <= data)
3. wait one clock while hardware_manipulation block is calculating
4. wait one more clock while hardware_manipulation block is outputting data
5. put hardware manipulated data out to result (result <= data_out) and go back to start
This complete process takes 6 clocks. The optimized version still takes 6 clocks per item, but runs 3x faster because it is restarted every 2 clocks.
When you look at the optimized version you will see these steps:
0. request data (read_enable <= 1)
1. wait one clock while data is being retrieved
2. put received data into hardware_manipulation block (data_in <= data) AND in parallel start from 0 again (ie request data again)
3. wait one clock while hardware_manipulation block is calculating
4. wait one more clock while hardware_manipulation block is outputting data
5. put hardware manipulated data out to result (result <= data_out)
Once the stages get going you will have stage 0 running with stages 2 and 4, and stage 1 running with stages 3 and 5.
Here's the example:
module process(
    input clk,
    input rst,
    output reg read_enable,
    input [7:0] data,
    output reg [11:0] result
);

reg [7:0] data_in;
wire [7:0] data_out;

// Note: the instance name is required; the original snippet omitted it.
hardware_manipulation hw_inst (
    .clk(clk),
    .data_in(data_in),
    .data_out(data_out)
);

/* Original state-machine version:
reg [31:0] state;
localparam READ = 0;
localparam READ_WAIT = 1;
localparam HW = 2;
localparam HW_WAIT_1 = 3;
localparam HW_WAIT_2 = 4;
localparam OUT = 5;

always @(posedge clk) begin
    if (rst) begin
        state <= READ;
        result <= 0;
        read_enable <= 0;
    end
    else begin
        read_enable <= 0;
        case (state)
            READ: begin
                read_enable <= 1;
                state <= READ_WAIT;
            end
            READ_WAIT: begin
                state <= HW;
            end
            HW: begin
                data_in <= data;
                state <= HW_WAIT_1;
            end
            HW_WAIT_1: begin
                state <= HW_WAIT_2;
            end
            HW_WAIT_2: begin
                state <= OUT;
            end
            OUT: begin
                result <= data_out;
                state <= READ;
            end
        endcase
    end
end
*/

// Pipelined version: a shift register of stage flags.
reg [5:0] stages;

always @(posedge clk) begin
    if (rst) begin
        stages <= 6'b000001;
        result <= 0;
        read_enable <= 0;
    end
    else begin
        read_enable <= 0;
        stages <= {stages[4:0], 1'b0};  // advance every stage by one clock
        if (stages[0])
            read_enable <= 1;           // stage 0: request data
        if (stages[1])
            stages[0] <= 1;             // stage 1: restart the pipeline
        if (stages[2])
            data_in <= data;            // stage 2: feed the hardware block
        if (stages[5])
            result <= data_out;         // stage 5: latch the result
    end
end

endmodule
I have not simulated this block, but it's simple, so it should be correct... The idea comes across, which is what's most important.
good luck with the optimization.
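Since the block above hasn't been simulated, here's a rough cycle-accurate Python model of the optimized version as a sanity check (a stand-in for a real testbench; the 1-clock memory returning an incrementing counter and the data_out = data_in + 1 manipulation are made-up assumptions). It confirms that after the pipeline fills, a finished result emerges every 2 clocks:

```python
def simulate(n_cycles=40):
    """Cycle-accurate model of the 'stages' shift-register version above."""
    stages, read_enable, data_in = 0b000001, 0, None
    mem_counter = 0
    mem_pipe = [None]            # 1-clock memory read latency
    hw_pipe = [None, None]       # 2-clock hardware_manipulation latency
    results = []                 # (cycle, value) pairs latched at stage 5
    for cycle in range(n_cycles):
        data, data_out = mem_pipe[0], hw_pipe[0]   # values visible this cycle
        # Nonblocking-assignment semantics: next state from current state.
        next_stages = (stages << 1) & 0b111110
        if stages & 0b000010:                      # stage 1: restart pipeline
            next_stages |= 0b000001
        next_read_enable = 1 if stages & 0b000001 else 0       # stage 0
        next_data_in = data if stages & 0b000100 else data_in  # stage 2
        if stages & 0b100000:                      # stage 5: latch result
            results.append((cycle, data_out))
        # Advance the memory and hardware pipelines.
        mem_pipe = mem_pipe[1:] + [mem_counter if read_enable else None]
        if read_enable:
            mem_counter += 1
        hw_pipe = hw_pipe[1:] + [data_in + 1 if data_in is not None else None]
        stages, read_enable, data_in = next_stages, next_read_enable, next_data_in
    return results
```

The model produces its first result at cycle 5 and one more every 2 cycles after that, matching the 3x throughput claim (versus every 6 cycles for the state machine).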
Thursday, August 6, 2009
How you use a DSP48E slice... or DSP48E tile...
Once again I find myself reading and tirelessly paging through Xilinx documentation in order to understand how to properly implement a DSP48E block. Of course before I did this I just wrote my code and let the tools figure out what to do. Now I desired to instantiate the block myself and perhaps to get some added value by doing this. I can happily report that I've done it, and lowered the FF (Flip Flop) and LUT (Look Up Table) usage by a significant amount! Here are a few tips that might help you get started:
On the Virtex 5 chips you have columns of DSP48E tiles. A tile is 2 DSP48E slices arranged vertically; a slice is a single DSP48E block. The V5 syntax for location constraints (LOC) is DSP48_XcYr, where c is the column and r is the row. Each Virtex 5 chip can have a different number of DSP48E columns, and the DSP48E numbering is not related to the typical SLICE columns or rows; they are counted separately. The bottom-left DSP48E is DSP48_X0Y0, and the top-right DSP48E on the SX95T is DSP48_X9Y63. This equates to 640 DSP48E slices (in 320 DSP48E tiles).
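As a quick sanity check of those counts (plain arithmetic, nothing Xilinx-specific):

```python
cols = 9 + 1          # columns X0 through X9
rows = 63 + 1         # rows Y0 through Y63
slices = cols * rows  # one DSP48E slice per site
tiles = slices // 2   # two slices per tile
```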
A DSP48E has a lot (emphasized) of functionality. Refer to ug193.pdf from Xilinx for detailed descriptions.
The embedded registers in the DSP48E, and its ability to change its operation on a clock-by-clock basis, save lots of fabric FFs and LUTs. A lot of functionality that would typically be taken out of the DSP48E block can be kept inside by using its registers and different modes of operation.
Another very nice function is PCIN/PCOUT. The lower DSP48E in a tile can transfer its output, without going out to the fabric, to the higher DSP48E in the same tile for a joint calculation. That calculation is then saved from being done in the fabric.
A few caveats:
PCIN/PCOUT must be connected via a wire bus of the FULL 48-bit width; the tools will give an error if you attempt to connect only part of the bus. This is completely logical, but a more descriptive error and explanation would be nice. I'm sure the same applies to all the other silicon-interconnected buses between DSP48E blocks, for the same reasons. These buses run directly between 2 adjacent DSP48E blocks, so once PCIN and PCOUT are connected, the tools will attempt to place the two blocks such that the connection is valid. This means that if the tools cannot find a single tile to place these two DSP48E blocks into, in the correct order, Map will fail. You can force the location of DSP48E blocks using the LOC constraint, or the relative location using the RLOC constraint. U_SET is useful if you want an RLOC constraint to be relative to only a specific group of DSP48E blocks.
Thumbs up to Xilinx for some excellent DSP blocks in the Virtex 5!
PS - Be aware of 2 errors in the Virtex 5 HDL Documentation:
The port is not CEMULTCARRY-IN but rather CEMULTCARRYIN.
The string value is not "NO_PAT_DET" but rather "NO_PATDET". - This error currently only comes out at the Map stage so will only be caught after the long Synthesis and Translate steps.
I've had Xilinx create 2 CRs to fix the documentation errors and the error reporting issue relating to this.
Good luck,
On the Virtex 5 chips you have columns of DSP48E tiles. A tile is 2 DSP48E slices arranged vertically. A slice is a single DSP48E block. The V5 syntax for location constraints (LOC) is DSP48E_XcYr where c is the column and r is the row. Each Virtex 5 chip can have a different number of DSP48E columns. The DSP48E's counting is not related to the typical SLICE columns or rows, they are separately counted. Bottom left DSP48E is DSP48_X0Y0, and top right DSP48E for the SX95T is DSP48_X9Y63. This equates to 640 DSP48E slices (in 320 DSP48E tiles).
A DSP48E has a lot (emphasized) of functionality. Refer to ug193.pdf from Xilinx for detailed descriptions.
The embedded registers in the DSP48E and its ability to change its operation on a clock-by-clock basis block save lots fabric FFs and LUTs. A lot of functionality that would typically be taken out of the DSP48E block can be kept inside by using its registers and different modes of operation.
Another function which is very nice is the PCIN/PCOUT. A lower DSP48E in a tile can transfer it's output, without going out to the fabric, to the higher DSP48E in the same tile for a joint calculation. This calculation is then saved from being done on the fabric.
A few caveats:
PCIN/PCOUT must be connected via a wire bus of the FULL 48-bit width; the tools will error out if you try to connect only part of the bus. This is completely logical, but a more descriptive error and explanation would be nice. I'm sure the same applies to all the other dedicated buses between DSP48E blocks, for the same reasons. These buses run directly between two adjacent DSP48E blocks, so once PCIN and PCOUT are connected the tools will try to place the pair such that the connection is valid. If they cannot find a single tile to place the two DSP48E blocks into, in the correct order, the design fails at Map. You can force the location of DSP48E blocks with the LOC constraint, or their relative location with the RLOC constraint. U_SET is useful when you use RLOC and want the constraint to be relative only to a specific group of DSP48E blocks.
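When Map can't place a cascaded pair on its own, a LOC constraint along these lines usually settles it. The instance names here are hypothetical; the point is pinning the pair into one tile, lower slice below upper, so the PCIN/PCOUT link is physically possible:

```
# Hypothetical instance names -- substitute your own hierarchy.
INST "u_mac/dsp_lo" LOC = DSP48_X0Y0;
INST "u_mac/dsp_hi" LOC = DSP48_X0Y1;
```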
Thumbs up to Xilinx for some excellent DSP blocks in the Virtex 5!
PS - Be aware of 2 errors in the Virtex 5 HDL Documentation:
The port is not CEMULTCARRY-IN but rather CEMULTCARRYIN.
The string value is not "NO_PAT_DET" but rather "NO_PATDET". - This error currently only surfaces at the Map stage, so it will only be caught after the long Synthesis and Translate steps.
I've had Xilinx create 2 CRs to fix the documentation errors and the error reporting issue relating to this.
Good luck,
Monday, July 6, 2009
ISE 11.2 and Partitions
Recently I thought to speed up my FPGA compilation times. One method of doing this is by using "Partitions." The idea behind a partition is that you mark a module or hierarchy of modules as a partition and that partition gets processed once and that's it. Unless of course you modify the partition in which case it is synthesized / mapped / placed and routed again.
Wouldn't it be nice if things worked as they were supposed to?
So partitions in ISE are going to be removed in version 12.1 and they will then only be available in PlanAhead. This means that Xilinx won't fix any of the many wonderful bugs in their partition implementation.
Two bugs that I've experienced with partitions:
1. Modify a source file in a way that makes ISE's automatic, uninvited parsing of source files fail, and some or all partition information may be removed, depending on how badly the parsing failed. This sucks because ISE parses your source files as soon as it regains focus, and of course you may not be done editing your source files at that moment. (It should be noted that I use an external editor for my source files.)
2. INTERNAL_ERROR:Xst:cmain.c:3446:1.47.6.1 -
I don't know what this error is but it won't let my partitions work...
I've of course removed all partitions from the design.
Hoping you have a better experience with partitions,
Thursday, July 2, 2009
Using synchronous resets - delay them!
When I first started designing FPGAs some years back I was always shown how to use asynchronous resets. This reset line would have its timing blocked, and it would be used throughout the FPGA. I have since become accustomed to synchronous resets, and the first very obvious mistake I made was driving a single reset line synchronously everywhere. This causes a huge and wasteful fanout. Register the reset line as it goes into each module and you'll save yourself time when it comes to routing. It costs a few extra flip flops, but it's worth it. If your design can't deal with different modules coming out of reset at different times, then perhaps it's time to rethink your design.
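A minimal sketch of the idea, with illustrative names: one extra flip flop per module absorbs the fanout, at the cost of each module leaving reset a cycle later than its neighbors.

```verilog
module reset_pipe (
    input  wire clk,
    input  wire rst_global,  // the high-fanout design-wide reset
    output reg  rst_local    // this module's private, registered copy
);
    // The local copy is what fans out inside the module, so the
    // global net only has to reach one FF per module.
    always @(posedge clk)
        rst_local <= rst_global;
endmodule
```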
Monday, June 29, 2009
Multiple MIG 3.0 / 3.1 cores
How do you implement multiple DDR2 MIG cores in a single design? I guess the first question is how do you implement and integrate a single MIG core? I have been using the Core Generator from within ISE to add cores to my design. A MIG core is added by creating the core from within ISE which also automatically adds the generated xco file to the project. What does this mean for a MIG core? Let's assume you've named the core mig_a. A MIG core is open source - it creates all the sources under the folder
ipcore_dir/mig_a/user_design/rtl
or
ipcore_dir/mig_a/example_design/rtl
When you compile your project with the mig_a.xco file in your project, then it's just like you've added all of the rtl sources manually. This will enable you to instantiate the module mig_a.
Remember that you must take the UCF constraints as created in
ipcore_dir/mig_a/user_design/par/mig_a.ucf
and copy them over to your main UCF file.
In order to implement two DDR2 cores you would create mig_a.xco and mig_b.xco. The first issue you run into is with conflicting UCF constraints due to both mig_a and mig_b using identically named constraints. This is easy to solve, and shouldn't cause too much of a headache. If someone knows of a better way to do this I'd love a comment explaining.
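My own approach to the UCF conflict is simply to prefix each core's shared constraint names. The identifiers below are purely hypothetical - the real TNM/TIMESPEC names come from the two generated mig_*.ucf files - but they show the shape of the edit:

```
# Hypothetical names, shown only to illustrate the renaming.
# mig_a's copy of a shared constraint, prefixed:
NET "u_mig_a/*clk0*" TNM_NET = "TNM_MIG_A_CLK0";
# mig_b's copy, prefixed the same way:
NET "u_mig_b/*clk0*" TNM_NET = "TNM_MIG_B_CLK0";
```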
The next issue is conflicting source component names. Here's the simple trick to solve it: DON'T USE THE XCO FILES! The two DDR2 cores' source files are identical - the only difference between them is the parameters used in the top-level instantiation file. Insert the files
ipcore_dir/mig_a/user_design/rtl/*.v and
ipcore_dir/mig_b/user_design/rtl/mig_b.v.
The final issue is the conflicting IODELAY_GRP name which by default in MIG's cores is set as "IODELAY_MIG." This should be changed in mig_a.v or mig_b.v but preferably in both for clarity's sake so that they don't conflict with each other. Use names like "IODELAY_MIG_A" and "IODELAY_MIG_B."
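In the MIG 3.x sources the group name comes from a parameter near the top of the core's top-level file, so the rename is a one-line change per core (verify the exact parameter name in your copy of the generated sources):

```verilog
// In mig_a.v:
parameter IODELAY_GRP = "IODELAY_MIG_A";  // was "IODELAY_MIG"
// In mig_b.v:
parameter IODELAY_GRP = "IODELAY_MIG_B";  // was "IODELAY_MIG"
```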
This solves all the conflicting component issues. You can now freely update your cores from MIG 3.0 to 3.1 without having to manually modify any files. You are using the Xilinx created files just as they are and without any ugly conflicts.
bye
Sunday, June 21, 2009
multi-dimensional array ports and variable bit selects
As a software programmer who moved to VHDL and then to Verilog, I have always found Verilog's language limitations to be very frustrating. One of my biggest complaints is Verilog's inability to use multi-dimensional arrays in port declarations.
For example:
module test(
    input wire [7:0] a [2:0]
);
is illegal in Verilog. The only way to do this is:
module test(
    input wire [23:0] a
);
What I had previously found myself doing was declaring the port level array as flat, and then converting it inside a standard generate block to the originally intended multi-dimensional array.
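That flatten-then-unpack pattern looks roughly like this (a sketch; the module and signal names are illustrative):

```verilog
module test (
    input wire clk,
    input wire [23:0] a_flat   // 3 elements x 8 bits, flattened
);
    wire [7:0] a [2:0];        // the intended multi-dimensional array

    // Unpack the flat port back into the array, one assign per element.
    genvar g;
    generate
        for (g = 0; g < 3; g = g + 1) begin : unflatten
            assign a[g] = a_flat[g*8 +: 8];
        end
    endgenerate
endmodule
```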
This is much clunkier than necessary. Variable bit-selects in Verilog allow for a much easier design. You can easily access the bits you require with this:
always @(posedge clk) begin : indexing_process
    integer i, index, found;
    found = 0;
    for (i = 0; i < 3; i = i + 1) begin
        if (!found) begin
            if (get_this_index[i]) begin
                index = i;
                found = 1;
            end
        end
    end
    if (found) begin
        my_data <= a[index*8 +: 8];
        // my_data <= a_non_flat[index];
    end
end
*** Note the +: notation. The first value is the start bit in the array, and the second value is the number of bits going up the array (-: would go down the array). This shouldn't be confused with the normal x:y notation. This is what makes pulling the correct bits out so easy.
This easily allows for parameterized modules with both compile-time bit widths and a compile-time number of elements. The lack of multi-dimensional ports is made up for by variable bit selects. Of course you do need to remember the size of each "index" of data, otherwise you'll be accessing the wrong bits...
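Putting it all together, a parameterized sketch (names are illustrative): the element width and count are parameters, the port stays flat, and +: does the indexing. Note that the width term of +: must be a constant; only the base expression may be variable.

```verilog
module pick_element #(
    parameter WIDTH = 8,    // bits per element
    parameter COUNT = 3     // number of elements
) (
    input  wire                   clk,
    input  wire [WIDTH*COUNT-1:0] a,       // flat array port
    input  wire [3:0]             index,   // assumed wide enough for COUNT
    output reg  [WIDTH-1:0]       my_data
);
    always @(posedge clk)
        my_data <= a[index*WIDTH +: WIDTH]; // variable base, constant width
endmodule
```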
Be enlightened,
Thursday, June 11, 2009
ISE internal infrastructure - does it exist?
I've been using ISE 11.1 for a couple of months now and I must say that it makes me cringe when I think about the code that created this monster.
The good - everything done outside of ISE. The tools that form ISE's backend are, in all of my experience, very reliable. I'm guessing this is because they are fairly simple single-threaded programs that know how to do the work they were built to do.
The bad - ISE. I am very sympathetic to the pains of writing a GUI which has to synchronize all sorts of other tasks. I know this is a very difficult task. You never know when the user will click something or when he might stop a process. ISE has to keep track of temporary files, source file changes, statuses and lots of other things. It's clear that the job is not easy. On the other hand Xilinx is a huge company and has the resources and responsibility to provide its users with a much better product.
In my experience ISE has many stability issues and quirks. I have learned to take them in stride. What I find shocking is that these bugs come from a very bad low-level infrastructure in the program, and not just a random assortment of small mistakes. (This is a huge assumption, but it is based both on my extensive experience with software and multithreaded applications, and on my experience with ISE.)
Example: when you change your source files in an external editor, ISE processes those changes only once you switch focus back to ISE. Why??? That must be the strangest thing ever. It can actually take ISE seconds (why so long???) to update its statuses - you can watch this in the ISE Processes window. If you click Generate Bitstream or Synthesize before it has processed the change and updated its status, the click is ignored, ostensibly because the files are up-to-date. This strikes me as an extraordinarily bad way to maintain internal ISE project status. Once a file changes, ISE should immediately process it and mark that file and everything downstream as out-of-date.
Hoping for better with ISE 11.2 but seriously doubting it
Friday, May 29, 2009
Chipscope - pleasantly surprised
I have been using Chipscope since I started working with Xilinx FPGAs a few months back. At first I was very disappointed by its inability to detect custom types and state machines. I think it would be nice if they could find a way to add this feature, but all in all I have become very satisfied with its performance and abilities.
Caveat: If you change the signals from Inserter then it's worthwhile creating a new Analyzer project instead of letting Chipscope detect and try to deal with the changes. That has never worked well for me.
Chipscope is Xilinx's attempt to build a true logic analyzer. They avoided trying to delve too deep into what your project was about (ie they don't learn custom types or even buses), and instead they work generically with the project and this allows Chipscope to be very stable.
The analyzer allows you to import the signal names from the Inserter project. The names could've been inserted automatically but that would have required some form of matching scheme to know that the FPGA is compiled with specific signal information.
Background comparison of alternate FPGA logic analyzer:
Lattice's Reveal:
Reveal allows you to do the same basic things as Chipscope. One main difference is that Reveal learns the types you add in its inserter, and its analyzer will automatically detect those types and states. This allows Reveal's analyzer to automatically build buses, support state machines, and even show state names in the logic window. There are numerous disadvantages to this scheme. Reveal doesn't support all types of signals - for example integers, and custom types other than state machines, aren't supported from what I've seen. Those signals stay greyed out in the Inserter program. Another major issue is that Reveal needs some form of matching scheme between the compiled core and the inserter project used to create it. If Reveal detects a mismatch it won't open the core for analysis at all. This can easily happen: if I add a single signal to the end of the data list and then want to analyze both the newly compiled FPGA and the old one, the old FPGA can no longer be analyzed without the old Reveal project files that match it.
Back to Xilinx:
Chipscope allows you to add every signal that exists in the design, regardless of how it was created. Signals are listed in XST format, so multi-dimensional buses become signal_0_0, signal_0_1 and so on. At least this way I can add them, and I can create the buses properly within Chipscope Analyzer. Regarding bus creation, Chipscope 11.1 has a huge advantage over previous versions with its Auto Create Buses option, which builds buses based on their names. If it builds one incorrectly the bus can of course be recreated properly, but I have yet to see it build a bus incorrectly. It doesn't yet manage to build bus arrays, so I build those myself. The biggest setback is its lack of support for simple state machines, but Chipscope does have a good way of working around this: token files. If you give your state machines set values for their states, a token file can tell the Analyzer what each state's value means and display the token (i.e. the name) in the logic window.
Chipscope has all the basic support of many other analyzers, plus some nice advanced features. It supports multiple analysis cores, multiple match units, triggers, sequencers, counters, and many other functions. What makes it stand out beyond Lattice's Reveal is its ability to store information based on a match unit and display it only after a distant trigger. This is a very useful function. I recently used it to great success in debugging what turned out to be an XST compiler issue (discussed in my post about for-loop and exit support). I triggered 65000 clocks in the future but still had the information I needed to debug the issue.
Nice job Chipscope team - keep up the great work!
ISE 11.1 stability issues
I have been using ISE 11.1 for about a month now. There are numerous stability issues, from crashes and freezes to messages no longer being updated in the Design Report view. Running Rerun All is always a crap shoot, as it brings ISE down much of the time. I have gotten coregen to freeze on occasion when invoking it within the ISE project. ISE's tracking of source file states and build states is hit or miss: sometimes it decides that Synthesis isn't up to date but, shockingly, Map and P&R are; or I change source files and it's completely unwilling to restart the compilation, which requires the infamous Rerun or Rerun All commands. Then there are the times when the compilation just won't work unless you "Cleanup Project Files" and try again. I know that lots of temporary files are used during FPGA synthesis, and plenty of intermediate files are created, but is it so difficult to check the state of the source files and, if one has been updated, to set the build state back to zero?
These issues only strengthen the argument against ISE keeping an open handle to its configured external editor. See my post on problems when using an external editor from ISE.
Seems to me that Xilinx needs their GUI team updated.
Xilinx XST for loop and disable keyword support
Xilinx once again fails badly when it comes to tool quality. Recently I converted my PCI Express block from static to parameterized. This required lots of arrays and multi-dimensional arrays, lots of parameters, and plenty of assign statements to move arrays into and out of modules (since module ports don't support multi-dimensional arrays). I implemented plenty of generate statements for conditional instantiation of FIFOs, and also for loops for conditional processing of blocks.
The language-specified way of breaking out of a loop in Verilog is the "disable" statement. Sadly, XST 11.1 doesn't support the disable statement from within a for loop. The recommended workaround, as shown in AR #22177, is to increment the for-loop iterator to its exit value. This works some of the time and fails other times; I strongly recommend against it, as it is clear to me that support for exiting a for loop early is shaky at best. I found that using a while statement with manual iteration worked better: I set the iterator to the exit value and it actually exited the loop properly.
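Both styles, sketched side by side with illustrative names (match, first_hit_*):

```verilog
reg [7:0] match;
reg [3:0] first_hit_a, first_hit_b;
integer i;

// 1) The language-defined way: disable the enclosing named block.
//    This is standard Verilog, but XST 11.1 rejects it in a for loop.
always @* begin : find_first
    first_hit_a = 4'd8;                  // default: nothing found
    for (i = 0; i < 8; i = i + 1)
        if (match[i]) begin
            first_hit_a = i[3:0];
            disable find_first;          // early exit
        end
end

// 2) The workaround that proved reliable: a while loop with manual
//    iteration, forcing the iterator to its exit value.
always @* begin
    first_hit_b = 4'd8;
    i = 0;
    while (i < 8) begin
        if (match[i]) begin
            first_hit_b = i[3:0];
            i = 8;                       // exit the loop
        end
        else i = i + 1;
    end
end
```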
Hoping that your experience is better than mine,
Saturday, May 16, 2009
Xilinx ISE using an external editor (not a good idea)
I like to use GViM for editing files in Windows. Given that ISE supports an external editor, I immediately set it to use my preferred one. What I discovered with ISE 10.1 was that ISE keeps an open handle to the editor while editing files. This is stupid! It's very problematic because ISE has a tendency to crash, and with an open handle to the editor it brings the editor down with it (at least with gvim.exe). ISE 11.1 was recently released and sadly it has the same issue. The CR (Change Request) I had Xilinx file to fix it will only be addressed in ISE 12.1. For now I open gvim outside of ISE; double-clicks in ISE then open in the same running gvim, without the handle tying gvim's existence to that of ISE.
ISE crashes - gvim lives :)
This relates to my post about ISE stability issues.
Thursday, April 30, 2009
ISE 11.1
Quick review.... Licensing is the major change, and it was actually quite painless. Xilinx allows you to create licenses from the Entitlement page. My license was created immediately which in turn allowed me to start working immediately. Installation was done from the 5.3 GB tar file download also from Xilinx's Entitlement page. The download was fast, about 2 1/2 hours. Installation includes Chipscope, EDK and the standard ISE Foundation. The last step in the installation takes you to the licensing page and uploads your computer's ID so that the license can be created.
They have made some nice GUI improvements. The IDE looks smoother. They have also changed the text of many compilation messages - I'm guessing some overhauling was done at all stages. Synthesis and par (Place and Route) are supposedly much faster. So far I haven't run into any major problems with the new tools. A nice change is the new xise project files: finally a text-based (XML) file that can easily be stored, compared, and modified.
There is a new MIG (Memory Interface Generator), version 3.0. The only difference I can see is that the previous requirements for column locations and master/slave DQ bit information are gone. This is very nice, as it means you can change IO constraints without having to run "Update Design" or manually change those values. Apparently this is due to an update in par, which can now intelligently place components without the previously required information. The other change in the MIG is that it now uses PLLs instead of DCMs. This might have to do with DCMs being removed from the Virtex 6, keeping MIG compatible with the new chips.
Enjoy the new tools.
Monday, April 27, 2009
mig 2.3 - what's this all about
So I had to implement an SDRAM controller on a Xilinx chip. That means the MIG (Memory Interface Generator). First off - this is no simple click-click wizard task; you must understand how the MIG works to get going. In the end the MIG just creates source files for you - there are no NGO files involved. It creates a completely open-source implementation. This has certain implications:
1. The source is compiled within your project, so any XST (Synthesis) settings you have will affect the synthesis of the MIG. For example, if you want your design to be optimized for area and the MIG needs to be optimized for speed then you better figure out how to create a black box.
2. It is not extremely friendly, as the assumption is that you will open the source and edit what you need to. If you choose to have the MIG implement the clocking then you must provide GC (Global Clock) input pins for both the interface speed (266 MHz for example) and the 200 MHz IDELAY clock, otherwise the code won't compile and you'll need to modify the source (most changes will be in the infrastructure file). If you choose to implement the clocks yourself then you'll have to provide the 4 clocks correctly phase aligned (phase alignment isn't required for the 200 MHz IDELAY clock) and sent through BUFGs (Global Buffers) into the top MIG module.
3. All EVM examples I've seen provide a different pinout than the defaults given by the MIG, and under most circumstances so will any new board. This is a pain in the ass, but not as bad as you might think. There are a number of tedious requirements for the MIG, e.g. "What is the column location of each DQS pin?", "What type of pin is each DQ - master or slave?", and "How many regions have DQS pins in them?". This is all annoying and can be updated manually by following the instructions in the MIG user manual (the appendix about required UCF modifications), but the easiest way to update this information is to create a MIG implementation. Make sure to mark all banks as available for all functions (one of the last pages has a list of banks with checkboxes for address, control, and data). Take the created UCF file and remove all constraints except the pin locations. Update all pin locations manually to match the pinout of your board. Open the MIG design again and choose the "Update Design" option. Give it the modified UCF and let it recreate the full design. It will update all constraints and source files, and spit out a new UCF with the correct pinout and all the required modifications.
4. The MIG provides two example trees, "user_design" and "example_design". To compile either design you can use the ise_flow.bat found in its par folder. The example design only requires you to provide the SDRAM signals, the clocks, and the reset. The user design requires you to add your own user code.
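The pinout-update flow from step 3 starts from a UCF stripped down to pin locations only. A hypothetical fragment (the net names and pin sites here are made up - use your board's pinout):

```ucf
# Stripped-down UCF: keep only the LOC constraints, edited to match
# your board, then hand this file to the MIG's "Update Design" option.
NET "ddr2_dqs[0]"  LOC = "AB5";
NET "ddr2_dqs[1]"  LOC = "AC4";
NET "ddr2_dq[0]"   LOC = "AB6";
NET "ddr2_a[0]"    LOC = "K2";
```

The MIG regenerates everything else (timing constraints, DQ/DQS grouping) around these locations, which is why deleting all the other constraints first is safe.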
The example design is best used to verify that the MIG implementation works: it compiles into a self-contained testbench of the SDRAM. The easiest way I found to use it was to insert a ChipScope core that shows the main application signals. The ChipScope core is fairly easy to implement. Do not use the debug option in the MIG - I haven't really figured out what it's for, but I believe it shows you the calibration sequence, which is not what you want to see. To get ChipScope working you have to build a normal MIG design, update the pinout (as explained above), modify the clocks, run through the XST stage in ise_flow.bat, and then start ChipScope Inserter. After creating the core and exiting ChipScope Inserter, call inserter.exe on the command line to apply the .cdc project and create a new ngc file (which contains the basic design plus the inserted ChipScope core). Finish the rest of the steps from ise_flow.bat with the new ngc (the output from inserter.exe). The compilation should finish, and you will have a design that shows you the signals from within the testbench in ChipScope Analyzer. If you're lucky you'll see phy_init_done rise and the accesses begin to take place.
Happy migging!
Tuesday, April 21, 2009
Can Xilinx ISE be any worse? #!@&^ PCF files
Working with Xilinx's ISE 10.1 has proven to be quite a challenge. The latest failure of their tools has to do with PCF (Physical Constraints File) files. Here's an error that took a while to figure out:
Resolving constraint associations...
Checking Constraint Associations...
ERROR:ConstraintSystem:59 - Constraint "pcie_inst/pcie/BU2/U0/pcie_ep0/pcie_blk/clocking_i/clkout0" TNM_NET =
pcie_inst_pcie_BU2_U0_pcie_ep0_pcie_blk_clocking_i_clkout0>: NET
"pcie_inst/pcie/BU2/U0/pcie_ep0/pcie_blk/clocking_i/clkout0" not found.
Please verify that:
1. The specified design element actually exists in the original design.
2. The specified object is spelled correctly in the constraint source file.
This error is due to constraints that refer to objects that no longer exist in the code. That makes sense, since I commented out all the pcie code. What doesn't make sense is that I also commented out (in the project's UCF file) all the constraints - including the one above - that refer to the pcie code. So why is it complaining about a constraint that no longer exists? After attempting my "Rerun All" commands, and after many changes and frustrations, I discovered that ISE copies constraints from the project's UCF file into a PCF file during the compilation process. Since I had previously compiled with these constraints in place, the old PCF file was never updated: ISE doesn't bother deleting or recreating this file when you manually edit the UCF, so it reports errors on constraints that no longer exist.
Delete this file, it might help.
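A minimal sketch of that cleanup, assuming the PCF lands next to the other build outputs (the directory name here is hypothetical - point it at wherever your ISE project writes its files):

```python
from pathlib import Path

# Hypothetical build output directory for the ISE project.
build_dir = Path("build")

# Remove any stale PCF so the translate/map stage regenerates it from the
# current UCF instead of reusing constraints that were deleted long ago.
for pcf in build_dir.glob("*.pcf"):
    print(f"deleting stale constraint file: {pcf}")
    pcf.unlink()
```

Crude, but it beats hunting ConstraintSystem errors for constraints you removed weeks ago.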
Monday, April 20, 2009
Continuation - Xilinx I/O pin's location
I recently discovered the easiest way of figuring out whether an I/O is left, center, or right. The X value in the I/O's name is 0 for left, 1 for center, and 2 for right. This means that IOB_X1Y318 is center because of the X1 value, IOB_X0Y179 is left because of the X0, and IOB_X2Y198 is right because of the X2. This is very easy once you are aware of the naming convention, and I hope this information is helpful to others.
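The naming rule is mechanical enough to script when you have a long list of DQS pins to classify. A small sketch (the helper name is mine, not a Xilinx tool):

```python
def iob_side(iob_name):
    """Classify an IOB site name as left/center/right by its X value.

    Per the naming convention: X0 -> left, X1 -> center, X2 -> right,
    e.g. IOB_X1Y318 is a center I/O.
    """
    # Grab the digits between "_X" and "Y", e.g. "IOB_X1Y318" -> 1.
    x = int(iob_name.split("_X")[1].split("Y")[0])
    return {0: "left", 1: "center", 2: "right"}[x]

print(iob_side("IOB_X1Y318"))  # center
print(iob_side("IOB_X0Y179"))  # left
print(iob_side("IOB_X2Y198"))  # right
```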
Thursday, April 16, 2009
Xilinx I/O pin's location - left, right, or center?
So while working with the MIG (Memory Interface Generator) I was required to find out whether each DQS (Data Strobe) pin was located "Left", "Center", or "Right". I figured this should be easy - just open the documentation. It's not in the documentation. You can find which bank a pin is in, but you can't find whether a bank (and all its pins) is left, right, or center. I finally discovered, with the help of an FAE (Field Applications Engineer - support contact), that I can tell whether a bank is left, right, or center by using the Floorplan view in the Floorplan Editor inside ISE. When hovering the mouse over I/Os you can see which bank each I/O is in, and from the I/O's location you know whether that I/O, and the others in the same bank, are located left, right, or center. Another way to figure this out is to use the Package view in the Floorplan Editor and roughly make out the groups of I/Os and where they sit on the package. Make sure to look at the top view so that the left I/Os are actually on the left instead of the right, and vice versa.
Sunday, April 12, 2009
Xilinx ISE vs batch mode (command line / makefile)
I don't know. I truly loathe the ISE, and since most Xilinx tools produce output that seems (at first glance) better suited to the command line, I feel that the command line should be better. It can't be worse. Of course this means I have to figure out how to work on the command line. I've now installed cygwin to use the *nix (Linux, Unix) tools and the makefile system. Since I never remember how to write makefiles (it's been a few years), I'm busy experimenting and getting frustrated all over again. Tabs (actual \t characters, not runs of spaces) make a difference in makefiles, and GViM doesn't identify or tabify makefiles properly. This is all very frustrating. My nerves were twitching quite a bit by the end of the day. I think I will try to work within the ISE tomorrow, and perhaps during compilations see what I can figure out about command line or batch mode compilation.
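The tab rule that bit me, as a minimal sketch (the target and tool invocation are hypothetical, not a working Xilinx flow):

```make
# The recipe line under a target MUST begin with a real tab character;
# leading spaces produce make's infamous "missing separator" error.
design.ngc: design.v design.scr
	xst -ifn design.scr -ofn design.log
```

In GViM, `:set list` displays tabs as `^I`, which makes it easy to spot a recipe line that accidentally starts with spaces.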
SDRAM AL (Additive Latency)
So I've got to implement a SODIMM interface with Xilinx's MIG (Memory Interface Generator). One thing I learned today is the very interesting use of AL (Additive Latency) in SDRAM chips. Additive latency allows commands to be issued to the SDRAM chip and buffered inside the chip for AL (1, 2, 3, 4, or 5) clocks, depending on configuration. This is useful when the delay between Activate (SDRAM command) and Read (SDRAM command), tRCD, would cause a collision between the next Activate and the optimal clock for the Read to be issued on. See http://download.micron.com/pdf/technotes/ddr2/TN4710.pdf for a very good explanation of AL.
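The arithmetic behind it, as a small sketch (the specific CL/tRCD numbers are illustrative, not taken from any particular part):

```python
# DDR2 posted-CAS additive latency: the device holds a READ internally
# for AL clocks, so the effective read latency becomes RL = AL + CL.
def read_latency(cas_latency, additive_latency):
    return cas_latency + additive_latency

# With tRCD = 4 clocks, choosing AL = tRCD - 1 = 3 lets the controller
# issue READ on the very next clock after ACTIVATE; the chip delays it
# internally, so the command bus stays free for the next ACTIVATE.
cl, al, trcd = 4, 3, 4
assert 1 + al == trcd
print(f"effective read latency RL = {read_latency(cl, al)} clocks")
```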
Saturday, April 11, 2009
The Design and Debugging Stages
Designs have the natural beauty of logic. This means for every design there is always a "perfect" way to implement it. Believe it - it's true.
Every bug has a reason - in other words, a basis in logic. Whether the bug is due to bad hardware or peripherals, or to a bad compiler, there is always a reason. Every step of the programming process, from design to runtime, can be explained. Bugs are usually the product of an error in the design or implementation, and rarely a problem with the tools or hardware - although nothing should be discounted when attempting to solve a bug.
I'm an amazing debugger. I use very few tools, and I do most of my debugging manually - but I almost always get to the bottom of things. The only reason why I can do this is because of my strict belief in logic, and perhaps the massive amount of experience that I have in debugging.
Be very suspicious of other people's code, and be very suspicious of APIs. This has served me extremely well over the years, but the advice must be taken with care. For example, it's obvious that most Win32 API calls are going to work the way they should, since they've been used in countless systems since whenever the Win32 API came out - a while ago. But if you are using the API of a very customized chip maker and the chip / API is new - then be suspicious. Any time you want to suspect an API, you must be able to explain, at least in theory, how that API is causing a problem only for you even though it's working for other companies / users of the same API.
Other people's code is always a source of friction. Most people take pride in their code and don't like when you look at it suspiciously. I'll say this: "If the system isn't working, then the first step is to look at the code as if it's 100% the problem." Don't listen when people say "The bug isn't in my code." Everyone believes that, and someone's wrong. Take this advice with care, too. We all know that some people are more prone to bugs than others, and that should affect whose code you are more suspicious of to begin with.
my history
During my tenure at Mango DSP Ltd (1/2001 - 7/2008), I began as a C++ developer working on basic GUI software. I moved on to developing C code for Analog Devices DSPs with interfaces to Altera FPGAs. As time went on I rewrote all the Windows drivers, bringing support to Windows 2000/XP with WDM. I was promoted to Manager of Infrastructures, taking on responsibility for low-level DSP and host libraries. In 2004 I began to venture into FPGA design. I rewrote Mango's complete low-level FPGA infrastructure code, bringing compile times down to a fraction of what they previously were while at the same time significantly lowering resource usage. I was given the additional title of Senior FPGA Engineer at Mango DSP Ltd. I rewrote complete Altera Stratix FPGA designs for image processing systems in a very short time with successful results. I wrote Lattice FPGAs both for Mango DSP Ltd. and for my next employer, VisionMap Ltd (8/2008 - 9/2009). My final project at VisionMap Ltd. was a high-bandwidth, high-speed Xilinx Virtex 5 design. I built a complete scatter-gather DMA engine core for use with Xilinx's Endpoint Block Plus for PCI Express. I converted an algorithm written in C to a pipelined implementation in the FPGA. My design used over 400 DSP48E blocks (instantiated manually) and over 250 BRAMs (also instantiated manually). I wrote the accompanying WDM scatter-gather driver for testing and final operation.
I have decided to return to the US after 9 years in Israel and I am looking for new challenges and learning experiences there.