I, for one, would not mind one bit if this patch were deployed a little later, if that delay would ensure you have ample time to perform more exhaustive testing at a lower personal stress level. There are a lot of hardware combinations and a lot of software that should be tested after the CPLD is patched.
The good news is that these tests are limited to ACA500plus, ACA500 and A1200. The even better news is that in 66% of the cases, I'm testing against a system that I have the source code to.
The Taguchi method could be used to reduce the number of actual tests, and with it the labour burden.
You generally have two methods of getting quality: you either test it into a product, or you build it in at the fundamentals. I see a lot of hobby projects that do lots of testing, but what I have never seen is any discussion about fundamentals like "is timing XY from this datasheet met?", yet this is the basic question that I've answered in a huge LO spreadsheet that actually follows each timing path. This is of course made manually, with the possibility of missing a path (I can't simulate it all with the Xilinx tool, as there are quite a lot of discretes on the ACA1234), but I have verified each and every timing on the logic analyzer for plausibility. I do admit that I can't tell with full accuracy whether a timing is met, as my Agilent 54622D only has a resolution of 2.5ns, so there are timings such as command/address setup to rising clock edge for RAMs (1.5ns requirement) that include some guesswork. However, with a worst-case clock cycle of 12.5ns (80MHz), there's sufficient space to add slack in all directions. So although I don't call it the Taguchi method, my way is pretty close to it, as I know each and every critical path, I know how (and why) to keep it short, and I have enough experience to not entangle too many paths.
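To make the "guesswork" part concrete, here is a minimal Python sketch of how a measurement with limited resolution can be classified against a datasheet requirement. The function names are made up for illustration, and treating the analyzer's 2.5ns resolution as a plus/minus bound on the measured value is an assumption, not a statement about how the instrument actually specifies its accuracy.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def setup_verdict(measured_ns: float, required_ns: float, resolution_ns: float) -> str:
    """Classify a measured setup time against a requirement.

    Assumption: the true value lies within +/- resolution_ns of the reading.
    """
    worst_case = measured_ns - resolution_ns  # smallest plausible true value
    best_case = measured_ns + resolution_ns   # largest plausible true value
    if worst_case >= required_ns:
        return "met"          # requirement holds even in the worst case
    if best_case < required_ns:
        return "violated"     # requirement fails even in the best case
    return "uncertain"        # the instrument cannot decide this one


if __name__ == "__main__":
    print(period_ns(80.0))                 # cycle time at 80MHz
    # Hypothetical reading of 2.5ns against the 1.5ns RAM setup requirement
    print(setup_verdict(2.5, 1.5, 2.5))
```

With a 2.5ns resolution, any reading below 4ns against a 1.5ns requirement lands in the "uncertain" bucket, which is why the slack in the 12.5ns cycle matters.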
Still, this development was the final push to ordering a new toy, which will hopefully arrive before the end of the year: a Rohde & Schwarz RTM3004 with all the bells and whistles, full bandwidth licensed and the added hardware for 16-channel logic analysis at 5 GSamples/s.
The sad part is that even if I had had this toy for this development, I wouldn't have caught this error, as I made the (in retrospect rather stupid) assumption that things I have already successfully tested don't need to be tested again. The ID for the difference between "development card" and "production card" is in a totally different area of the CPLD - the one that communicates with the MCU. Yes, I've had the Xilinx tool play tricks on me in the past, especially on the ACA500plus, which has a similarly high utilization of the parts. That pushes the fitter to the limit and requires lots of manual placement (otherwise my designs don't fit at all).
With my LO spreadsheet taking the fitter output and timing analysis of the Xilinx ISE tool as input, I was lucky enough to be familiar with the equations output of the thing - and immediately spotted that the offending RAM address line in the fitter output only had two product terms, where it should have three: one for the "init" case, one for normal A21 passing, and one for trapdoor memory mapping. So my overall development method did help to find this fairly quickly.
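A check like this could in principle be automated. The following is only a rough Python sketch: it assumes a simplified, made-up equation notation (one equation per line, product terms separated by " OR "), not the actual Xilinx ISE report format, and the signal names are hypothetical.

```python
def count_product_terms(equation: str) -> int:
    """Count product terms in 'NAME = term OR term OR ...' (simplified notation)."""
    _, rhs = equation.split("=", 1)
    terms = [t.strip() for t in rhs.split(" OR ") if t.strip()]
    return len(terms)


def find_mismatches(equations, expected):
    """Return {signal: (found, expected)} for signals with an unexpected count."""
    mismatches = {}
    for line in equations:
        name = line.split("=", 1)[0].strip()
        if name in expected:
            found = count_product_terms(line)
            if found != expected[name]:
                mismatches[name] = (found, expected[name])
    return mismatches


# Hypothetical expectation: init case, normal A21 passing, trapdoor mapping
EXPECTED_TERMS = {"RAM_A21": 3}

if __name__ == "__main__":
    fitted = ["RAM_A21 = INIT OR A21 AND NOT TRAPDOOR"]  # one term missing
    print(find_mismatches(fitted, EXPECTED_TERMS))
```

Run after every fit, a check of this kind would flag an address line that silently lost a term, regardless of which part of the design was edited.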
Testing should - wherever possible - be automated. We've done that very extensively for Chameleon. However, this being the 7th generation of 68030 accelerators that I design, each of them using the same trapdoor RAM mapping method, made me sloppy, and I focused on the wrong things - like adding my "negative heat sink" (styrofoam on both sides of the card) and running load tests on memory-intensive operations with MapROM and burst switched on, and of course the typical "generate a few AIBB modules in a row", as popular wisdom of the Amiga underground holds that only the second successful AIBB module (aka no crash) shows that an accelerator can be considered "stable".
Timm and I have worked on the CPLD update tool today - it's of course based on the update tool that we've made for the first 2018 Buddha controllers, but has a few added safety features like automated re-try, warning text and keyboard-free operation. Successfully tested with A1200 and ACA500plus, but not yet with ACA500.
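For readers wondering what "automated re-try" amounts to, here is a generic Python sketch of a program-then-verify loop. It is not the code of the actual update tool; `program` and `verify` are hypothetical callables standing in for whatever writes the CPLD and reads it back.

```python
def flash_with_retry(program, verify, max_attempts: int = 3) -> int:
    """Run program() then verify() until verification passes.

    Returns the attempt number that succeeded; raises RuntimeError if
    every attempt fails. Both callables are placeholders (assumptions).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            program()
            if verify():
                return attempt
        except OSError:
            # Treat an I/O error like a failed verify and try again
            pass
    raise RuntimeError(f"update failed after {max_attempts} attempts")


if __name__ == "__main__":
    state = {"calls": 0}

    def program():
        state["calls"] += 1
        if state["calls"] == 1:
            raise OSError("simulated glitch on first attempt")

    print(flash_with_retry(program, lambda: True))
```

The point of verifying after every attempt is that a write that appears to succeed is never trusted on its own.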
The weekend is family time, so I can only promise to get back to this on Monday. If any of you still has the parallel port-JTAG cable, or is willing to make one, I may sneak into the office tomorrow night after I've put the kids to bed, do the ACA500 test and, if it's successful, upload the archive here. I don't expect any trouble, but this is a lesson in testing. I should show that I've learned it.