During the Amiwest 2024 interview, I revealed all the specifications and features that we had kept secret until then - some of them because they were not fully tested, others because I didn't want to give away the idea. Now that we're close to releasing the first boards at a reduced price (because mass-storage support will be added later), I'd like everyone to know the full specifications, which should give the ultimate answer to "what took you so long".
Chip ram interface
Most, if not all, accelerators use the same well-known technique to handle access to the A1200 bus: they only release the accelerator's CPU bus when the access to the A1200 has been completed. For read accesses, this is of course the only way you can do it, but for write accesses, the ACA1240/1260 takes a new route: The 68040/68060 CPU sees zero-wait state speed on chip ram writes. From the CPU's point of view, the write access is completed at the first cycle in which the CPU can accept that message. This means that program execution can continue while the accelerator's logic executes the chip ram write access simultaneously. This fixes a major problem that especially 68040 accelerators of the 1990s had, and that's chip ram access speed. For a good gaming experience in CPU-hungry games, you need high write performance, and the same is true for demos, a form of digital art. Even the slowest 68040 at 25MHz gets 7MBytes/s write access speed to chip ram, where the 1990s 1240-25 cards struggled to reach 4.3MBytes/s. The maximum performance to chip ram is of course limited by the number of colours on your workbench, but the ACA1240/1260 will never waste a cycle, no matter what frequency the CPU runs at. When the ACA1240/1260 is connected to an ACA500plus, it will even split long-word accesses into two word accesses while the CPU continues to work.
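To illustrate why this matters, here is a toy model in Python. It is only a sketch of the general idea, not the actual logic of the card: the function name, the single-entry write buffer and all cycle counts are made-up assumptions.

```python
# Toy model of chip ram writes that the CPU does not have to wait for.
# All cycle counts are made-up illustration values, not measurements
# of the ACA1240/1260.

def cpu_cycles(ops, bus_cycles_per_write, posted):
    """Return the CPU cycles needed to run a list of operations.

    ops is a list of ("work", n) or ("write",) tuples.
    """
    t = 0              # current CPU time in cycles
    bus_free_at = 0    # time at which the chip ram bus is idle again
    for op in ops:
        if op[0] == "work":
            t += op[1]
        elif posted:
            # Wait only if the previous write still occupies the bus.
            t = max(t, bus_free_at)
            bus_free_at = t + bus_cycles_per_write  # logic finishes the write
            t += 1                                  # CPU is released at once
        else:
            # Classic approach: the CPU is held until the bus access is done.
            t += bus_cycles_per_write
    return t

program = [("write",), ("work", 10)] * 4
print(cpu_cycles(program, 8, posted=False))  # 72
print(cpu_cycles(program, 8, posted=True))   # 44
```

In this model, the CPU only stalls when it issues the next write before the previous one has left the bus, so back-to-back writes keep the bus busy without gaps.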
Power design
The next technical detail is the core voltage of the CPU, which is software-adjustable for over- and under-volting the CPU, either to get better overclocking success, or to reduce the temperature of your setup. While the center voltage (5V or 3.3V) is set automatically according to the CPU type that the launch system found, there are eight settings in total for fine-adjustment of that voltage: 68040 CPUs can be supplied with voltages from 4.6V to 5.47V, and 68060 CPUs can be fed with 3.0V to 3.88V.
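As a rough illustration of what eight settings over these ranges could look like, here is a small Python sketch. The two voltage ranges are the ones given above; the even spacing of the steps, the numbering of the settings from 0 to 7 and the function name are assumptions, and the real step values may differ.

```python
# Sketch of the core-voltage fine-adjustment. The ranges come from the text;
# the even spacing of the eight steps is an ASSUMPTION for illustration.

RANGES = {"68040": (4.6, 5.47), "68060": (3.0, 3.88)}  # volts, min and max
STEPS = 8

def core_voltage(cpu, setting):
    """Return the voltage for setting 0..7 (hypothetical numbering)."""
    if not 0 <= setting < STEPS:
        raise ValueError("setting must be 0..7")
    low, high = RANGES[cpu]
    return round(low + (high - low) * setting / (STEPS - 1), 3)

for cpu in RANGES:
    print(cpu, [core_voltage(cpu, s) for s in range(STEPS)])
```

With evenly spaced steps, each setting would move the voltage by roughly an eighth of a volt.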
Both the CPU core voltage and the main 3.3V memory/logic voltages are generated by very high-efficiency DC-DC converters as opposed to the linear regulators that were used in the 1990s. All power for the main logic is taken from the 12V rail, which has lots of power reserves, even with known-bad second-source power supplies (we still recommend a CA-PSU for any Amiga that has more than one expansion). Only the launch system of the ACA1240/1260 is powered from the 5V rail, but everything else is powered from 12V, reaching unparalleled stability over wide temperature ranges. There is still one linear regulator in the whole design, and that generates the core voltage of the main logic FPGA. There is no point in using a higher-efficiency unit in this spot, because the amount of power to be saved is minimal: The regulator does not even need a heat sink.
Memory and Logic choices
While other 680x0-based accelerators so far have been built with CPLDs, we decided to go for an FPGA with a few external components to do voltage conversion. This may sound expensive at first glance, and we've also used one CPLD in the data path to the A1200, but the choice becomes obvious when looking at further details of the design. First of all, there's memory, which determines the experienced speed of the accelerator: After all, the CPU can only execute instructions as fast as it can actually read them from memory. We decided to use SD-Ram for the ACA1240/1260, because that allows a faster overall system speed than seemingly-more modern architectures such as DDR memory of any generation. The reason is in the clock frequency that must be kept in tight tolerances on DDR memory, and multiple voltage translation steps that would be necessary to make DDRx memory work with a 68040 or 68060 CPU. With SD-Ram, it is possible to pass data through very fast voltage translators directly to the CPU, but for DDRx memory, all data would have to pass through the FPGA, which essentially slows things down.
An FPGA as the main memory controller allows lots of different strategies to be implemented, including adding second-level cache, which has not been shown on any 68040/68060 accelerator on the Amiga before. We've tried implementing that, but quickly found out why no other accelerator has added second-level cache yet: It just doesn't add enough performance. Instead, we've found that most speed is gained by "keeping rows open". SD-Ram is organized in rows and columns, and before you can access a cell, you need to open a row, which takes more than half of the access time. Keeping a row open can shave off several wait states, and the 256MBytes of memory can have a total of 16 open rows at any given time. Having many memory chips in the design is actually an advantage: SD-Rams require refresh cycles on a regular basis, and these can be moved to times when other memory chips are accessed. If the design only had two memory chips, refresh would give the processor unnecessary wait states.
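The effect of keeping rows open can be shown with a toy model in Python. This is a sketch under stated assumptions, not the memory controller of the card: the address split, the row size and both wait-state numbers are hypothetical, and only the figure of 16 open rows is taken from the text.

```python
# Toy model of an "open row" SD-Ram policy. The address split, the row size
# and the wait-state numbers are assumptions for illustration only.

ROW_HIT_WAITS = 1    # hypothetical cost when the row is already open
ROW_MISS_WAITS = 4   # hypothetical cost when a row must be opened first
ROW_SIZE = 2048      # hypothetical bytes per row

def access_cost(addresses, num_banks=16):
    """Return the total wait states for a sequence of byte addresses.

    num_banks is the number of rows that can be open at the same time.
    """
    open_rows = {}   # bank number -> row currently open in that bank
    total = 0
    for addr in addresses:
        row_index = addr // ROW_SIZE
        bank = row_index % num_banks      # consecutive rows go to different banks
        row = row_index // num_banks
        if open_rows.get(bank) == row:
            total += ROW_HIT_WAITS
        else:
            open_rows[bank] = row         # close the old row, open the new one
            total += ROW_MISS_WAITS
    return total

# Code at address 0 and data at address 2048, accessed alternately.
pattern = [0, 2048, 4, 2052]
print(access_cost(pattern, num_banks=16))  # 10
print(access_cost(pattern, num_banks=1))   # 16
```

With only one row open at a time, every switch between code and data pays the full row-open penalty. With several open rows, only the first access to each row does.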
For more compatibility, the FPGA can also stop the 68040/68060 CPU and let the CPU of the host computer do the work - memory and peripherals of the ACA1240/1260 can still be made available to that CPU, so at least fastmem can be provided.
The one thing that I have kept secret throughout development is that the FPGA is wired up in a way that it can not only halt the 68040/68060 CPU, but also keep the CPU of the host computer off the bus. In that state, the FPGA takes over the A1200 bus and memory, and acts as the CPU. We plan to use this for a high-compatibility mode with a cycle-exact 7MHz 68000 processor. The FPGA may be strong enough to provide 68020 performance when connected to an ACA500plus, but don't expect wonders - there are only 10k logic elements in the FPGA, so there's a limit to what we can do. Our focus with this is on compatibility with original A500 titles, not outperforming 680x0 CPUs.
As with all our FPGA-based products, the core can be updated in the field. This means if we find a way to improve memory speed, compatibility or just add a feature, it's only going to be a download for you, although the hardware behaviour will be different after that. It's the magic of FPGAs
Mass storage
Here's where the main delay comes from: Using SD-cards in single-bit mode is a known technology, well-documented and already available as source code within iComp. However, performance is very limited in serial mode, so the micro SD-card slot is wired up for 4-bit secure digital mode. Interface verification has been done on a sample card, but so far, we haven't managed to make it work with all brands of cards. We therefore decided to start selling the boards at a discount, because mass-storage functionality isn't working yet. Rollout of the firmware/software updates will be two-step in order to have manageable and predictable chunks of work ahead of us: First we'll use the SD-card in single-bit mode to have something working, then we'll continue working on 4-bit mode to reach acceptable performance and increase compatibility with large capacity cards.
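The gap between one data line and four is simple arithmetic, sketched here in Python. The function name and the 25MHz clock are example assumptions, not the settings of the card, and the numbers are raw data-line rates that ignore command and protocol overhead.

```python
# Raw data-line throughput of an SD-card bus, ignoring command and protocol
# overhead. The 25 MHz clock is only an example value, not the card's setting.

def raw_throughput_mb_s(clock_hz, data_lines):
    """Return the raw data rate in MBytes/s (10**6 bytes per second)."""
    bits_per_second = clock_hz * data_lines
    return bits_per_second / 8 / 1_000_000

print(raw_throughput_mb_s(25_000_000, 1))  # 3.125
print(raw_throughput_mb_s(25_000_000, 4))  # 12.5
```

At the same clock, four data lines move four times as much data, which is why the slot is wired for 4-bit mode even though single-bit mode is easier to get working first.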
As with all our accelerators of the past years, the ACA1240/1260 also accelerates the IDE port of the A1200, which may already be enough to satisfy your need for speed on mass storage devices. Since we have so much more logic space in the FPGA compared to our other accelerators, this IDE accelerator has a much finer granularity than previous ones, and on write accesses, it uses the same acceleration as the chip ram interface mentioned above. We expect to have much higher compatibility and lower probability of data loss when accelerating the internal IDE port of the A1200.
Further expansion
On the outer edge of the ACA1240/1260, there's a high-speed connector for a graphics expansion. We've already shown pictures of our prototype that has VGA-only output. That's of course not what we'll offer - that prototype was only made to verify whether the memory bus remains stable with an additional target on the bus. After all, the goal was to have unprecedented stability, so everything had to be fully tested before we release the hardware into the field. The final graphics card isn't available yet, but it'll share a lot of the IP that we're currently developing for Indivision ECS V4, the first-ever framebuffer and 2D accelerator with P96 driver from iComp. There will be a huge difference, though: Access speed to the frame buffer will be MUCH faster!
In-line with the high-speed connector is another female header which carries signals and voltages for an audio expansion, such as a sound card or an MP3 decoder. This connector may or may not be used in the future - I have added it to be able to offer such an expansion, and it's in line with the GFX connector to be able to cover both audio and video with the same expansion. I haven't really made a decision if I will use this connector or not, but I just had to add it in order to not regret it as a missed opportunity. Your feedback might tilt me in one or the other direction.
Local flash rom
There's 8MBytes of flash memory on the ACA1240/1260. A large portion of that is used to keep all the different FPGA cores, because we need a separate core for each supported CPU speed (the memory controller needs to be tweaked for every CPU and frequency). However, one key thing that users of 68040 and 68060 accelerators had to do before they could even install the new card was to install a library. Not so with the ACA1240/1260! It comes with Thor's CPU and MMU libraries pre-installed in the flash, and there is no need to install libraries on your WB before you insert the accelerator. This is how we see the term "autoconfig" - true plug&play.
The launch system
In previous paragraphs, I have referenced the "launch system" as if you knew what that is. Sorry about that. Think of it as a general supervisor of the hardware: It detects the installed CPU, stores your configuration and license setup, handles flashing and de-bricking the board and launches the FPGA from the local flash. This is only done upon powering up the system, but the work of that part of the design isn't over then. While the accelerator is running, it constantly reads the temperature sensor that's located in the CPU heat sink, and makes that data available to the main computer. It controls the fan speed and cross-checks the fan signals to find out if the fan really spins. Last but not least, it detects if a 68040 or 68060 CPU is inserted and controls the core voltage accordingly. This launch system is fully transparent to the user - you'll only notice it when something goes wrong, in which case it will send blink codes on an LED, so you have some way of diagnosing the cause of the problem.
I hope this will shed some light on the question of why this card was in development for so long, and why it still takes some time to be feature-complete. We're very close to a release, possibly next month, but certainly in December, with the unfinished points mentioned above. Let me know if you have questions - there's no more secrecy, I'll answer them all now!
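The fan supervision described above can be sketched in a few lines of Python. This is only an illustration of the principle, not the launch system's firmware: the function names, the temperature thresholds, the linear fan curve and the minimum RPM are all hypothetical.

```python
# Sketch of a supervisor's fan control and cross-check. Thresholds, the fan
# curve and all names are hypothetical; the real launch system may differ.

def fan_duty(temp_c, start_c=40.0, full_c=70.0):
    """Map a heat sink temperature to a fan duty cycle between 0.0 and 1.0."""
    if temp_c <= start_c:
        return 0.0
    if temp_c >= full_c:
        return 1.0
    return (temp_c - start_c) / (full_c - start_c)

def fan_fault(duty, tach_rpm, min_rpm=500):
    """Report a fault if the fan is driven but its signal says it is not spinning."""
    return duty > 0.0 and tach_rpm < min_rpm

duty = fan_duty(55.0)
print(duty, fan_fault(duty, tach_rpm=0))  # 0.5 True
```

A fault like this is the kind of condition that would be reported through a blink code rather than silently ignored.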
Jens
Edit: The card of course has a clock port for an RTC module.