Posts by thebajaguy


    I'm Robert Miranda on FB. I did see the note about the interest on Amibay, too. I'm currently traveling for work, so kind of limited in my Amiga side efforts.


    As far as the GuruROM, there's nothing in it that differs from the FastROM v4.x series driver when it comes to the GVP Series II cards operating in Async SCSI mode. The bump to Sync SCSI will help performance if other optimal system factors (described in the Amibay thread) are present. It does help address the known 24-bit->32-bit memory transfer limitations in the C= A2091/A590 scsi.device driver, and its iffy reselection handling (multi-device situations), bringing it up to par with the GVP. Beyond that, if there are instability issues present on the SCSI bus, pushing communication speeds to Sync SCSI will trip over them even quicker.

    With the CPU copy-up necessary for the high memory on the accelerator, I caution against expecting any notable performance improvement with Sync SCSI. Async SCSI at its best pushes 2.2-2.3MB/sec to its DMA-capable buffer, but the copy-up effort will generally knock the overall performance down to ~1.5-1.7MB/sec at best (about 1/2 Zorro II bus speed).
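    As a rough back-of-the-envelope check on why the numbers land there, here's a minimal sketch (my own, in C; the 2.2MB/sec Async figure is from above, the CPU copy-up rate is an assumed value):

        /* Sketch: when DMA into the 24-bit bounce buffer and the CPU
           copy-up to 32-bit RAM happen one after the other, the same
           bytes move twice and the two rates combine harmonically. */
        #include <stdio.h>

        int main(void)
        {
            double dma_rate  = 2.2;  /* MB/sec: best-case Async SCSI DMA (from above) */
            double copy_rate = 5.0;  /* MB/sec: assumed CPU copy-up speed             */
            double effective = 1.0 / (1.0 / dma_rate + 1.0 / copy_rate);
            printf("effective: %.2f MB/sec\n", effective);  /* ~1.53 MB/sec */
            return 0;
        }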

    On any given day, in a stock 68K system (24-bit), the DMA controllers will out-perform the CPU-driven I/O cards (SCSI or IDE). The HC8, with its interleaved CPU/DMA 14MHz on-card memory, will be the best multitasker, leaving the CPU free to run while DMA happens for I/O. In terms of I/O speed, both DMA controllers can, with Sync SCSI and a willing device, effectively saturate the Z2 bus; the CPU-driven interfaces cannot. For the record, in a best-case speed race measuring only I/O, with no care for how much the CPU can't run, the A2091 is a hair better than the HC8 in this classic system configuration: its 33C93A-to-DMAC buffer runs more optimally than the 33C93A-to-DPRC's does. The 33C93A technically DMAs into the receiving port on those DMA controller chips, and the 16-bit DMA transfer happens to/from the memory destination.


    Move these two DMA cards up to the accelerator-driven systems with both DMA-capable Z2 AutoConfig 32-bit RAM and non-DMA-capable high RAM, and you incur all of the CPU-lockout benefits and issues inherent with Z2 bus masters, plus the need to copy that data up to the upper address ranges at times. Halving the Z2 bus, performance-wise, is where your I/O speed limit comes from, as the data has to move twice. The argument for moving to CPU-driven I/O gains good traction, since the CPU has to be involved anyway. If the 32-bit memory performance is good and the instruction cache is running the copy-up code, you can get some good numbers off the Z2 bus (for IDE). This is where the Buddha has a suitable place in the field. There's still the 3.4-3.5MB/sec speed limit in Z2, though, and one can't hit all of the optimal access windows all the time. Sync-clocked CPU-slot cards will of course do better in that, but they are rare in the A2000 (the A2620 being one, old RONIN 030 cards being another). In an odd case, the DPRC's 16-bit FastRAM (when none is low-mapped on the 32-bit accessible accelerator cards - n/a to this A2630 memory configuration) doesn't block the CPU during DMA, but the CPU copy loop is still tapping the slower 7MHz 16-bit RAM for DMA buffering. It's just a bit more CPU-friendly, and therefore multitasking-friendly.


    Then we get into the A2000 accelerator combos with onboard SCSI (or IDE) and DMA options. The GVP 030/040 cards with the DPRC on them are admittedly more like the A2091 without the nice interleaved memory access design, but they pull 2-3MB/sec in optimal cases, and there's a hack on the GF040 in a special case where I've seen over 4.5MB/sec. Move up to the mid-1990's 040/060-class accelerators with the 53C7xx SCSI/DMA chips on them, and you are back again in favor of the DMA cards, in both I/O performance and CPU-friendly multitasking. The Blizzard and TekMagic cards are among this group, and are lightning-fast in both CPU power and I/O. I've actually been working on an A2000 system with the Buddha and the Blizzard 060 (to copy off data as I look to upgrade it from 3.1 w/some 3.5/3.9-era additions to 3.2.1 ROM/Workbench), and it's no slouch in I/O with that kind of CPU power. It's just not the most optimal. The Buddha is mine in this equation, and so is just a temporary backup point, but if the guy didn't have tape, removable media, and several drives in an external tower, the Buddha might be a modern answer, too.


    I left out the 68000-680x0 accelerators now showing up with IDE interfaces (and RAM) on them, but must acknowledge them. They still use the CPU to effect transfers. My brother wrote the RSCP benchmark back in the early 1990's, and I updated it with a partner more recently. It helps show what impact I/O activity (DMA or CPU-driven transfers) can have on multitasking performance (CPU free time) when a system is 'under load'.
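    For anyone curious how that kind of measurement works, here's a minimal sketch of the general idea (not RSCP's actual code - just a calibrated low-priority counting loop, which is the classic approach):

        /* Sketch: estimate CPU free time by counting how fast an idle
           loop runs, first on a quiet system, then while I/O is active. */
        #include <stdio.h>
        #include <time.h>

        static unsigned long spin(double seconds)
        {
            volatile unsigned long count = 0;
            clock_t end = clock() + (clock_t)(seconds * CLOCKS_PER_SEC);
            while (clock() < end)
                count++;
            return count;
        }

        int main(void)
        {
            unsigned long idle = spin(5.0);  /* calibrate on a quiet system     */
            /* ... kick off a large disk transfer here, then measure again ... */
            unsigned long busy = spin(5.0);
            printf("CPU free time under load: %.1f%%\n", 100.0 * busy / idle);
            return 0;
        }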

    The answers, as I noted, are never straightforward ;-)

    Taking the 40MHz plastic-encased CPU past ~10% overclock is starting to approach the design limit of the chip package - and by that, I mean its ability to dissipate heat from the internals of the chip. I have known some to succeed, so it's not impossible. It's a YMMV situation. What Jens is suggesting is to make sure it's stable at higher speeds in the long term. This means closing up the unit and letting it run something fairly 'busy' for a while, so that the entire case can heat up inside and not affect the stability of the CPU.

    The SCSI2SD v5.x series will do about 1.25MB/sec on any of the controllers you have in a stock system, as it supports Async SCSI transfers only. For that, stick with the existing driver. I think once you get a feel for the flash-based disk, you will notice it's faster than you remember disk being, as there is no seek or spindle latency on that kind of device.


    No problem on taking time to work things out. If you set up the SCSI2SD with multiple 'drives' with the provided tool (online download, program it off your modern PC), at multiple SCSI IDs, you can define each 'drive' as <4GB, then use the GVP FastPrep (or ExpertPrep) tool to further partition each disk into one or more actual partitions. Remember that OS 3.1 has disk capacity limits of <4GB and partition sizes <2GB.
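    If it helps to see those limits as numbers, here's a minimal sketch (my own; the 3900MB disk and 1900MB partitions are just example values) that checks a planned layout against the OS 3.1 limits:

        /* Sketch: sanity-check a planned SCSI2SD 'drive' layout against
           the OS 3.1 limits: each disk <4GB, each partition <2GB. */
        #include <stdio.h>

        int main(void)
        {
            unsigned long disk_mb    = 3900;           /* one SCSI2SD 'drive' (example) */
            unsigned long parts_mb[] = { 1900, 1900 }; /* its partitions (example)      */
            unsigned long used = 0;
            int i;

            for (i = 0; i < 2; i++) {
                if (parts_mb[i] >= 2048)
                    printf("partition %d: too big for OS 3.1 (<2GB)\n", i);
                used += parts_mb[i];
            }
            if (disk_mb >= 4096)
                printf("disk: too big for OS 3.1 (<4GB)\n");
            printf("%lu MB used of %lu MB\n", used, disk_mb);
            return 0;
        }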

    The rule on the street I have always heard is to plug in an actual USB hub (with external power) to the board's interface. Use the hub for situations where hot plugging of devices is needed, or where power draw may be higher. Use the main USB connection point for low power connections and those you don't need to hot-plug.

    For a 68030 @ 25MHz, it's not too different, but higher performance CPUs can get a bit more out of RAMSEY memory.


    The cache also skews the benchmark a little, with sequential activity generally rewarded, and random activities penalized by varying amounts.

    For the FastROM (a marketing name...sigh), just have the card in with the J4 jumper enabled and a ROM in the socket (@ top edge). Once it boots (can even be from a Workbench floppy with no HD, just wait for the timeout), open a CLI/Newshell and type:


    version gvpscsi.device
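    If a 4.x FastROM is installed, the reply will read something like the line below (an assumed example - your version and revision may differ):

        gvpscsi.device 4.15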


    What comes back is your (ROM) driver version. For your application, there is no difference between a v3.14-3.15 and the 4.x series - and all are compatible. Any changes were for later products and unrelated situations.


    The DMA Mask and MaxTransfer are settings made in the HD prep tool used to partition the hard disk. I recommend using the tool you last used to prep the HD to make the modifications. I also recommend having any data on the target device you value backed up as human mistakes could render the filesystem inaccessible, and you may be reformatting and installing clean again. When the time comes, and you know the tool, we can walk thru the process. If you have a blank drive/media to work with, it may be helpful to tinker with that, without your data at risk.


    GuruROM v6 is a ROM on an adapter PCB sized as a ROM. Its code is a more advanced cousin of gvpscsi.device - larger, yet by the same author. It's actually named omniscsi.device, and supports both the A2091 (DMAC-02) and the GVP Series II (DPRC) DMA chips on those products. Both boards are based on the 33C93A SCSI chip. The added feature in the driver (from the base of gvpscsi.device 4.15) is support for Sync SCSI on the 33C93A -04 thru -08 chips. It requires 14MHz clocking of the 33C93A, which is the standard setting on most HC8 cards shipped with v3.14 or later FastROM. On the A2091, enabling 14MHz clocking support requires a trace cut and jumper at J201, plus a jumper added to one of the Options pins. It will also run in 7MHz clock mode, but then doesn't support Sync SCSI, and the overall SCSI bus performance potential is a bit less (no worse than the C= 7.0 ROMs on a stock A2091).


    The performance we get out of the GVP HC8 DMA transfers to the A2630 32-bit RAM with your SCSI device will be the deciding factor regarding the value of getting the GuruROM. If it's below a certain performance level, there's no value in it on the HC8, as gvpscsi.device supports the copy-up buffering anyway. It will help the A2091's copy-up performance to the A2630 RAM and BigRAM. Finally, the Sync SCSI boost isn't always supported by the disk devices being used, or it gains nothing due to the speed of the target SCSI device's media. In both HC8 and A2091 cases, the added performance of Sync speeds over Async is of no real value when the buffering activity to the high-mapped memory reduces the theoretically possible speeds by about 1/2.

    That's an interesting comparison to my 25MHz 68030 A3000D with a Rev 11 Buster:


    Ramsey-04 / SCRAM
    BusSpeedTest 0.19 (mlelstv) Buffer: 262144 Bytes, Alignment: 32768
    ========================================================================
    memtype  addr        op      cycle     calib   bandwidth
    user     $48000000   readw   249.8 ns  normal   8.0 * 10^6 byte/s
    user     $48000000   readl   335.4 ns  normal  11.9 * 10^6 byte/s
    user     $48000000   readm   309.4 ns  normal  12.9 * 10^6 byte/s
    user     $48000000   writew  251.2 ns  normal   8.0 * 10^6 byte/s
    user     $48000000   writel  253.8 ns  normal  15.8 * 10^6 byte/s
    user     $48000000   writem  227.7 ns  normal  17.6 * 10^6 byte/s

    66MHz Clocked BigRAM Z3
    BusSpeedTest 0.19 (mlelstv) Buffer: 262144 Bytes, Alignment: 32768
    ========================================================================
    memtype  addr        op      cycle     calib   bandwidth
    user     $48000000   readw   307.7 ns  normal   6.5 * 10^6 byte/s
    user     $48000000   readl   454.3 ns  normal   8.8 * 10^6 byte/s
    user     $48000000   readm   430.6 ns  normal   9.3 * 10^6 byte/s
    user     $48000000   writew  410.6 ns  normal   4.9 * 10^6 byte/s
    user     $48000000   writel  414.3 ns  normal   9.7 * 10^6 byte/s
    user     $48000000   writem  376.8 ns  normal  10.6 * 10^6 byte/s


    Not as much variation with the Buster chips as I would have thought.


    Jens is correct to not bother with adding RAM on the HC8 (or the A2091) - it will be slower memory in your system configuration (with the A2630 AutoConfig 32-bit RAM and the BigRAM), and won't help with I/O performance.

    Using the GVP HC8, a FastROM driver of v3.12+ or 4.x on the card will know to buffer and CPU copy-up using the DMA-capable RAM on the A2630 - you will want some RAM on the A2630, be it 2MB or 4MB. Jens' guidance is certainly best here.

    Next - Just like the GuruROM setup, set your filesystem partition values for DMA Mask with that card/driver to 0x7FFFFFFE and make sure the MaxTransfer is set to 0xFFFFFFFF. The gvpscsi driver will then interpret any transfer to/from an address on the BigRAM address space (>16MB Address line) as needing to be buffered, and will handle it automatically. If the mask value is smaller (0x00FFFFFF - a typical HDToolBox default on the Advanced page), it will hide the source/destination address from the driver. The filesystem will then try to do the buffering in much smaller pieces, and it's highly inefficient. You would have to use the smaller DMA mask with the A2091 until a GuruROM could be used.
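    To illustrate the mechanism (a sketch of the concept only - not GVP's actual driver code): with the wide mask, the driver sees the real destination address and can make the buffering decision itself, along these lines:

        /* Sketch: per-transfer decision inside a DMA driver - can the
           Zorro II DMA engine (24-bit reach) hit the destination, or
           must the data bounce through a DMA-capable buffer first? */
        #include <stdio.h>

        #define DMA_REACH 0x00FFFFFFUL  /* top of the 24-bit address space */

        static int needs_copy_up(unsigned long addr, unsigned long len)
        {
            /* Any byte at or above the 16MB line is out of DMA reach. */
            return (addr + len - 1UL) > DMA_REACH;
        }

        int main(void)
        {
            printf("%d\n", needs_copy_up(0x00200000UL, 65536)); /* 24-bit RAM: 0 */
            printf("%d\n", needs_copy_up(0x40000000UL, 65536)); /* BigRAM: 1     */
            return 0;
        }

    With the narrow 0x00FFFFFF mask, the filesystem never hands the driver a BigRAM address in the first place; it chops the transfer up and buffers it itself, which is the inefficient path described above.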

    As far as reliability, I agree with Jens - both cards have been solid in today's aging equipment spaces. My lean toward the GVP HC8 is purely based on its lower part count and the smaller number of socketed parts. Pin contacts can oxidize from natural exposure/dissimilar metal contact. PGA sockets can weaken from heating and cooling cycles and the heavy pressure the inserted chips exert.

    Aside from my past employ at GVP possibly exposing some bias, I can tell you that the same engineer designed both cards. I know that they used different approaches to the custom chips' feature sets (DMAC vs DPRC), though.


    BTW - the GVP Series II cards (the HC8/HC+8) for the A2000's Zorro II slots use pairs of common 1MBx8 30-pin SIMMs. They do not use GVP custom SIMMs. The custom 64-pin GVP SIMM32s are used on the GVP accelerators (RAM32, Combo/G-Force).

    Back in the day, I remember GVP sold plenty of 33/40/50MHz 68030/68882 accelerator cards for those looking to 3D render w/the fastest FPU options. Later, 68040 cards came on the market (~1991-2 or so), and the rendering software houses recompiled their code, further optimized for its reduced inline FPU instruction subset. After C='s demise, the 68060's eventually showed up, and rendering performance went even higher.


    It's my opinion that, with the BigRAM on it, the presence of the FPU on the C= board will either be fine at any clocking, or it won't. The FPU's clock rate is asynchronous to the CPU/bus clock. It's the mere load of the additional circuitry that is the greatest factor.

    It's a curiosity of mine, but if you can, grab bustest off Aminet and run it against an address somewhere midway in the BigRAM card's address space (use SysInfo to ID the board's memory address range, probably starting around $0400.0000). Buster -07 is supposed to be slower than the -09 or -11 for Zorro III activities, but I never captured test results before I updated my host to Buster -11. It's never going to be as fast as motherboard Ramsey memory (which natively gets consumed first).

    I have some tinkering going on myself inside of a GVP A530 case, which is similar in concept, but designed 30+ years ago. It was made to handle a 1"-tall 3.5" spinning drive and a 68030 running at 40-50MHz, plus TTL board logic with PALs. Maybe what I remember from then can help.


    The design pulled in air from the front via the base vents (1.5cm bend up of metal material, vertical) and exhausted it at the rear of the unit with a 12v fan. Ribbon cables were left at the rear, where they might helpfully obstruct airflow from leaking out of the rear DB25 or DIN power plug ports (also in the base, 2cm tall at that side). The case had an external PSU for 12V power for the disk and fan. The system also had an option for a vertical slot card with logic (more heat from an 80286-16 and 512K DRAM) on the one side. This was the same ergonomic case designed for the Series II HD8. I remember the engineer checking things before it was green-lighted.

    Consider the cable routes and overall cable management. This was always the challenge in the 1990's and 2000's PC case / motherboard customizations.

    My current interest/sale thread for the GuruROM is on Amibay. I have a partial batch built (~40), and the parts to build more to match the current request list are on hand. I hope to have them ready this spring to address the current interest group.

    In the meantime, the slow, but still useful for filesystem operations, mask of 0x00FFFFFE on each partition will have to do. With GuruROM, that mask gets opened up to 0x7FFFFFFE (or any value which includes the related upper address lines of the memory mapped >16MB).


    A side note: a filesystem DMA Mask does not solve non-FFS activities where controller board DMA could occur. I.e., other filesystems using the SCSI interface must also be programmed for it - individually. Also, anything that doesn't use a filesystem must manually address it somehow internally - SCSI scanners, SCSI tape backup tools, etc. (Hence my stance that the DMA Mask is a hack - the driver should always handle it, and never shall a DMA Mask ever be needed again.)
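    As a concrete illustration of that last point (a sketch under OS 3.x, not anyone's shipping code): a program doing direct device I/O gets no help from any partition's mask, so it has to arrange a DMA-safe buffer itself, e.g.:

        /* Sketch: direct SCSI users must manage DMA-safe buffers themselves. */
        #include <exec/memory.h>
        #include <proto/exec.h>

        static APTR get_dma_buffer(ULONG size)
        {
            /* MEMF_24BITDMA (OS 3.x) requests memory below the 16MB line,
               where a Zorro II bus master can reach it directly. */
            return AllocMem(size, MEMF_24BITDMA | MEMF_PUBLIC);
        }
        /* caller releases it with FreeMem(buffer, size) when done */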