Atari ST core

  • Hehe, it still took some dozens of hours of debugging all the minor requirements of the flags and behavior of the WD1772

    Sure - the details are always what keeps you from that magical 100%-mark of compatibility. I didn't expect the ST to be any different from the Commodore computers that we've done so far.

    The file format that holds the MFM stuff ("flux") is SCP - did you know its average size is 20-30MB? That doesn't fit into the SDRAM, and I'm not sure if streaming from the SD card can give adequate latency for copy protections.

    I am not looking for the flux level, as that really produces too much data (Catweasel produces about 50k of data for every track of a DD-disk, so around 8MBytes per DD-disk). Instead, I am thinking of storing the actual MFM bits, which is about twice as much as the decoded bits. In addition to that, I'd divide the track into 400 (or maybe 512 for ease of addressing) zones for which you can store the bit rate of the block in question.


    This should be sufficient for most (if not all) copy protections, as you can create the "weak bits" for any sequence of "too many 0s in a row" with some additional code. You can of course go complicated and store 5 different reads of every track in order to record what the actual floppy made out of those weak bits, but that's depending on the floppy itself and will vary from real machine to real machine. So why bother? Just add a probability of inserting a 1, and increase that with every 0 in the stream (and of course reset that probability to 0 for every "1" in the source stream). Let a noise generator add some jitter to that inserted-1, and you should have sufficiently-different data on every read.
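
    A minimal sketch of the weak-bit idea described above, assuming the stream is a list of 0/1 MFM bits. The names `add_weak_bits`, `step` and `legal_zeros` are made up for illustration. Two details are assumptions that the post does not state: runs of up to three 0s are treated as legal MFM and left alone, and an inserted 1 also resets the probability. The timing jitter on the inserted 1 is not modelled here.

```python
import random

def add_weak_bits(mfm_bits, rng, step=0.1, legal_zeros=3):
    """Return a copy of mfm_bits in which over-long runs of 0s pick up random 1s.

    The chance of inserting a 1 grows by `step` for every 0 beyond
    `legal_zeros` and drops back to zero on every 1.
    """
    out = []
    zeros = 0
    for bit in mfm_bits:
        if bit:
            zeros = 0              # a 1 in the source resets the probability
            out.append(1)
            continue
        zeros += 1
        probability = max(0, zeros - legal_zeros) * step
        if rng.random() < probability:
            out.append(1)          # spurious flux transition "seen" by the drive
            zeros = 0              # assumption: inserted 1s reset the run as well
        else:
            out.append(0)
    return out

if __name__ == "__main__":
    weak_area = [1] + [0] * 40 + [1]
    for seed in range(3):
        print("".join(map(str, add_weak_bits(weak_area, random.Random(seed)))))
```

    Passing the random generator in keeps a single read reproducible for testing, while a free-running generator gives different data on every read.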


    Let's say we're at 500kBits/s for DD disks. This means you have 100kBits of raw MFM data per track and side (given 5 rotations per second = 300 RPM), resulting in a very manageable 12.5k/track or 2MBytes per image. Adding 512 bytes of timing information per track only adds 80k, which is absolutely within the limits of what both Chameleon and MiST can handle. Not sure how big your 1772 core is ATM, but it may be worth integrating that if you create a microcode engine, so the total size comes down.
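
    The arithmetic above can be checked with a few lines. The figure of 160 tracks (80 cylinders, two sides) is an assumption here; it matches the 2MBytes and 80k totals quoted in the post.

```python
BIT_RATE = 500_000            # raw MFM bits per second on a DD disk
ROTATIONS_PER_SECOND = 5      # 300 RPM
TRACKS = 160                  # assumed: 80 cylinders x 2 sides
TIMING_BYTES_PER_TRACK = 512  # one byte per timing zone

bits_per_track = BIT_RATE // ROTATIONS_PER_SECOND   # raw MFM bits in one rotation
bytes_per_track = bits_per_track // 8
image_bytes = bytes_per_track * TRACKS
timing_bytes = TIMING_BYTES_PER_TRACK * TRACKS

if __name__ == "__main__":
    print(f"{bits_per_track} bits = {bytes_per_track} bytes per track")
    print(f"{image_bytes} bytes of MFM data per image")
    print(f"{timing_bytes} bytes ({timing_bytes // 1024} KiB) of timing information")
```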

  • working on a bitstream rather than flux should indeed be good enough for most things - most of the time you will have simple things like "a sector exists twice" or so anyway.

  • So you basically suggest an intermediate file format, which can be created offline from existing STX dumps. It's probably doable, and could be a good approach. And boards with more powerful MCUs can create this intermediate file on the fly.

  • and could be a good approach.

    Yes, you can trust Tobias and me about this: I did the hardware of the Catweasel (first model back in 1995), lots of research about old drives and formats, and Tobias implemented the Imagetool back then. We're the two people within iComp that probably know so much about floppies, formats, tolerances and quirks of the mechanics that very few people on the planet can challenge us. I dare to say that those who can keep up with our experience are the ones who were involved in further drivers (shoutout to Achim and Michael, if you're reading here). I understand that this may come across somewhat arrogant. Please consider the possibility of this being plain confidence.


    And boards with more powerful MCUs can create this intermediate file on the fly.

    Why "on the fly"? You can do this while loading, i.e. while the user is waiting. However, it may be a real pain to go from flux level to a bit stream, and determine the average bit rate while doing so - all on an MCU. So yes, this should probably be written in a high-level language and run on a modern desktop. The amount of flux-level images that actually need to be one is likely to be finite, so you can just convert an archive once, and enjoy the vast amount of data reduction.

  • Quote

    The amount of flux-level images that actually need to be one is likely to be finite

    Probably close to zero even :)

  • "need to be" in terms of "can't work as a sector-level image".


    Let's wait for robinsonb5 to comment on available CPU/MCU time of his menu engine, and maybe for gyurco to come up with a first draft for a format. If we all put some brains into this, we may be able to make something that can also work on the Amiga core. My goal would be to have a fully open format that CAPS images can also be converted into. Those STX or SCP images seem to be close to what CAPS is recording, and that is "way too much" - classic over-engineering.

  • Patrick: Just curious, does the mouse work in your setup with the ST core? You seem to use a TC64v1 standalone. I do too, and the mouse doesn't respond in my setup. It works with other cores (C64 and Minimig), but not with the ST.

    I did not have any problem with the mouse. I use a very old Microsoft mouse (with a ball, but it works), which I also use when running the Amiga core. I can try connecting another one to test it.

  • Changing the memory size should work, but the ROM only checks for the RAM size on a cold-boot - so a soft-reset isn't enough after changing the memory size. You have to trigger a hard-reset instead.


    To trigger a hard reset, highlight the Reset menu option, and instead of tapping Enter, hold it for a second or two, until the core resets.

    The extra memory should then be recognised.

    thanks will try that, did not think of that

  • Alastair: That's good, and core is already in very good shape. Have been able to play a bunch of ST-games already :)

    There are a few programs that can convert MSA to ST files. I converted the whole Automation collection with MSA Converter v2.1 (by Zorg). There is probably no way to convert from STX to ST, because STX contains copy-protection information etc.

    Could you try this version, please? (If anyone else wants to try it, please note it's for V1 hardware. A v2 version will follow if it turns out to solve the problem.)

  • Let's wait for robinsonb5 to comment on available CPU/MCU time of his menu engine,

    The 832 CPU and control module is currently clocked at 50MHz, and manages a shade under 7 DMIPS. It's deliberately clocked at well under its theoretical maximum speed to avoid adding timing pressure to the guest cores.

    (The goal with the CPU was always to be small, both in terms of logic and code density, rather than to be fast. Because of that, it's not well suited to decompression workloads. Shifting, in particular, is slow - the usual optimisation of replacing multiplies with shifts applies in reverse here!)


    Currently the host CPU has "custody" of the SD card, so it's responsible for loading the ROM at bootup and for handing data on-request to the disk emulation. It's also responsible for passing joystick events to the core. With some cores it handles keyboard events, too - but MiSTery has enough provision for PS/2 devices that I've given it direct access to the physical keyboard and mouse.


    My goal with most of the MiST cores I've ported to TC64 has been to do it as non-invasively as possible - to avoid modifying the upstream core any more than strictly necessary, so that it's easier to keep in sync with upstream changes, and so it's possible to merge the codebases in the longer term. (Gyurco and I have already collaborated closely with both Minimig and TurboGrafx16, and both of those now have a shared repo for MiST and Chameleon64.)


    Making the DeMiSTify wrapper as transparent as possible to the guest core means keeping the firmware as small and streamlined as it can be, since a lot of the cores use up most of the FPGA's block RAM.


    For most of the cores so far the firmware, working RAM and stack have fit within just 12k. (TurboGrafx16 needed 24k because of the CD image support and bin/cue parsing.) The port of MiSTery currently needs a shade under 14k - but that's likely to be nearer 20 once hard drive images and the C64 keyboard are supported.


    There's currently enough free block RAM in this core for the firmware to grow to 30k so there's plenty of headroom at the moment. (I'm reluctant to extend into SDRAM since it means more invasive changes and slower execution - but it's nonetheless possible: Minimig already does it that way, and I believe it will be necessary for the Archimedes core.)


    As for a new floppy image format, provided there's a straightforward way of mapping from track and sector to file offset I don't see it presenting too many difficulties for the firmware. If large decompression buffers or index tables are needed then it'd be a different story.

  • Alastair: Unfortunately didn't solve problem, still doesn't work with Logitech PS/2 optical mouse. Then I tried with older HP PS/2 optical scroll mouse, and that works at least with newest core. The HP mouse works with Minimig too, and most probably with the C64 core. I'm at a bit of a loss now as to what is different about the Logitech mouse and why it isn't responding with the ST cores.

    Patrick: Thanks for confirming :)

  • As for a new floppy image format, provided there's a straightforward way of mapping from track and sector to file offset I don't see it presenting too many difficulties for the firmware. If large decompression buffers or index tables are needed then it'd be a different story.

    That's about what I thought, but viewed from a different angle: You're thinking track/sector, but I'm thinking track/rotational position. This will require a timer that generates the bit rate of the stream that goes to the 1772 emulation. So far, I have given the following (arbitrary) parameters to think of:


    • rotational speed fixed at 300RPM = 200ms per rotation
    • +/-10% deviation from the standard bit rate. We may be able to reduce this: the long-tracker hardware that I built back when I developed the copy protection of Apidya (an Amiga title) used a 30MHz crystal for clocking the chip set. This is just shy of a 6% increase, and we chose it because it was the limit of what we could "safely" do with all test machines available.
    • division of a track into 0.5ms snippets (roughly)

    The underlying idea would be to make a 32-byte snippet represent a fixed amount of time (near 0.5ms), but a variable amount of data. Let's make that:


    1 byte bit rate information

    1 byte # of valid bits

    30 bytes of stream data


    If we make 225 bits per snippet the standard bit rate, then 445 snippets would contain max. 100125 bits of information. With a timer running at 0.449ms pace, this would build up a track-to-track deviation way smaller than any physical drive will do, even with a fixed amount of bits per snippet. If the image is built in a way that every other snippet has 224/225 valid bits, you have almost no deviation.
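
    A sketch of how such a 32-byte snippet could be packed and unpacked. The layout (rate byte, valid-bit count, 30 data bytes) is taken from the post; the MSB-first bit order and the function names are assumptions for illustration only.

```python
SNIPPET_BYTES = 32
DATA_BYTES = 30
MAX_BITS = DATA_BYTES * 8      # 240 bits fit into one snippet
STANDARD_BITS = 225            # bits per snippet at the standard bit rate
SNIPPETS_PER_TRACK = 445

def pack_snippet(rate, bits):
    """Build one snippet: rate byte, number of valid bits, 30 bytes of stream data."""
    if not 0 <= rate <= 255:
        raise ValueError("rate must fit into one byte")
    if len(bits) > MAX_BITS:
        raise ValueError("too many bits for one snippet")
    data = bytearray(DATA_BYTES)
    for i, bit in enumerate(bits):
        if bit:
            data[i // 8] |= 0x80 >> (i % 8)   # assumed bit order: MSB first
    return bytes([rate, len(bits)]) + bytes(data)

def unpack_snippet(raw):
    """Return (rate, bits) from a 32-byte snippet."""
    if len(raw) != SNIPPET_BYTES:
        raise ValueError("a snippet is exactly 32 bytes")
    rate, count = raw[0], raw[1]
    bits = [(raw[2 + i // 8] >> (7 - i % 8)) & 1 for i in range(count)]
    return rate, bits

if __name__ == "__main__":
    snippet = pack_snippet(198, [1, 0, 0, 1] * 56 + [1])   # 225 bits
    print(len(snippet), unpack_snippet(snippet)[0])
    print("max bits per track:", STANDARD_BITS * SNIPPETS_PER_TRACK)
```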


    Timing: Let's assume we run the engine at 50MHz (again, arbitrary, but the clock domain appears to be there already), then the snippet timer would have to count to 22472 (or from 0 to 22471). This snippet timer would be started/stopped with the motor bit. Index would be set at snippet 0, and reset with a bit rate information byte of 255, which has a bit rate meaning of "leave unchanged from previous snippet". In addition to that, index should always be reset for any snippet over 17 (index is specified to be between 1ms and 8ms) - this would just be "good measure" to deal with possibly-bad image data. I have my doubts that the length of the index signal will be used for anything useful, as it's highly dependent on the drive, not the disk.
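
    The timer value and the index rule can be sketched as follows. The constants come from the post. The function name is hypothetical, and letting "set at snippet 0" win over a rate byte of 255 in that same snippet is an assumption, as the post does not cover that case.

```python
CLOCK_HZ = 50_000_000
ROTATION_MS = 200              # 300 RPM
SNIPPETS_PER_TRACK = 445
KEEP_RATE = 255                # "leave bit rate unchanged", also clears index
INDEX_MAX_SNIPPET = 17         # index is forced low for any snippet above this

clocks_per_rotation = CLOCK_HZ * ROTATION_MS // 1000
SNIPPET_TICKS = round(clocks_per_rotation / SNIPPETS_PER_TRACK)
SNIPPET_MS = SNIPPET_TICKS * 1000 / CLOCK_HZ

def index_levels(rate_bytes):
    """Return the index signal level for each snippet of one rotation."""
    index = False
    levels = []
    for number, rate in enumerate(rate_bytes):
        if number == 0:
            index = True                       # index is set at snippet 0
        elif rate == KEEP_RATE or number > INDEX_MAX_SNIPPET:
            index = False                      # cleared by image data or by the safety limit
        levels.append(index)
    return levels

if __name__ == "__main__":
    print("timer counts to", SNIPPET_TICKS, "=", SNIPPET_MS, "ms per snippet")
    drift = SNIPPET_TICKS * SNIPPETS_PER_TRACK - clocks_per_rotation
    print("rounding error per rotation:", drift, "clocks")
```

    With these numbers the rounding of the timer value costs 40 clocks (0.8µs) per 200ms rotation.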


    Since the bit rate should be derived from the same source clock, but we don't actually need to create a physical bit clock (just a rough average bit rate), I suggest inserting a delay after two bits of stream data. For an odd number of bits in a snippet, the last bit is carried over to the next snippet, where the new delay value will be used. This will make the "magic timing value" at the beginning of each snippet a number of 50MHz clocks to delay after every other data bit of the stream, and the value 198 (assuming two clocks are used for shifting the two data bits) is standard for 500kBits/s bit rate. This is making good use of the 8 bits, and leaves lots of space for speed deviation, but also good-enough granularity.
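
    How the value 198 falls out, and what range one byte covers, can be sketched like this. The function names are made up; the two clocks for shifting and the 50MHz clock are the assumptions stated in the post.

```python
CLOCK_HZ = 50_000_000
SHIFT_CLOCKS = 2        # clocks spent shifting out the two data bits
KEEP_RATE = 255         # reserved: "leave unchanged from previous snippet"

def delay_for_bit_rate(bits_per_second):
    """Delay value (in 50MHz clocks) inserted after every other data bit."""
    clocks_per_two_bits = round(2 * CLOCK_HZ / bits_per_second)
    delay = clocks_per_two_bits - SHIFT_CLOCKS
    if not 0 <= delay < KEEP_RATE:
        raise ValueError("bit rate not representable in one byte")
    return delay

def bit_rate_for_delay(delay):
    """Average bit rate produced by a given delay value."""
    return 2 * CLOCK_HZ / (delay + SHIFT_CLOCKS)

if __name__ == "__main__":
    for rate in (450_000, 500_000, 550_000):
        print(rate, "bits/s ->", delay_for_bit_rate(rate))
    step = bit_rate_for_delay(197) - bit_rate_for_delay(198)
    print("granularity near the standard rate:", round(step), "bits/s")
```

    One step of the delay value near 198 changes the bit rate by roughly 0.5%, and the +/-10% range maps to values of about 180 to 220.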


    This should be simple enough to implement in VHDL, but flexible enough to cover all cases - definitely the read-only cases. Writing is a whole different story, which I'd postpone for now, as that feels like some CPU power will be required.


    Size requirement of this suggestion:


    445 snippets times 32 bytes times 160 tracks = 2225 kBytes. This is small enough to be generous with additional gaps that make track/timing position calculation easier (like a 512-snippet distance from track to track, resulting in a 2560 kBytes space requirement).
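
    The sizes above, and the address calculation that the 512-snippet spacing makes easy, as a short sketch. `snippet_offset` is a hypothetical helper; the shift form only illustrates why a power-of-two stride is convenient in hardware.

```python
SNIPPET_BYTES = 32
SNIPPETS_USED = 445      # snippets that hold data in each track
TRACK_STRIDE = 512       # snippet slots reserved per track
TRACKS = 160

packed_bytes = SNIPPETS_USED * SNIPPET_BYTES * TRACKS
padded_bytes = TRACK_STRIDE * SNIPPET_BYTES * TRACKS

def snippet_offset(track, snippet):
    """Byte offset of a snippet in the padded layout."""
    if not (0 <= track < TRACKS and 0 <= snippet < SNIPPETS_USED):
        raise ValueError("track or snippet out of range")
    # 512 * 32 = 2**14 and 32 = 2**5, so this is pure bit concatenation
    return (track << 14) | (snippet << 5)

if __name__ == "__main__":
    print(packed_bytes // 1024, "KiB packed,", padded_bytes // 1024, "KiB padded")
    print("track 1, snippet 0 starts at", snippet_offset(1, 0))
```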


    Once again, this is just a suggestion; a basis for discussion.

  • still doesn't work with Logitech PS/2 optical mouse. Then I tried with older HP PS/2 optical scroll mouse, and that works at least with newest core.

    PS2 mice are a bitch to initialize. Here are a few hints:


    - some mice require communication-inhibit of more than 75µs before sending a byte

    - wait for BAT at startup with a timeout of 3.5 seconds. Yes, some mice require that long, but many KVM switches also do. Some don't send BAT without a reset, some do, so timeout is important.

    - do implement a reset routine! Wait for BAT again after resetting. Go back to attempting a reset if BAT still doesn't come.

    - do use the knocking sequence. Most mice are tested against Windows only, and that always sends the knocking sequence.

    - only activate stream mode after performing knocking sequence. Activating stream mode too early confuses some mice.

    - do not sample data too close to a clock, as long and cheap cables may have severe crosstalk. I use a 3µs wait before sampling.
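
    The byte-level part of these hints can be sketched as below. This is not the firmware's actual code. The `port` object with `send()` and `recv(timeout)` is hypothetical, `FakeMouse` merely stands in for hardware, and reading "knocking sequence" as the IntelliMouse sample-rate sequence (200, 100, 80) is an assumption. The 75µs inhibit and the 3µs sampling delay belong to the bit level and are not modelled.

```python
RESET, SET_SAMPLE_RATE, ENABLE_REPORTING = 0xFF, 0xF3, 0xF4
ACK, BAT_OK = 0xFA, 0xAA
BAT_TIMEOUT_S = 3.5
KNOCK_RATES = (200, 100, 80)   # assumed meaning of "knocking sequence"

def wait_for_bat(port):
    """Wait for BAT completion (0xAA), skipping other bytes such as ACK."""
    while True:
        byte = port.recv(BAT_TIMEOUT_S)   # per-byte timeout; real code would use one deadline
        if byte is None:
            return False
        if byte == BAT_OK:
            port.recv(0.1)                # swallow the device ID that follows BAT
            return True

def command(port, *payload):
    """Send each byte and require an ACK for it."""
    for byte in payload:
        port.send(byte)
        if port.recv(0.1) != ACK:
            return False
    return True

def init_mouse(port, max_resets=3):
    wait_for_bat(port)                    # power-on BAT may or may not arrive
    for _ in range(max_resets):
        port.send(RESET)
        if wait_for_bat(port):
            break                         # mouse answered, continue with the knock
    else:
        return False                      # no BAT after repeated resets
    for rate in KNOCK_RATES:
        if not command(port, SET_SAMPLE_RATE, rate):
            return False
    return command(port, ENABLE_REPORTING)   # only after the knock

class FakeMouse:
    """Stand-in for real hardware that ignores the first few resets."""
    def __init__(self, ignored_resets=0):
        self.queue, self.sent, self.ignored = [], [], ignored_resets
    def send(self, byte):
        self.sent.append(byte)
        if byte != RESET:
            self.queue.append(ACK)
        elif self.ignored > 0:
            self.ignored -= 1
        else:
            self.queue += [ACK, BAT_OK, 0x00]
    def recv(self, timeout):
        return self.queue.pop(0) if self.queue else None

if __name__ == "__main__":
    mouse = FakeMouse(ignored_resets=1)
    print(init_mouse(mouse), [hex(b) for b in mouse.sent])
```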

  • Quote

    Yes, you can trust Tobias and me about this: I did the hardware of the Catweasel (first model back in 1995), lots of research about old drives and formats, and Tobias implemented the Imagetool back then. We're the two people within iComp that probably know so much about floppies, formats, tolerances and quirks of the mechanics that very few people on the planet can challenge us. I dare to say that those who can keep up with our experience are the ones who were involved in further drivers (shoutout to Achim and Michael, if you're reading here). I understand that this may come across somewhat arrogant. Please consider the possibility of this being plain confidence.

    I must trust you, since I was never really involved in anything below the level of what the CPU can see from the floppy. There are some formats which abstract the low-level details to such a level that it's enough for all copy protections (namely EDSK for the Amstrad CPC), but the parser on the FPGA was a PITA to write.



    Quote

    That's about what I thought, but viewed from a different angle: You're thinking track/sector, but I'm thinking track/rotational position. This will require a timer that generates the bit rate of the stream that goes to the 1772 emulation. So far, I have given the following (arbitrary) parameters to think of:

    Maybe a track buffer on the FPGA side can be used, and one track can be stored fully in BRAM. This will make it trivial to implement it in the control CPU. 16 kB BRAM for a track buffer is available on these FPGAs. Maybe the problem is that it would be required for both sides.

  • and one track can be stored fully in BRAM.

    IMO, that would be a waste of blockram, as we're talking really low data rates here. Blockram is good for high-speed stuff (higher speed than SD-Ram), but this is a few ten kilobytes per second. I don't see a need for using such a precious resource for this task.

  • Quote

    IMO, that would be a waste of blockram, as we're talking really low data rates here. Blockram is good for high-speed stuff (higher speed than SD-Ram), but this is a few ten kilobytes per second. I don't see a need for using such a precious resource for this task.

    True, it's convenient to have a double sector buffer (in terms of SD-sectors), it just fits into 1 BRAM block, then a pipelining method could be used (process one buffer, fetch into the other). My experience (on MiST) is that with hard disk use the firmware cannot really stream continuously without some waits between sectors (using single-block-read MMC commands), but the slow floppy data rate should be OK.

  • True, it's convenient to have a double sector buffer (in terms of SD-sectors), it just fits into 1 BRAM block, then a pipelining method could be used (process one buffer, fetch into the other).

    I'm thinking even smaller: An SD-Ram burst is 4 words or 8 bytes. If this is the size that's requested from the SD-Ram controller, you only need to keep a buffer of 16 bytes, which easily fits in logic. You need such a buffer anyway, because you're going cross-clock-domain (SD-Ram will be faster than 50MHz).


    When changing tracks/sides, you can easily take a delay from the SD-Ram controller, because a snippet is so short (half a millisecond), while track-to-track is 3ms, and head settle time may even exceed 5ms. By the time any loader expects usable data after a track/head change, you have easily fetched the new data. I don't see blockram in this approach at all.


    One simplification would be to not have a requirement for carrying a single bit into handling the next snippet: In case the number of valid bits in a snippet is odd, you can just insert half the delay at the end, so every snippet is really handled in itself. Or even simpler: If there is no more valid data, don't wait a fixed amount, but wait for the 15-bit snippet timer to timeout, then go to the next snippet. You have *lots* of time at such a slow pace. And with the long track-to-track and head settle time, there is no need to ever interrupt handling of a snippet that's already started.


    Edit: Hmm... we seem to not exactly talk about the same thing: I am assuming that the whole floppy/MFM image has already been loaded to SD-Ram. You seem to want to load tracks/sectors on the fly from SD card. I don't see that working. We do need an SD-Ram buffer for the whole image. So please use the more precise terms "SD-Ram" and "SD-card" instead of just SD, so we know exactly what you're talking about :-)

  • Edit: Hmm... we seem to not exactly talk about the same thing: I am assuming that the whole floppy/MFM image has already been loaded to SD-Ram. You seem to want to load tracks/sectors on the fly from SD card. I don't see that working. We do need an SD-Ram buffer for the whole image. So please use the more precise terms "SD-Ram" and "SD-card" instead of just SD, so we know exactly what you're talking about

    I don't see why loading data on the fly from SD card shouldn't work (with a BRAM-based buffer / FIFO in between to cover the gaps while we read the next block from SD card). It's what we're currently doing, and we're not talking about a crazy increase in bandwidth here. (The same approach works for streaming CD audio in the TGFX16 core, for instance - though admittedly with a bigger buffer and much looser random access requirements).


    Given that the Amiga can have up to four drives, if each image is more than 2 megabytes in size we're certainly not going to want four entire images in SDRAM. A track at a time might be more practical - there should be plenty of time during a head step to fetch a new track's worth of data - or at least enough for the read process to begin.


    Thanks for the PS/2 tips - I am currently sending the knock before enabling streaming mode, but I'm not yet repeating the reset command if BAT doesn't arrive - that may well be the key. I'll have another go this evening after work.

  • I think track is the minimum if you want to reproduce stepping offsets etc properly without overcomplicating it... and indeed, the stepping time should be enough to read it from sd card.