Interrupt free V-USB

Starting with V2.0, Micronucleus is going to use an interrupt free modification of the software USB implementation V-USB. This provides significant benefits for the bootloader, as it is no longer necessary to patch the interrupt vector of the user program. A surprising side effect was a speed-up of the V-USB data transmission, which may also be helpful in other applications. Here, I try to give a rough overview of the meandering work that led to this achievement.

Previous versions of Micronucleus (and also the Trinket bootloader) use an ingenious mechanism devised by Louis of embedded creations to patch the interrupt vector transparently to the user program. Although this approach works very well, it still adds a lot of complexity to the bootloader, adds a couple of cycles of interrupt delay, and carries the risk of breaking the user program in a few rare cases. Removing this burden allows for a drastic reduction in code size and improved robustness.

V-USB uses a pin change interrupt on D+ to detect incoming USB transmissions. The interrupt routine receives, decodes and acknowledges USB data packets and stores them in the RX buffer for parsing by the main program. If the host requests outgoing data, the interrupt routine responds with whatever data it finds in the TX buffer. The packet parsing and construction of outgoing packets is done in the main program by periodically calling usbPoll().

The idea of a polled or interrupt free V-USB was brought up by blargg in a posting on the V-USB forum. He also devised a pretty clever way to patch this modification into the existing V-USB code. His key insight was that you can still use the interrupt system of the ATtiny when interrupts are disabled, by manually polling the interrupt flag register. The interrupt flag is set when the interrupt condition is met and stays set until it is manually cleared by the user.
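The flag-polling idea can be modelled on a host machine. The sketch below is not V-USB code: `flag_reg_t`, `hw_event`, `write_one_to_clear` and `poll_and_clear` are hypothetical names, and the register is a plain variable that mimics the AVR convention of clearing a pending flag by writing a one to its bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side model of an interrupt flag register. */
typedef struct {
    uint8_t flags;
} flag_reg_t;

/* Hardware side: the interrupt condition latches the flag bit,
 * regardless of whether interrupts are globally enabled. */
void hw_event(flag_reg_t *reg, uint8_t bit)
{
    reg->flags |= (uint8_t)(1u << bit);
}

/* Models a register write on AVR: every bit written as 1 is cleared. */
void write_one_to_clear(flag_reg_t *reg, uint8_t value)
{
    reg->flags &= (uint8_t)~value;
}

/* Software side: returns true if the event was pending, and clears it.
 * This stands in for "check the flag, then run the handler". */
bool poll_and_clear(flag_reg_t *reg, uint8_t bit)
{
    uint8_t mask = (uint8_t)(1u << bit);
    if (!(reg->flags & mask))
        return false;
    write_one_to_clear(reg, mask);
    return true;
}
```

The point of the model is that the flag stays latched between polls, so an event that arrives while the CPU is busy elsewhere is not lost, only noticed late.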

The following code snippet actively waits for the interrupt flag and then calls the normal interrupt handler to process incoming data. The only modification to V-USB is to disable interrupts (CLI) and to replace the RETI instruction at the end of the interrupt routine in asmcommon.inc with a RET.

     do {
        if (USB_INTR_PENDING & (1<<USB_INTR_PENDING_BIT)) {
          USB_INTR_VECTOR();  // clears INT_PENDING (See se0: in asmcommon.inc)
          break;
        }
     } while(1);

Well, that looks pretty easy, and to our amazement it even worked on some computers, sometimes. The problem is that usbPoll() has to be called at some point, because otherwise the incoming USB transmissions are never parsed and nothing can be done with the data. A single call to usbPoll() takes about 45-90 µs on a 16.5 MHz ATtiny. Since we do not poll for interrupts during the function call, no incoming data can be received. A first approach to solve this was to define a timeout period and only call usbPoll() when no incoming data was detected for a certain amount of time. This improved the functionality to a point where it was possible to upload and run programs with Micronucleus. But again, it completely failed on some computers and was pretty unreliable in general. It became clear that a more sophisticated algorithm was necessary to decide when the CPU may be blocked and when blocking must be avoided.

I was already about to give up on the interrupt-free approach. But then I noticed that the new 1.1.18beta release of the Saleae logic analyzer software came with a USB1.1 protocol interpreter. This finally provided a tool to understand what was going on.

detail_int

To understand the general issue, I first looked at the USB traffic of the standard interrupt-based V-USB. Apart from the USB traffic, I also logged two GPIO pins that I configured to show status information. Channel 2 is high when the interrupt is active, Channel 3 is high when the controller actively sends data. The plot above shows the transmission of a SETUP packet that requests data from the client (the AVR controller with V-USB, called a function in USB lingo). As you can see, the interrupt is correctly called at the beginning of the transmission and the transmission is acknowledged with an ACK packet by the function. Since the host wants to receive data from the function, it sends an IN packet only a few microseconds after the SETUP packet. V-USB only steps out of the interrupt for a very brief time, not enough to process the incoming data and prepare outgoing data. However, the USB 1.1 specification requires the function to respond to the request within 7.5 bit-times, or 5 µs at the low-speed rate of 1.5 Mbit/s. Otherwise the host will time out. V-USB handles this situation by sending out a NACK, signalling to the host that it is not ready yet.
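To put the response deadline into CPU terms, here is a small host-side sketch that converts low-speed bit-times into microseconds and clock cycles. The function names are made up for illustration; the only inputs are the 1.5 Mbit/s low-speed signalling rate and the CPU clock.

```c
/* Low-speed USB signalling rate: 1.5 Mbit/s. */
#define USB_LS_BITRATE_HZ 1500000.0

/* Duration of `bits` low-speed bit-times in microseconds. */
double bit_times_to_us(double bits)
{
    return bits * 1.0e6 / USB_LS_BITRATE_HZ;
}

/* Number of CPU cycles that elapse during `bits` bit-times. */
double bit_times_to_cycles(double bits, double f_cpu_hz)
{
    return bits * f_cpu_hz / USB_LS_BITRATE_HZ;
}
```

For example, `bit_times_to_us(7.5)` gives 5 µs, and `bit_times_to_cycles(7.5, 16.5e6)` gives 82.5 cycles, so a 16.5 MHz ATtiny has fewer than a hundred instructions to decide how to answer.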

all_int

The problem is, of course, that sending the NACK consumes all the CPU time, so the situation does not really improve. As you can see in the figure above, this can go on for quite a while. The host only stops sending IN packets right before the end of the frame, to avoid colliding with the 1 ms keep-alive pulse. This finally leaves enough time for V-USB to process the data and prepare the TX buffer. This is extremely wasteful, since less than 10% of the USB bus traffic is actively transferring data and almost 90% of the CPU time is spent on “appeasing” the host. In the worst case, only one valid transmission can be processed per frame (1 ms). Since a low-speed USB data packet carries a maximum of 8 bytes, this limits the theoretical throughput to 8000 bytes/s, and to much less in practice due to additional protocol overhead. Although extremely ugly, it does work for V-USB.
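The throughput bound is simple arithmetic, sketched here with hypothetical names. It only counts payload bytes and ignores all protocol overhead, so it is an upper limit rather than a prediction.

```c
#include <stdint.h>

#define USB_LS_MAX_PAYLOAD    8u     /* bytes per low-speed data packet */
#define USB_FRAMES_PER_SECOND 1000u  /* one frame per millisecond */

/* Upper bound on payload throughput if `transfers_per_frame` data
 * packets are completed in every 1 ms frame. */
uint32_t payload_bytes_per_second(uint32_t transfers_per_frame)
{
    return transfers_per_frame * USB_LS_MAX_PAYLOAD * USB_FRAMES_PER_SECOND;
}
```

With one transfer per frame this yields the 8000 bytes/s quoted above; every additional transfer that fits into a frame raises the bound by the same amount.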

Things get much more complicated with the interrupt-less version. How do we know that it is safe to call usbPoll() in this situation? As mentioned above, a partial solution is to process data only when the bus has been idle for 20-30 µs.

resync

The figure above shows bus traffic from an interrupt free version, using idle bus detection. Channel 2 is high when the USB transceiver routine is active (the former interrupt handler). Channel 3 is high when usbPoll() is active. The log starts at the end of a successfully received packet. After the bus has been idle for around 20 µs, usbPoll() is called. Unfortunately, the gap only lasted for 42 µs and usbPoll() was not able to finish before the next packet arrived. This leads to the transceiver trying to sync to the middle of the packet. Luckily it exits, since it cannot detect the sync pattern that is supposed to be at the beginning of the packet. The host times out and sends the packet a second time. The transceiver successfully manages to resynchronize to this next packet.

resynch_failed

So far so good, but things do not always go well. The log above shows a similar situation. However, in this case the transceiver routine is not able to resynchronise to the host transmission. The host sends the packet a total of three times and quits with an error after the third time without an ACK. Bummer!

It does, however, point in the right direction: the critical point is to get the re-synchronisation right. If it is possible to achieve this, then it is possible to miss one or two data packets without any harm.

It is relatively easy to detect if a data packet was missed, because in that case the interrupt flag is set after usbPoll() was executed. But how to detect if we reached the end of a packet? My first attempt was to wait for the next SE0 condition, which signals the end of a packet. Unfortunately that fails, because some transmissions come in multiple packets (e.g. OUT or SETUP are followed by DATA). In that case the transceiver would wrongfully acknowledge the data part without correctly processing it.

A much better solution was to wait for an idle bus. Again, the timing here is very critical. It turned out that a good compromise was to wait for 10-10.5 µs, since this is the time it would have taken the function to send an ACK response, had it interpreted the last packet correctly.
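As a rough sketch of how such a wait translates into a loop count, assume a polling loop that costs a fixed number of CPU cycles per iteration (5 cycles is used in the example values). The helper names are hypothetical, and the caller is assumed to pick values whose result fits into 8 bits.

```c
#include <stdint.h>

/* Number of iterations needed to cover `wait_us` microseconds,
 * rounded to the nearest integer. Assumes the result fits in 8 bits. */
uint8_t idle_loop_count(double wait_us, double f_cpu_hz,
                        unsigned cycles_per_iter)
{
    return (uint8_t)(wait_us * (f_cpu_hz / 1.0e6) / cycles_per_iter + 0.5);
}

/* Actual idle time, in microseconds, that `count` iterations cover. */
double idle_wait_us(uint8_t count, double f_cpu_hz,
                    unsigned cycles_per_iter)
{
    return count * (double)cycles_per_iter * 1.0e6 / f_cpu_hz;
}
```

At 16.5 MHz and 5 cycles per iteration, a 10 µs wait comes out as 33 iterations, or 165 cycles. Because the count is rounded, the real wait at other clock rates can deviate slightly from the requested value, which is worth checking with `idle_wait_us` before relying on it.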

forced_resync

Above you can see the bus traffic for an interrupt-free V-USB according to the idea above. Instead of waiting for free time on the bus, it simply calls usbPoll() after every successfully received packet. If a collision is detected, it waits for an idle bus to resynchronise. The bus can be stalled by up to 90 µs using this trick, since the minimum packet length is 45 µs and up to two packets may time out.

A very interesting side effect of this hack is that the transmission is much faster than with the interrupt-based V-USB, because each data packet only needs to be resent 1-2 times instead of >10 times when the NACKing is used as above.

This is the inner loop of an interrupt-free V-USB implementation:

do {
       // Wait for data packet and call transceiver
        do {
          if (USB_INTR_PENDING & (1<<USB_INTR_PENDING_BIT)) {
            USB_INTR_VECTOR();  // clears INT_PENDING (See se0: in asmcommon.inc)
            break;
          }
        } while(1);

       // Parse data packet and construct response
       usbPoll();

       // Check if a data packet was missed. If yes, wait for idle bus.
       if (USB_INTR_PENDING & (1<<USB_INTR_PENDING_BIT))
       {
          uint8_t ctr;

          // Count down while D+ stays low; any high sample reloads the counter.
          // loop takes 5 cycles
          asm volatile(
          "         ldi  %0,%1 \n\t"
          "loop%=:  sbic %2,%3  \n\t"
          "         ldi  %0,%1  \n\t"
          "         subi %0,1   \n\t"
          "         brne loop%= \n\t"
          : "=&d" (ctr)
          :  "M" ((uint8_t)(10.0f*(F_CPU/1.0e6f)/5.0f+0.5)), "I" (_SFR_IO_ADDR(USBIN)), "M" (USB_CFG_DPLUS_BIT)
          );
          USB_INTR_PENDING = 1<<USB_INTR_PENDING_BIT;  // clear the pending flag
       }
   } while(1);
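The inline assembly is dense, so here is a host-side C model of what the wait loop does. It is a sketch for reasoning about the logic, not a replacement for the assembly: `idle_loop_model` is a made-up name, and the array stands in for one D+ sample per loop iteration (0 for low, which is the low-speed idle level, non-zero for high).

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the number of iterations executed until the counter reaches
 * zero, or -1 if the samples run out first. */
long idle_loop_model(const uint8_t *dplus, size_t n, uint8_t reload)
{
    uint8_t ctr = reload;          /* ldi  ctr, reload                    */
    for (size_t i = 0; i < n; i++) {
        if (dplus[i])              /* sbic: reload is skipped if D+ is low */
            ctr = reload;          /* ldi  ctr, reload                    */
        ctr--;                     /* subi ctr, 1                         */
        if (ctr == 0)              /* brne loop: fall through at zero     */
            return (long)(i + 1);
    }
    return -1;
}
```

Any activity on D+ restarts the countdown, so the loop only exits after the line has been quiet for a full window. With a reload value of 3, the samples `{0, 1, 0, 0, 0}` end the loop after the fourth iteration, because the high sample in the second slot resets the counter.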

Are we done yet? There are some minor other things that popped up:

  • Handling of a bus reset can no longer be done efficiently in usbPoll(), since it is only called upon receiving a packet. Instead, the detection of a reset had to be moved into the main polling loop.
  • SETUP and OUT are immediately followed by a DATA packet. V-USB has special code to handle this situation, however this detection sometimes failed when the gap between the packets was too long. This is not a problem with interrupts, because they can be “stacked”. In the interrupt-free case additional code had to be inserted before “handleSetupOrOut” in asmcommon.inc.
  • Since packets are received and parsed in order, it is not necessary anymore to have a double-buffered RX-buffer. Removing it saves some memory.

You can find the full implementation in the testing branch of Micronucleus V2 right now. But be aware that this is an actively developed version, so things may change. So far, this implementation has been tested by multiple people and was found to be stable. Micronucleus V2 will be released once multiple-device support is done.

Edit: Nice, looks like this made it to Hackaday! In light of that I’d like to add that Micronucleus V2 is not yet ready for release. If you just want a nice, small bootloader for the ATtiny85, I would suggest you try the current release, Micronucleus V1.11.

33 thoughts on “Interrupt free V-USB”

    1. Although I am not familiar with your problem, the reason could indeed be similar. I have also observed that things depend on the operating system and usb hub as well.

      You can easily emulate the behavior of the interrupt-free V-USB and avoid the NACKing problem by disabling interrupts while calling usbPoll() and waiting for an idle bus before enabling them.

    1. USB protocol analysis has been in since the 1.1.17a beta in August. I was doing some USB capture with 1.1.15 and had mentioned to Jonathan at Saleae that it would be nice to have USB protocol analysis. Later the same day I got an email from Mark Garrison with links to the 1.1.17a beta to try out. The folks at Saleae are great!

  1. Wow, what an amazing project! The ATtiny target is, well... tiny!

    Is there any unique peripheral/architecture or instruction set about the ATtiny, or could this be ported to other non-USB microcontrollers? (I’ve got an old ATmega128L that could be fun to get doing USB!)

    What do you think of other usb stacks, like LUFA? Also, I recall Microchip at one point was offering prizes for open source usb & tcp/ip stacks, not sure if that concluded or not, but perhaps V-USB can win!

      1. Sorry, thanks for reminding.

        Of course this could also be used for any other application. But why do you think it would be beneficial?

  2. hi.

    I couldn’t find the place for changing the -RET- keyword in the file
    asmcommon.inc.

    The code snippet pasted is not the same as the asmcommon.inc that I’ve read.

    thanks.

    1. Well, there is much more that has to be changed elsewhere. You should read the rest of the article as well 😉

      You can use micronucleus as a guide. But generally, I’d like to mention that interrupt free V-USB is not for the faint-hearted. You need a really good grasp of USB to get it to work and probably some advanced debugging capabilities.

  3. cpldcpu, hoping you are still around. I have a project I have been working on for some time using v-usb. As you know, using control transfers with v-usb is not friendly. Too much time is spent in NAK, but it is doable. The biggest issue I have is where and when I need to use time-sensitive code and have no choice but to disable the interrupts. The problem here is that any control transfer that comes in with interrupts cleared will throw errors on the host, since it does not get a NAK.

    So what I want to do is, in my disable-interrupts define, also tell the host to send NAK continuously. Using the v-usb usbDisableAllRequests still will not work because there are no interrupts firing.

    I was wondering if I somehow could switch to an interrupt free version to do this. I would not need any checking other than whether the interrupt is set, and would just NAK everything. I would need to do RET, but what function handles the NAK?

    1. Well, actually that would not be that complex. If you can still use the interrupt vectors, things get much easier than in micronucleus and you can leave V-USB intact:

      Simply disable interrupts for your timing critical code. Before enabling interrupts again, you check whether an interrupt was asserted. If yes, you run the resynchronization code*. After that you clear the interrupt flag and enable interrupts again. That’s it.

      *https://github.com/micronucleus/micronucleus/blob/master/firmware/main.c#L344

      1. Interesting. but what I see on my analyzer is this.

        ……………….CT [……]………………
        …Dis int……………………en int…
        …………..sensitive code…..

        The control transfer occurs in the middle of my interrupt downtime and the host application blows up. So whether or not I check for the assert, it is too late. Or at least it appears this way.

        Thoughts?

      2. Usually the PC attempts to retransfer three times. If you disable the interrupt for a longer time, you will get a timeout. Basically there is nothing you can do about this, unless you modify the host driver to avoid sending anything while the USB function is busy.

  4. Small follow-up to my last questions. I didn’t realize the simple answer was to:

    disable the usb interrupts.
    usbDisableAllRequests() //this prevents NAK
    RET instead of RETI removes the need to wait for the IRQ.

    The only question is how to reti based on an if
    brne EIMSK & ~(1 << INT0)//if not set branch
    RETI //was set
    RET //was not

    ? Did I get that right?

    1. It looks like my last comment never posted. In case it didn’t, and so this does not read as confusing: I wanted to use your trick to return NAK full time if I disable interrupts. My issue now is that if I disable the USB interrupt, the host never gets a NAK. usbDisableAllRequests does not help because there is no interrupt set. I need to be able to do this without control transfers being left unanswered. I think this will work if timing is not an issue.

  5. Tom, I noticed the thread was locked. I do apologize if you meant for me to remain silent, but I wanted to at least make sure I understood your approach. What you suggested does sound valid to me, but I’m not able to implement it without the USB freezing.

    Give or take a few functions, this is what I gathered. The below freezes if that pending check is there.

    usbDisableAllRequests();// stop transfers
    DI();//stop ints
    doCode();//sensitive code
    usbPoll(); //issue at least one for the check below
    if (USB_INTR_PENDING & (1<<USB_INTR_PENDING_BIT)) // Usbpoll() collided with data packet
    {
    uint8_t ctr;

    // loop takes 5 cycles
    asm volatile(
    " ldi %0,%1 \n\t"
    "loop%=: sbis %2,%3 \n\t"
    " ldi %0,%1 \n\t"
    " subi %0,1 \n\t"
    " brne loop%= \n\t"
    : "=&d" (ctr)
    : "M" ((uint8_t)(8.8f*(F_CPU/1.0e6f)/5.0f+0.5)), "I" (_SFR_IO_ADDR(USBIN)), "M" (USB_CFG_DMINUS_BIT)
    );
    USB_INTR_PENDING = 1<<USB_INTR_PENDING_BIT;
    }
    usbEnableAllRequests();// start transfers
    EI();//start ints

    1. Sorry, no idea why the thread is locked. I cannot reply myself.

      Regarding the code. You don’t need:

      usbDisableAllRequests();// stop transfers

      You should call usbPoll before disabling interrupts, not after.

      It’s important to understand that the host controls all USB transmissions. The only option for influencing the protocol from the client side is to not respond, in a controlled manner.

      1. That does make sense. I was able to make it work but saw (what I think you alluded to above). I saw the host try 3 times (about 300 us) and then give up. I need to disable my interrupts for at least that long.

      2. Let me at least run this by you. I’d be rather upset if my assumption was wrong and this was avoidable. This is what I see, is this the host trying 3 times?

        the debug line shows when the interrupts are disabled.

  6. Hi Tim, wanted to ask a question I’m having trouble understanding. Simple enough, I would assume. So this v-usb sends a keep-alive every 1 ms, I think. I want to catch this. That is to say, do a while loop while it does not fire and exit the minute it does. Everything I find waits for 8 ms. I think I’m chasing the wrong ISR?

  7. Do you have a fork of interrupt free vusb I can play around with? There are some qmk vusb keyboards that support micronucleus and i have an annoying bug which drops keystrokes on my atmega328p based plaid clone.

    1. Micronucleus uses interrupt free V-USB.
      Actually the main issue is not really changing V-USB, but modifying your main code so it does not interfere with the USB transmissions.

  8. Do you plan to add Attiny412 support, the new avr “series TinyAvr1” are slightly different than old avr, you must manually clear ISR flags, they use XMEGA type ports (but you can use VPORT to get legacy IO port access) and some opcodes are different cycles… I almost have a V-USB variant ported, but still working out some bugs. Seems nothing is out there for the new chips yet, they are fantastic because they all have internal 16MHz and 20MHz oscillators which can trim to 16.5MHz, so no XTL needed.
    cycles are below
    opcode avr tinyavr1
    push 2 1 DIFFERENT
    lds 2 3+ DIFFERENT
    cbi 2 1 DIFFERENT
    sbi 2 1 DIFFERENT
    ld 2 2,3 MAY LIKELY BE DIFFERENT
    ldd 2 2,3 MAY LIKELY BE DIFFERENT
    st 2 1,2 MAY LIKELY BE DIFFERENT
    std 2 1,2 MAY LIKELY BE DIFFERENT
