Rare glitches on WS2812B pixels using neopixel library on micro:bit V2 #227

Open
kevinjwalters opened this issue Mar 8, 2025 · 13 comments

@kevinjwalters

kevinjwalters commented Mar 8, 2025

I caught an (impressive) glitch at 04:25 in a recent video of the 60 RGB pixels on a Kitronik ZIP Halo HD. It spans two video frames at 25fps.

I see these perhaps every 10-15 minutes. The code updates the ring continuously with the show() method at about 5 to 25 Hz, so this might be happening roughly 1 in 10000 updates. I don't think there's anything particularly exotic about the code. It uses the neopixel library and i2c comms to an external MCP7940N. This is on a V2.2.1 board running "MicroPython v1.18 on 2023-10-30; micro:bit v2.1.2 with nRF52833".
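As a rough check on the "1 in 10000" estimate, the number of show() calls between glitches follows from the observed interval and update rate. This is only the arithmetic from the paragraph above, using the extremes of the 10-15 minute and 5-25 Hz ranges as inputs:

```python
def updates_between_glitches(interval_minutes, update_hz):
    """Number of show() calls made during one interval between glitches."""
    return round(interval_minutes * 60 * update_hz)


low = updates_between_glitches(10, 5)    # shortest interval, slowest update rate
high = updates_between_glitches(15, 25)  # longest interval, fastest update rate
print(low, high)  # 3000 22500
```

That brackets the glitch rate between about 1 in 3000 and 1 in 22500 updates, consistent with the 1 in 10000 guess.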

My best, rather uninformed, guesses so far:

The documentation mentions:

From our tests, the Microbit NeoPixel module can drive up to around 256 NeoPixels. Anything above that and you may experience weird bugs and issues.

This board has 60, so it is well under that limit. Why is there an issue above 256?

(@DaveAtKitronik @AlasdairAtKitronik @JackAtKitronik might be interested in this.)

@kevinjwalters
Author

kevinjwalters commented Mar 10, 2025

Worth mentioning: I've now tried some longer runs, and very similar code (with somewhat fewer memory allocations) can, if left for about 11 hours, produce a 030 CODAL error (DEVICE_HEAP_ERROR) on the micro:bit display.

@martinwork
Collaborator

Can you post a simple example that triggers the error?

@kevinjwalters
Author

kevinjwalters commented Mar 10, 2025

@martinwork I'll try making some cut-down versions to see what they do. I have a load of spare hardware now, so I can run a few tests in parallel, with and without i2c chatter, to speed up finding a small reproduction.

@kevinjwalters
Author

kevinjwalters commented Mar 12, 2025

Some updates

  • I have more ZIP Halo HDs now, which happen to be V1.1, so I can rule out both a single broken board and a fault specific to the ZHH V1.0 hardware.
  • I have some short code that doesn't use the MCP7940 (i.e. no i2c) which reproduces the LED bug. It takes roughly 5-60 minutes to happen. On one device left overnight, the code also froze in some way.
  • The real V1.1 application (not to be confused with the Kitronik board versions) is more robust than I thought: across overnight runs I've had one 030, one freeze, and two that have been running for about 36 hours.

Both programs run the LEDs at relatively low brightness, which probably explains why any corruption makes them brighter so they appear to flash. I sometimes notice the flash if the board is on my desk; it's easier to see in a fairly dark room. It may be possible to detect it with display.read_light_level() by placing a white piece of paper or a mirror nearby. I might give that a go and try to print out the neopixel array data.

For background on what I'm up to, see Instructables: Making an Accurate Stylish Clock With the Kitronik ZIP Halo HD and a BBC Micro:bit in MicroPython.

@kevinjwalters
Author

kevinjwalters commented Mar 12, 2025

One device out of three gave an 030 error after a day and a half running the short code that reproduces the LED glitches.

I've also confirmed that a micro:bit V2 on its own will (as expected) eventually produce 030 errors; I just had one after about 5 hours of running neopixel-thirty-bug-repro-7.py. That could take 2 days if you're (un)lucky.

@kevinjwalters
Author

kevinjwalters commented Mar 13, 2025

I wrote some code (neopixel-thirty-bug-detector-8.py) to try to detect the bright flash caused by previously low RGB values changing, which is common with these glitches. It calls display.read_light_level() after each show(), with the micro:bit and ZIP Halo HD facing a white piece of paper about 3-4cm away. It doesn't catch every glitch, but it just caught one that I noticed out of the corner of my eye. The odd thing is that the ZIP LEDs aren't left in the corrupted state; they show the normal pattern. I'm not sure what to make of that. Is there anything that could disrupt a neopixel show() and then re-execute it under the covers?
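A minimal sketch of the light-level detection idea, written as plain Python so the logic can be tested off-device. It is not the logic in neopixel-thirty-bug-detector-8.py. It assumes readings on the 0-255 scale of display.read_light_level() and flags a reading that jumps well above a slowly updated baseline; `FlashDetector`, `alpha` and `jump` are hypothetical names and tuning values.

```python
class FlashDetector:
    """Flags sudden rises in light level relative to a running baseline.

    alpha and jump are hypothetical tuning values, not from the original program.
    """

    def __init__(self, alpha=0.05, jump=20):
        self.alpha = alpha    # weight of each new reading in the baseline (EMA)
        self.jump = jump      # rise above the baseline that counts as a flash
        self.baseline = None

    def update(self, level):
        """Feed one light reading; return True if it looks like a flash."""
        if self.baseline is None:
            self.baseline = level
            return False
        flash = level - self.baseline >= self.jump
        if not flash:
            # Only track the baseline on normal readings so a flash doesn't raise it.
            self.baseline += self.alpha * (level - self.baseline)
        return flash


detector = FlashDetector()
readings = [12, 13, 12, 11, 12, 45, 12]
print([detector.update(r) for r in readings])
# [False, False, False, False, False, True, False]
```

On the micro:bit, the loop would call `detector.update(display.read_light_level())` straight after `np.show()`.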

@kevinjwalters
Author

kevinjwalters commented Mar 23, 2025

I tried this (from #83 (comment)) and, assuming I got the right details for pin8 and the code works, it did not fix the glitching.

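import machine

def set_high_drive(port_id, pin_id):
    # PIN_CNF[pin_id] register for this GPIO port (P0 at 0x5000_0000, P1 0x300 above)
    addr = 0x5000_0000 + port_id * 0x300 + 0x700 + pin_id * 4
    # Clear the DRIVE field (bits 8-10), then set it to 3 (H0H1: high drive for 0 and 1)
    machine.mem32[addr] = (machine.mem32[addr] & 0xffff_f8ff) | (3 << 8)
    print("DRIVE STRENGTH HIGH", port_id, pin_id)

set_high_drive(0, 10)  ### pin8 is P0.10/NFC2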
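One way to check whether the write took effect is to read the register back and decode its DRIVE field. The helpers below sketch the same address and bit arithmetic in plain Python so it can be checked off-device; they assume the nRF52833 layout the snippet above relies on (GPIO base 0x5000_0000, 0x300 per port, PIN_CNF[n] at offset 0x700 + 4*n, DRIVE in bits 8-10). The function names are hypothetical.

```python
GPIO_BASE = 0x5000_0000          # P0 base address used by the snippet above
PORT_STRIDE = 0x300              # P1 assumed to sit 0x300 above P0
PIN_CNF_OFFSET = 0x700           # offset of PIN_CNF[0] within a port block
DRIVE_SHIFT = 8
DRIVE_MASK = 0x7 << DRIVE_SHIFT  # bits 8-10; ~DRIVE_MASK & 0xFFFF_FFFF == 0xFFFF_F8FF


def pin_cnf_address(port_id, pin_id):
    """Address of the PIN_CNF register for the given port and pin."""
    return GPIO_BASE + port_id * PORT_STRIDE + PIN_CNF_OFFSET + pin_id * 4


def with_drive(reg_value, drive):
    """Return reg_value with its DRIVE field replaced by drive (0-7)."""
    return (reg_value & ~DRIVE_MASK & 0xFFFF_FFFF) | (drive << DRIVE_SHIFT)


def drive_of(reg_value):
    """Extract the DRIVE field from a PIN_CNF value."""
    return (reg_value & DRIVE_MASK) >> DRIVE_SHIFT


print(hex(pin_cnf_address(0, 10)))      # 0x50000728
print(hex(with_drive(0x0000_0003, 3)))  # 0x303
```

On the device, `drive_of(machine.mem32[pin_cnf_address(0, 10)])` after calling `set_high_drive(0, 10)` should give 3 if the write stuck.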

@kevinjwalters
Author

kevinjwalters commented Mar 29, 2025

I tweaked my flash detection program to set some other pins to make logic captures easier. I only have a humble Ikalogic SQ25, which doesn't have enough local memory to capture and decode many neopixel show() data bursts. I dropped it to 1Msps (obviously too slow to decode 800kbps data) and caught 4 show() calls before the CH3 (red) Flash Detector channel triggered the end of the capture.

Measuring each burst from approximate start to finish, this first capture suggests that the penultimate one is short and perhaps has a few prolonged high periods in the middle, which is a clear protocol bogosity.

The approximate lengths of those from left to right are 1.78ms, 1.79ms, 1.43ms and 1.80ms.
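For scale, WS2812B data is 24 bits per pixel at a nominal 800 kbps (1.25 us per bit), so the expected burst length for the ring is easy to work out. This sketch assumes the nominal bit period with no gaps between bits and excludes the reset/latch gap:

```python
BIT_PERIOD_US = 1.25   # nominal WS2812B bit time at 800 kbps
BITS_PER_PIXEL = 24    # 8 bits each of green, red and blue


def expected_show_ms(n_pixels):
    """Nominal data-burst duration in milliseconds (reset gap not included)."""
    return n_pixels * BITS_PER_PIXEL * BIT_PERIOD_US / 1000


def pixels_sent(duration_ms):
    """Roughly how many pixels' worth of bits fit in a measured burst."""
    return duration_ms * 1000 / (BITS_PER_PIXEL * BIT_PERIOD_US)


print(expected_show_ms(60))          # 1.8
print(round(pixels_sent(1.43), 1))   # 47.7
```

So the 1.78-1.80 ms bursts match a full 60-pixel frame, while the 1.43 ms burst is about 12 pixels' worth short, if the bits it did send ran at the nominal rate (which the prolonged high periods suggest they did not throughout).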

This trace happens to match the earlier testing, where the flash causes the program to stop but the LEDs aren't left in a glitched state.

@kevinjwalters
Author

kevinjwalters commented Mar 29, 2025

I tightened up the loop in the program by removing some things, and fortunately it still glitches. I managed to capture two ZIP LED updates at 10Msps, showing a trace similar to the previous comment's, including the short transfer.

I think the protocol decode could be slightly unreliable at 10Msps since it's matching on timing, but here you can see an obvious deviation from the protocol: the "big square wave" with a period of 20.2us. The 0x230707 values are ones the code would generate.
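To make the violation concrete, here is a minimal sketch of how high-pulse widths sampled at 10 Msps (0.1 us per sample) might be decoded into WS2812B bits. It is not how the SQ25's decoder works; the thresholds are assumptions derived from the datasheet's nominal 0.4 us (0) and 0.8 us (1) high times. A high lasting several microseconds, such as half of the 20.2 us square wave, matches neither and is reported as invalid.

```python
ONE_MIN_SAMPLES = 6    # assumed: highs of >= 0.6 us decode as 1, shorter ones as 0
MAX_BIT_SAMPLES = 12   # assumed: highs longer than 1.2 us are not valid data bits


def decode_highs(high_widths):
    """Decode high-pulse widths (in samples) into 24-bit words.

    Returns (words, bad), where bad holds the indices of out-of-spec pulses.
    Trailing bits that don't fill a complete 24-bit word are dropped.
    """
    bits, bad = [], []
    for i, width in enumerate(high_widths):
        if width > MAX_BIT_SAMPLES:
            bad.append(i)  # far too long to be a WS2812B data bit
            continue
        bits.append(1 if width >= ONE_MIN_SAMPLES else 0)
    words = []
    for start in range(0, len(bits) - 23, 24):
        word = 0
        for bit in bits[start:start + 24]:
            word = (word << 1) | bit  # most significant bit is sent first
        words.append(word)
    return words, bad


# Hypothetical input: 0x230707 at nominal widths (4 and 8 samples) plus one ~10 us high.
value = 0x230707
widths = [8 if (value >> (23 - i)) & 1 else 4 for i in range(24)]
widths.insert(5, 101)
words, bad = decode_highs(widths)
print([hex(w) for w in words], bad)  # ['0x230707'] [5]
```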

@kevinjwalters
Author

Is https://github.com/lancaster-university/codal-nrf52/blob/master/source/neopixel.cpp the code that provides the functionality for the neopixel library on micro:bit V2? It has two implementations, selected by the compile-time constant HARDWARE_NEOPIXEL; does anyone know which one is in use? I see the "non-hardware" one has a target_disable_irq, and I'm not seeing any obvious direct management of interrupts in https://github.com/lancaster-university/codal-nrf52/blob/master/source/WS2812B.cpp or https://github.com/lancaster-university/codal-nrf52/blob/master/source/NRF52PWM.cpp.

@kevinjwalters
Author

kevinjwalters commented Mar 31, 2025

Based on the 1.45ms short output observed previously, I've written a new detection program that looks for show() completing too quickly. I think it misses some of the minor output disruptions, as I've visually noticed a few flashes it didn't detect, but it does detect major protocol corruption if run for, say, 5-200 minutes. It doesn't need anything other than a V2 micro:bit to reproduce, and it puts pulses on pins to make it easier for a logic analyzer to record the fault: neopixel-thirty-bug-timedetector-11.py
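The core of a "show() too quick" check might look like the sketch below. It is not the code in neopixel-thirty-bug-timedetector-11.py: it assumes show() blocks for roughly the length of the data burst, and `ShortShowDetector` and its 0.9 cutoff are hypothetical. On the micro:bit the duration would come from `time.ticks_diff(time.ticks_us(), start)` around `np.show()`; here durations are passed in so the logic runs anywhere, using the burst lengths from the logic capture above as sample inputs.

```python
class ShortShowDetector:
    """Tracks show() durations and flags ones well below the expected burst length."""

    def __init__(self, n_pixels, cutoff=0.9):
        # Nominal burst: 24 bits per pixel at 1.25 us per bit.
        self.expected_us = n_pixels * 24 * 1.25
        self.cutoff = cutoff   # hypothetical fraction of expected_us
        self.shows = 0
        self.short = []        # (show number, duration_us) of suspicious calls

    def record(self, duration_us):
        """Record one show() duration; return True if it was suspiciously short."""
        self.shows += 1
        if duration_us < self.cutoff * self.expected_us:
            self.short.append((self.shows, duration_us))
            return True
        return False


det = ShortShowDetector(60)
for d in (1780, 1790, 1430, 1800):
    det.record(d)
print(det.short)  # [(3, 1430)]
```

On real hardware, measured show() times include some call overhead, so the cutoff would need tuning against a run of known-good updates.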

Should I open an issue in https://github.com/lancaster-university/codal-nrf52 for this?

@microbit-carlos
Contributor

Wow, thank you for the detailed analysis @kevinjwalters!

Should I open an issue in https://github.com/lancaster-university/codal-nrf52 for this?

We generally recommend that all micro:bit CODAL issues go to https://github.com/lancaster-university/codal-microbit-v2, as that is the only CODAL repository for micro:bit that is actively monitored. Creating an issue there with a summary of the findings would be really appreciated!

Is https://github.com/lancaster-university/codal-nrf52/blob/master/source/neopixel.cpp the code that provides the functionality for the neopixel library on micro:bit V2? That has two implementations based off the compile time constant HARDWARE_NEOPIXEL, does anyone know which one is in use?

In the micro:bit V2 case HARDWARE_NEOPIXEL is enabled in the target/target-locked.json files: https://github.com/lancaster-university/codal-microbit-v2/blob/d159aba47e1797ace83ffbac81009909326f3c4f/target-locked.json#L32

I see the "non-hardware" one has a target_disable_irq and I'm not seeing any obvious direct management of interrupts

I haven't looked at this in detail, but the assumption was that the "hardware" version generates the 1-wire signal via the PWM hardware peripheral in the micro:bit V2 (V1 didn't have one), so it shouldn't be affected by interrupts unless they somehow touched the same PWM peripheral.
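For readers unfamiliar with the hardware approach, a common nRF52 technique encodes each WS2812B bit as one PWM period, with the compare value setting how long the output stays high, and lets the peripheral fetch the whole sequence from RAM via EasyDMA. The sketch below builds such a sequence in plain Python. The 16 MHz clock and the 20/6/13 tick values are assumptions borrowed from that common approach, not values confirmed from CODAL's WS2812B.cpp or NRF52PWM.cpp.

```python
PWM_CLOCK_HZ = 16_000_000   # assumed PWM base clock
TOP_TICKS = 20              # assumed counter top: one bit period
T0H_TICKS = 6               # ~0.375 us high for a 0 bit
T1H_TICKS = 13              # ~0.8125 us high for a 1 bit

BIT_US = TOP_TICKS * 1_000_000 / PWM_CLOCK_HZ   # 1.25 us per bit


def pwm_sequence(pixels):
    """Convert (r, g, b) tuples into per-bit PWM high times, in GRB wire order."""
    seq = []
    for r, g, b in pixels:
        for byte in (g, r, b):               # WS2812B expects green first
            for shift in range(7, -1, -1):   # most significant bit first
                seq.append(T1H_TICKS if (byte >> shift) & 1 else T0H_TICKS)
    return seq


seq = pwm_sequence([(0x07, 0x23, 0x07)])
print(len(seq), seq[:8])  # 24 [6, 6, 13, 6, 6, 6, 13, 13]
```

Because the peripheral reads the sequence itself, CPU interrupt latency would not normally stretch individual bits, which is consistent with the assumption described above.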

@kevinjwalters
Author

@microbit-carlos @martinwork I've put some fresh traces in a new ticket in the appropriate CODAL repo, lancaster-university/codal-microbit-v2#475, with a summary of what I tested beforehand. I now suspect the number of pixels has an effect on the errors.

3 participants