Recommended MCU w/ GPU for best LVGL performance?

What do you want to achieve?

I have a round 1.28" display (240x240) driven by an ESP32-S3, and it works fine, but I'm upgrading to a 1.43" AMOLED display (460x460), which is roughly 2x the resolution per side and about 3.7x the pixel count. The ESP32-S3 is having a hard time rendering smooth screen transitions on the larger display. I'm looking for an alternative to the ESP32-S3 that I can port my project to in order to get smooth, high-frame-rate transitions and animations.

What have you tried so far?

I purchased several dev boards to try, but figured I should ask around first before spending days porting the project to different platforms.

ESP32-P4
I got an ESP32-P4 board, but it's large and full of things I don't need (I haven't been able to find a small, clean ESP32-P4 board). It has a much faster CPU and a second, lower-power core, which would be nice: the low-power core could handle the base operations while the 400 MHz core is used for LVGL. However, the ESP32-P4 doesn't have a GPU or graphics acceleration, so it would still be 100% CPU-driven software rendering. I'm not sure whether that would be enough.

STM32H745Z
This is a dual-core MCU (480 MHz Cortex-M7 + 240 MHz Cortex-M4), similar in concept to the ESP32-P4. Boards are widely available, small, and affordable. This MCU also has Chrom-ART (DMA2D), which is listed as a supported GPU in LVGL 8.x.x, but from what I've been reading on the forums, support is limited, it has been dropped in LVGL 9.x.x, and according to Gabor, there was little to no performance improvement.

STM32H723Z
This one is a single-core Cortex-M7 at 550 MHz. It also has Chrom-ART (DMA2D), but as mentioned above, that doesn't seem to be worth much. I'm thinking the higher clock speed could just brute-force the frame rate on the larger screen without bothering with DMA2D. I did see discussion about NemaGFX support in LVGL 9, which is supposedly what ST's Chrom-ART is based on. I also read about using NeoChrom with these MCUs. Unfortunately, there are lots of discussions and debates but no clear breakdown of what works best, how they differ, benchmark results, etc., so I'm confused.

NXP
For NXP, LVGL supports the PXP, VGLite, and G2D GPUs, but the documentation doesn't explain anything about them, their supported functionality, performance differences, etc. Also, I've had a hell of a time finding dev boards for NXP i.MX RT MCUs. I already have a Teensy 4.1, which appears to be based on the i.MX RT1062.

Would an NXP board with one of these GPUs be a better option than DMA2D or NemaGFX in terms of rendering performance for smooth, high-frame-rate full-screen transitions?

Questions

#1 - Does anyone know or have experience with STM32 and LVGL using DMA2D, NeoChrom, or the NemaGFX support in v9.x.x? How do they compare?

#2 - Does anyone know or have any experience with NXP and its various supported GPU configurations?

#3 - Has anyone used a Teensy 4.1 with a nice SquareLine UI on a large display? How does it perform?

#4 - The AMOLED screen I’m using is QSPI, would that be enough throughput?

Your question isn't very detailed. The most important details are the display resolution (WxH) and the bus used. Also, you're mixing up GPU and DMA2D, etc.

I did provide that info in the post: a 460x460 round display using QSPI.

Then, as basic math: you need an MCU with hardware QSPI at 80 MHz with DMA. That gives 80 MHz x 4 lines / 8 = 40 MB/s, so the theoretical maximum full-screen refresh rate is 40M / (460x460x3) ≈ 63 FPS.
So smooth full-screen RGB888 is not an impossible mission.
With RGB565 it's better, but still on the edge. The next bottlenecks are RAM and MCU speed…

For this project, when doing 400 ms fade transitions between screens on the 460x460 display, the frame rate was very low, and it actually looked like it was rendering from the top of the screen down (like a rolling shutter scanning its way down). It looked really bad, as if the CPU couldn't keep up, and I could see it rendering from the top down at a very low FPS. The setup was:

  • QSPI @ 80 MHz
  • DMA enabled
  • Double buffering
  • Buffer size of 1/8 of the screen
  • Buffers in SRAM only, not PSRAM
  • 10 ms refresh period (lv_conf.h)
  • Tried both the built-in memory manager and a custom one (lv_conf.h)
  • 16-bit color depth
  • No rotation
  • CPU at 240 MHz

I tried using a larger screen buffer, but it wouldn't work, likely due to RAM overflow. I'm not sure whether putting larger 1/2 or even full-screen buffers in the slower PSRAM would give better results.

Alternatively, I'm considering trying the project on an ESP32-P4 (I have a dev board for testing) to see how it performs, since it would need minimal code changes. The other option is a big port to the STM32H723Z, since it has a faster CPU (550 MHz), a faster SPI bus than the ESP32-S3 (120 MHz vs. 80 MHz), and Chrom-ART (DMA2D) support.

I'm just worried about the work involved in porting the project to STM32, because I use a lot of the ESP32 libraries.

I mean, the ESP32-S3 is OK for this; I use it with a 360x360 QSPI display without issues.
Try using the right library with real hardware DMA support and an older, stable LVGL. FYI, I work on 8.3.7.