sample storage

27 March 2025

I’ve been thinking about sample storage and the path from disk to the headphone jack. The RP2350 supports two external memories of up to 16 MB each on the QSPI interface. One of those will be the program flash, so the other can be 16 MB of RAM. That’s in addition to the 520 KB of SRAM on the chip.

By far the simplest thing to do would be to load the samples used in a track into the 16 MB RAM from the sample library on the non-volatile disk (currently an SD card, but hopefully soldered-down NAND flash in future). Then everything is in RAM: it’s fast, and there’s no filesystem access to deal with while playing.

But then I was worried 16 MB might not be enough, so I looked at the possibility of streaming sample data from disk. With a FAT-formatted SD card and the ubiquitous FatFs library, the results are not great. The latency of SD card reads is just not predictable enough to allow eight samples to be played simultaneously.
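To put rough numbers on the streaming case, here is a back-of-the-envelope sketch in Python. The sample format (16-bit mono at 44.1 kHz) and the 4 KB per-voice buffer are assumptions for illustration, not decisions recorded anywhere above.

```python
# Rough streaming budget. Every figure here is an assumption:
# 16-bit mono samples, 44.1 kHz playback, eight voices at unity pitch.
SAMPLE_RATE_HZ = 44_100
BYTES_PER_FRAME = 2      # 16-bit mono (assumed)
VOICES = 8


def sustained_bytes_per_second(voices=VOICES):
    """Average read rate needed to keep every voice fed at unity pitch."""
    return voices * SAMPLE_RATE_HZ * BYTES_PER_FRAME


def buffer_cover_ms(buffer_bytes):
    """How long one voice can keep playing from a buffer of this size."""
    frames = buffer_bytes // BYTES_PER_FRAME
    return 1000.0 * frames / SAMPLE_RATE_HZ


print(sustained_bytes_per_second())       # 705600 bytes per second
print(round(buffer_cover_ms(4096), 1))    # 46.4 ms of audio per 4 KB buffer
```

Under these assumptions the average rate is modest. The difficulty is the worst case: any single read that takes longer than the buffer's cover time leaves a voice with nothing to play, and voices pitched up consume their buffers faster still.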

When I get some NAND flash set up I may look at this again, but I think it would add a lot of complexity. With NAND flash the throughput would be fine but you still need a filesystem. I think I would probably have to use an RTOS in order to do non-blocking filesystem access, which I’ve been trying to avoid.

A limit of 16 MB per track sounds about right anyway. It’s about 3 minutes of sample data, and I’m happy with that for now.
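The 3 minute figure can be checked with a little arithmetic. This sketch assumes 16-bit samples at 44.1 kHz, which is a guess at the format rather than something stated above; the stereo line is included only for comparison.

```python
RAM_BYTES = 16 * 1024 * 1024   # 16 MB external RAM


def seconds_of_audio(ram_bytes, sample_rate_hz, bytes_per_sample, channels):
    """Duration of uncompressed PCM audio that fits in ram_bytes."""
    return ram_bytes / (sample_rate_hz * bytes_per_sample * channels)


# Assumed formats: 16-bit PCM at 44.1 kHz, mono and stereo.
mono_16 = seconds_of_audio(RAM_BYTES, 44_100, 2, 1)
stereo_16 = seconds_of_audio(RAM_BYTES, 44_100, 2, 2)

print(f"{mono_16 / 60:.2f} min mono, {stereo_16 / 60:.2f} min stereo")
# 3.17 min mono, 1.59 min stereo
```

So the 3 minute estimate matches 16-bit mono; stereo samples would halve it.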

sample rate conversion

23 March 2025

I’ve been looking at sample playback. The main issue here is changing the pitch of the sample without introducing (too many) artifacts. When we pitch a sample up, any frequencies that go over the Nyquist frequency will fold back and we will have aliasing. When we pitch a sample down, we also pitch down the image of the sample centred at the sample rate Fs, and some of it will enter the audible range. On top of this the choice of interpolation method, if it’s not perfect, will introduce some noise/distortion.
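Both effects can be illustrated with numbers. The sketch below assumes a 44.1 kHz sample rate and a 15 kHz test tone; the function names are made up for this example. It computes where a pitched-up tone lands after folding, and where the first image of a pitched-down tone ends up.

```python
FS_HZ = 44_100   # assumed sample rate


def pitch_ratio(semitones):
    """Playback speed ratio for a pitch shift in semitones."""
    return 2 ** (semitones / 12)


def aliased_frequency(f_hz, fs_hz=FS_HZ):
    """Frequency at which a tone of f_hz appears once sampled at fs_hz."""
    f = f_hz % fs_hz                  # the spectrum repeats every fs
    return fs_hz - f if f > fs_hz / 2 else f


def first_image_after_pitch(f_hz, semitones, fs_hz=FS_HZ):
    """Where the image at (fs - f) lands after the sample is repitched."""
    return (fs_hz - f_hz) * pitch_ratio(semitones)


# Pitching a 15 kHz tone up an octave asks for 30 kHz, which folds back.
print(aliased_frequency(15_000 * pitch_ratio(12)))   # 14100.0

# Pitching the same tone down an octave moves its 29.1 kHz image
# into the audible range.
print(first_image_after_pitch(15_000, -12))          # 14550.0
```

How much of that image is actually heard depends on the interpolator, since a better interpolator attenuates the images more strongly.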

The (or a) perfect way to do this is described in this very nice paper, The Quest for the Perfect Resampler: https://ldesoras.fr/doc/articles/resampler-en.pdf

use a windowed sinc interpolator
oversample the input 2x and downsample at the end: this way we can trash the top half of the band when pitching up without worrying about aliasing
to pitch up even further: by analogy to textures in 3D graphics, precompute a mipmap of the sample, with one level per octave. As each level is half the size of the previous one, the total mipmap size doesn’t exceed twice the original sample size.
there is a way to deal with pitching down by using an extra pre-oversampled mipmap level. I haven’t looked too much at this yet.
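As a rough illustration of two of these points, here is a minimal Python sketch of a windowed sinc read at a fractional position, plus a check of the mipmap size bound. It is not the paper's implementation: the Hann window, the half-width of 8 taps and the function names are all assumptions made for this example, and it leaves out the 2x oversampling and the low-pass filtering needed to build each mipmap level.

```python
import math


def hann_windowed_sinc(x, half_width):
    """sinc(x) tapered by a Hann window spanning [-half_width, half_width]."""
    if abs(x) >= half_width:
        return 0.0
    sinc = 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)
    window = 0.5 * (1.0 + math.cos(math.pi * x / half_width))
    return sinc * window


def read_sinc(samples, position, half_width=8):
    """Interpolated value at a fractional read position (no pitch filtering)."""
    centre = math.floor(position)
    total = 0.0
    for n in range(centre - half_width + 1, centre + half_width + 1):
        if 0 <= n < len(samples):      # out-of-range taps count as silence
            total += samples[n] * hann_windowed_sinc(position - n, half_width)
    return total


def mipmap_lengths(base_length):
    """Length of each mipmap level, halving (rounding down) per octave."""
    lengths = [base_length]
    while lengths[-1] > 1:
        lengths.append(lengths[-1] // 2)
    return lengths


# A 1 kHz sine at 44.1 kHz, read halfway between two stored samples.
sine = [math.sin(2 * math.pi * 1000 * n / 44_100) for n in range(64)]
exact = math.sin(2 * math.pi * 1000 * 20.5 / 44_100)
print(abs(read_sinc(sine, 20.5) - exact) < 0.01)           # True

# All levels together stay under twice the original length.
print(sum(mipmap_lengths(44_100)) < 2 * 44_100)            # True
```

Each output sample here costs 16 multiply-accumulates plus the kernel evaluation, and in practice the kernel would normally come from a precomputed table rather than calling sin and cos per tap.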

Will this be practical to implement on the RP2350? Quite possibly not; we shall see. It’s nice to know what the ideal solution is, and if that doesn’t work then compromises can be made. Bog-standard linear interpolation, maybe combined with oversampling, filtering, or mipmaps, might be fine.
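For comparison, the linear interpolation fallback is only a few lines. This is a sketch in Python for prototyping off-target; the function name and the floating-point read position are choices made for the example, and on the device a fixed-point phase accumulator would probably be the more natural fit.

```python
def play_linear(samples, ratio, num_out):
    """Resample by stepping a fractional read position through the sample.

    ratio > 1 pitches up, ratio < 1 pitches down. No anti-aliasing is done.
    """
    out = []
    position = 0.0
    for _ in range(num_out):
        i = int(position)
        if i + 1 >= len(samples):
            break                          # ran off the end of the sample
        frac = position - i
        # Weighted average of the two neighbouring samples.
        out.append(samples[i] + frac * (samples[i + 1] - samples[i]))
        position += ratio
    return out


# Playing a ramp at half speed fills in the midpoints.
print(play_linear([0.0, 1.0, 2.0, 3.0], 0.5, 6))
# [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
```

Two taps per output sample is far cheaper than the windowed sinc, at the cost of the aliasing and image problems described above.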

some resources:

https://jeskola.net/xs1/content/test/ tests of various samplers using a 15 kHz sine wave. you can see which ones use linear interpolation and which use sinc, and which are able to anti-alias
https://www.discodsp.com/bliss/aliasing/ more tests
https://yehar.com/blog/?p=197 Polynomial Interpolators for High-Quality Resampling of Oversampled Audio
https://www.musicdsp.org/en/latest/Other/93-hermite-interpollation.html Hermite interpolation. a bit better than linear
https://www.dsprelated.com/freebooks/pasp/Windowed_Sinc_Interpolation.html windowed sinc interpolation