I’ve been thinking about sample storage and the path from disk to the headphone jack. The RP2350 supports two 16 MB external memories on its QSPI interface. One of those will be the program flash, so the other can be a 16 MB RAM, in addition to the 520 KB of SRAM on the chip. By far the simplest approach would be to load the samples used in a track into the 16 MB RAM from the sample library on the non-volatile disk (currently an SD card, but hopefully soldered-down NAND flash eventually). Then everything is in RAM: it’s fast, and there’s no filesystem access to deal with during playback.

But then I worried that 16 MB might not be enough, so I looked at streaming sample data from disk instead. With a FAT-formatted SD card and the ubiquitous FatFs library, the results are not great: the latency of SD card reads is just not predictable enough to keep eight samples playing simultaneously.
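To put numbers on the streaming problem, here is a small C++ sketch of the bandwidth needed and the minimum buffering each voice would need to ride out a read stall. The format (44.1 kHz, 16-bit mono) and the 250 ms worst-case stall are assumptions for illustration, not measurements, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes per second needed to stream `voices` mono voices from disk.
uint32_t stream_bytes_per_sec(uint32_t voices, uint32_t rate_hz,
                              uint32_t bytes_per_sample) {
    return voices * rate_hz * bytes_per_sample;
}

// Lower bound on the buffer one voice needs to keep playing through a read
// stall of `stall_ms` milliseconds during which no new data arrives. A real
// design also needs headroom for double buffering and for refilling every
// voice once the stall ends.
uint32_t min_voice_buffer_bytes(uint32_t rate_hz, uint32_t bytes_per_sample,
                                uint32_t stall_ms) {
    assert(rate_hz > 0 && bytes_per_sample > 0);
    // 64-bit intermediate so long stalls can't overflow.
    return static_cast<uint32_t>(
        static_cast<uint64_t>(rate_hz) * bytes_per_sample * stall_ms / 1000u);
}
```

Under those assumptions, eight voices need 705,600 bytes/s, and surviving a 250 ms stall takes at least 22,050 bytes per voice (176,400 bytes for all eight). The catch is the stall itself: if the worst case can't be bounded, there is no safe number to plug in.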
When I get some NAND flash set up I may look at this again, but I think it would add a lot of complexity. With NAND flash the throughput would be fine, but you still need a filesystem, and I would probably have to use an RTOS to get non-blocking filesystem access, which I’ve been trying to avoid.
A limit of 16 MB per track sounds about right anyway. That’s about 3 minutes of sample data, and I’m happy with that for now.
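A quick check of the 16 MB figure, as a C++ sketch. The post doesn’t specify a sample format, so the formats tried below are assumptions, and `seconds_of_audio` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Seconds of PCM audio that fit in `bytes` of memory for a given format.
double seconds_of_audio(std::size_t bytes, unsigned rate_hz, unsigned channels,
                        unsigned bytes_per_sample) {
    assert(rate_hz > 0 && channels > 0 && bytes_per_sample > 0);
    const std::size_t bytes_per_second =
        static_cast<std::size_t>(rate_hz) * channels * bytes_per_sample;
    return static_cast<double>(bytes) / static_cast<double>(bytes_per_second);
}
```

16 MiB holds about 190 s (a little over 3 minutes) of 16-bit mono at 44.1 kHz, but only about 95 s of 16-bit stereo, so the 3-minute figure corresponds to mono samples.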
I’ve been looking at sample playback. The main issue here is changing the pitch of a sample without introducing (too many) artifacts. When we pitch a sample up, any frequencies pushed above the Nyquist frequency (Fs/2) fold back and we get aliasing. When we pitch a sample down, we also pitch down the spectral image of the sample centred at the sample rate Fs, and part of it lands in the audible range. On top of this, any interpolation method that isn’t perfect will add some noise or distortion of its own.
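Both effects can be made concrete with a little arithmetic. This C++ sketch (the helper names are hypothetical) computes where a partial lands after pitching with no filtering at all, which is the worst case an interpolator is trying to improve on.

```cpp
#include <cassert>
#include <cmath>

// Where a partial at f_hz ends up after pitching by `ratio` at sample rate
// fs_hz if nothing filters it. The sampled spectrum repeats every fs_hz and
// is mirrored around fs_hz / 2, so anything above Nyquist folds back.
double aliased_freq(double f_hz, double ratio, double fs_hz) {
    assert(f_hz >= 0.0 && ratio > 0.0 && fs_hz > 0.0);
    const double m = std::fmod(f_hz * ratio, fs_hz);
    return (m > fs_hz / 2.0) ? fs_hz - m : m;
}

// The first image of a partial at f_hz sits at fs_hz - f_hz. Pitching down
// by `ratio` (< 1) moves it to ratio * (fs_hz - f_hz); if that is below
// Nyquist it is audible unless the interpolator filters it out.
bool image_is_audible(double f_hz, double ratio, double fs_hz) {
    assert(f_hz >= 0.0 && f_hz <= fs_hz / 2.0 && ratio > 0.0);
    return ratio * (fs_hz - f_hz) < fs_hz / 2.0;
}
```

For example, at 44.1 kHz a 15 kHz partial pitched up an octave would sit at 30 kHz, which folds back to 14.1 kHz. Pitching a 10 kHz partial down an octave drags its first image from 34.1 kHz down to 17.05 kHz, below the 22.05 kHz Nyquist frequency.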
The (or a) perfect way to do this is described in this very nice paper, The Quest for the Perfect Resampler: https://ldesoras.fr/doc/articles/resampler-en.pdf. In short:
- use a windowed sinc interpolator
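- oversample the input 2x and downsample at the end: the sample’s content then sits in the bottom half of the band, so when pitching up (by up to an octave) we can trash the top half without worrying about aliasing, because the final downsampling filter removes it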
- to pitch up even further: by analogy to textures in 3D graphics, precompute a mipmap of the sample, with one level per octave. As each level is half the size of the previous one, the total mipmap size doesn’t exceed twice the original sample size.
- there is also a way to deal with pitching down by using an extra, pre-oversampled mipmap level. I haven’t looked into this much yet.
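For the first bullet, a minimal windowed-sinc interpolator might look like the C++ sketch below. This is not the paper’s implementation: the Blackman window, the tap count, and the name `sinc_interp` are illustrative choices of mine, and a practical version would use precomputed tables rather than calling `sin`/`cos` per tap. The cutoff is fixed at the input’s Nyquist frequency, so on its own it only behaves well for pitch ratios up to 1; pitching up is what the oversampling and mipmap steps are for.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Windowed-sinc interpolation of `x` at fractional position `pos`, using
// `half_taps` input samples on each side. Samples outside `x` count as silence.
float sinc_interp(const std::vector<float>& x, double pos, int half_taps) {
    assert(half_taps > 0 && pos >= 0.0);
    const double pi = 3.14159265358979323846;
    const long i0 = static_cast<long>(std::floor(pos));
    const double frac = pos - static_cast<double>(i0);
    double acc = 0.0;
    for (int k = -half_taps + 1; k <= half_taps; ++k) {
        const long idx = i0 + k;
        if (idx < 0 || idx >= static_cast<long>(x.size())) continue;
        const double d = static_cast<double>(k) - frac;  // offset from pos, in samples
        const double sinc = (d == 0.0) ? 1.0 : std::sin(pi * d) / (pi * d);
        // Blackman window spanning d in [-half_taps, half_taps], peak 1 at d = 0.
        const double t = (d / half_taps + 1.0) * 0.5;
        const double w = 0.42 - 0.5 * std::cos(2.0 * pi * t) + 0.08 * std::cos(4.0 * pi * t);
        acc += x[static_cast<std::size_t>(idx)] * sinc * w;
    }
    return static_cast<float>(acc);
}
```

Reading at an integer position returns the input sample unchanged (up to rounding), since the sinc is zero at every other integer offset.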
Will this be practical to implement on the RP2350? Quite possibly not; we shall see. It’s nice to know what the ideal solution is, and if it doesn’t work out, compromises can be made. Bog-standard linear interpolation, maybe combined with oversampling, filtering, or mipmaps, might be fine.
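For comparison, the bog-standard option is cheap per voice. Below is a C++ sketch of linear interpolation driven by a 32.32 fixed-point phase accumulator, plus a helper for picking a mipmap level. It assumes 16-bit mono samples and mipmap levels that were each low-pass filtered and decimated by 2 beforehand; `LinearVoice` and `mip_level` are hypothetical names, not from any library.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical voice: plays a 16-bit mono sample at a fixed pitch ratio using
// a 32.32 fixed-point phase accumulator (top 32 bits = sample index).
struct LinearVoice {
    const std::vector<int16_t>* data;
    uint64_t phase = 0;  // current read position, 32.32 fixed point
    uint64_t step = 0;   // position increment per output sample, 32.32

    LinearVoice(const std::vector<int16_t>& samples, double ratio)
        : data(&samples) {
        assert(ratio > 0.0);
        step = static_cast<uint64_t>(std::llround(ratio * 4294967296.0));  // ratio * 2^32
    }

    // Writes the next output sample; returns false at the end of the sample.
    bool next(int16_t& out) {
        const uint64_t i = phase >> 32;
        if (i + 1 >= data->size()) return false;  // need samples i and i + 1
        const int32_t a = (*data)[i];
        const int32_t b = (*data)[i + 1];
        // Top 16 bits of the fractional part as the weight of b (0..65535).
        const int64_t frac = static_cast<int64_t>((phase >> 16) & 0xFFFF);
        // 64-bit product so (b - a) * frac can't overflow.
        out = static_cast<int16_t>(a + ((static_cast<int64_t>(b - a) * frac) >> 16));
        phase += step;
        return true;
    }
};

// Mipmap level to read for a pitch ratio: the smallest level L with
// ratio / 2^L <= 1, assuming level L is the sample filtered and decimated by 2^L.
int mip_level(double ratio) {
    assert(ratio > 0.0);
    if (ratio <= 1.0) return 0;
    return static_cast<int>(std::ceil(std::log2(ratio)));
}
```

Reading level `mip_level(ratio)` with a step of `ratio / 2^level` keeps the effective step at or below one sample, so, assuming each level is properly bandlimited, nothing is pushed above Nyquist; the interpolator’s own errors remain. The 32-bit integer part of the phase comfortably covers the roughly 8 million 16-bit samples that fit in 16 MB.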
Some resources: