Having just done something similar, I'd say your options depend very much on whether BGAs are viable or whether you need to stick to more forgiving packages.
If you can use BGAs, look at TI's AM335x series, as used on the BeagleBone. Because of the BeagleBone there's extensive documentation and software, and they're around $10-15 in quantity (they're Cortex-A8, though). They have a built-in LCD controller, and some models have what TI calls the PRU (Programmable Real-time Unit), a pair of RISC cores that might be able to replace your M0+.
If BGAs are prohibitive, look at the Raspberry Pi Compute Module: for low-to-medium quantities you will not be able to beat it on price or performance.
If neither of these suits, bear in mind that there's a large market for ARM modules sold as CoMs or SoMs (computer/system on module). These are quite expensive compared to the bare processor, but they typically include power management, RAM and a few peripherals. Because of the NRE (non-recurring engineering) savings, they're the sensible option for quantities below or approaching 1k units.
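To make the NRE argument a bit more concrete, here's a rough break-even sketch. The symbols are hypothetical placeholders, not numbers from any real quote: $E_b$ and $E_m$ are the one-off engineering costs of designing around the bare processor and around a module, and $c_b$ and $c_m$ are the corresponding per-unit costs (bare processor plus the RAM, power management and board it needs, versus the module plus its carrier board). For $N$ units:

$$C_b(N) = E_b + N\,c_b, \qquad C_m(N) = E_m + N\,c_m$$

Assuming $E_b > E_m$ and $c_m > c_b$, the module works out cheaper while $C_m(N) < C_b(N)$, i.e.

$$N < N^* = \frac{E_b - E_m}{c_m - c_b}$$

Treat the ~1k figure as a rule of thumb; plugging in your own NRE and unit-price estimates tells you where $N^*$ actually falls for your design.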