
Baroque rock – Google’s new AI turns text into music

Lightwreather

Summary

In a vein similar to DALL-E, Google has now demoed a project that takes a text prompt and uses it to generate a piece of music that can be several minutes long. Suffice it to say, the results are mixed in my opinion. The model is called MusicLM, and you can't play around with it yourself; however, some examples of the results have been published.

Google has also released a research paper explaining how they achieved these results. Honestly, I can't read past the first paragraph, but for anyone interested, I'll link to it in the sources.

 

Quotes

Quote

The examples are impressive. There are 30-second snippets of what sound like actual songs created from paragraph-long descriptions that prescribe a genre, vibe, and even specific instruments, as well as five-minute-long pieces generated from one or two words like “melodic techno.” Perhaps my favorite is a demo of “story mode,” where the model is basically given a script to morph between prompts. For example, this prompt:

Quote

electronic song played in a videogame (0:00-0:15)

meditation song played next to a river (0:15-0:30)

fire (0:30-0:45)

fireworks (0:45-0:60)

Resulted in the audio you can listen to here.

Also featured on the demo site are examples of what the model produces when asked to generate 10-second clips of instruments like the cello or maracas (the latter example is one where the system does a relatively poor job), eight-second clips of a certain genre, music that would fit a prison escape, and even what a beginner piano player would sound like versus an advanced one. It also includes interpretations of phrases like “futuristic club” and “accordion death metal.”

MusicLM can even simulate human vocals, and while it seems to get the tone and overall sound of voices right, there’s a quality to them that’s definitely off. The best way I can describe it is that they sound grainy or staticky. 

 

Quote

Like with other forays into this type of AI, Google is being significantly more cautious with MusicLM than some of its peers may be with similar tech. “We have no plans to release models at this point,” concludes the paper, citing risks of “potential misappropriation of creative content” (read: plagiarism) and potential cultural appropriation or misrepresentation.

It’s always possible the tech could show up in one of Google’s fun musical experiments at some point, but for now, the only people who will be able to make use of the research are other people building musical AI systems. Google says it’s publicly releasing a dataset with around 5,500 music-text pairs, which could help when training and evaluating other musical AIs.

My thoughts

After going through some of the examples on the page, I'm going to have to say that it's absolutely not perfect. It does some things well and others not so much. This might have worked better if it generated MIDI files or sheet music rather than complete audio files, but as it stands the production value on these is atrocious, particularly for non-electronic instruments. 

We likely won't see much improvement in this, seeing as we aren't going to get the model or a way to interact with it, and Google probably just doesn't see much profit in it. I can't actually see a viable use case for it, but if anyone can think of one, feel free to quote this line and tell me.

Furthermore, this isn't the first, nor will it likely be the last, time "AI" or some other form of automated composition has been attempted, but I just feel ambivalent towards these systems, even as a composer and producer. 

Well, anyway. Back to your regularly scheduled *insert here*

Sources

TheVerge

Google Research Page [Page including examples]

Research Paper [arxiv.org]

"A high ideal missed by a little, is far better than low ideal that is achievable, yet far less effective"

 

If you think I'm wrong, correct me. If I've offended you in some way tell me what it is and how I can correct it. I want to learn, and along the way one can make mistakes; Being wrong helps you learn what's right.


Remember: All ML stuff is closer to auto-complete than any creative effort. Any time you see an "impressive AI demo", consider how cherry-picked it is.

 

There have been several music ML projects before, and generally they're okay at 'figuring out' how to fill in, or "complete", a song given some input. But realistically, music ML projects need to generate MIDI so that the output can be cleaned up. So I don't see any practical use for this project, at least not one with commercial value.

 

In terms of practicality, I could see it being useful for generating audio for internal marketing material. You know how all that "corporate elevator/boardroom music" is almost indistinguishable from "on hold" music. But you are definitely not using this to generate music commercially. At best you might get some good loops out of it that you could use for video game background music.

 

However, just like with "DALL-E", many people are going to look at it and go "oh, this is gonna put me out of a job" when that is extremely unlikely. It's not creative. Unlike visual artwork, where you can readily see the mistakes, music requires some level of understanding of "how" music should sound, or it will just be noise. So in terms of practical use, just like DALL-E it might see more use for prototyping/placeholder assets that are intended to be replaced.

 

Lastly, prepare to see more audio "ML" projects, as many of them can actually be run on a GPU someone might actually have. 

 

I'll note that the samples do sound cherry-picked, and it also seems like they may have been trained on 22 kHz sources, because the output has that same audio fidelity as neural-TTS projects. 

 


4 hours ago, Kisai said:

Remember: All ML stuff is closer to auto-complete than any creative effort. Any time you see an "impressive AI demo", consider how cherry-picked it is.

I said it in another "AI" topic, but it fits here even better... a lot of these Konami DX, DDR, what-have-you songs were "completely" AI generated. As in, sure, there was a "composer" that fed it samples, algorithms and such, but how the songs actually played out was completely "computer generated", though not really random... and while most songs are short (around 2 minutes), some of them are among the best ever made. I'm almost sure other video game music was done in a similar fashion. 

 

Unfortunately I don't really know a lot more about this, but then again I'm not really interested in how it was made; it's the results that count.

 

Just saying, yeah, this isn't anything new, other than it now has the Google/AI advertising stamp... and it's probably rather disappointing, just like most other "AI" stuff.

4 hours ago, Kisai said:

However, just like "Dall-E", many people are going to look at it and go "oh this is gonna put me out of a job" when that is extremely unlikely.

Well, tbh, I find that likely. A lot of music nowadays is just extremely unimportant and uninspired; if an "AI" can make the same thing cheaper, you can bet some people *will* lose their jobs...

The direction tells you... the direction

-Scott Manley, 2021

 

Softwares used:

Corsair Link (Anime Edition) 

MSI Afterburner 

OpenRGB

Lively Wallpaper 

OBS Studio

Shutter Encoder

Avidemux

FSResizer

Audacity 

VLC

WMP

GIMP

HWiNFO64

Paint

3D Paint

GitHub Desktop 

Superposition 

Prime95

Aida64

GPUZ

CPUZ

Generic Logviewer

 

 

 


This is literally the plot of the anime Carol & Tuesday.

It's true what they say: life imitates art.

24 minutes ago, Mark Kaine said:

if an "AI" can make the same thing cheaper, you can bet some people *will* lose their jobs...

The 90s hometown singer, and the countless 'rappers' selling their mixtapes on CD in LA (despite it being 2023), will cease to exist.

I don't think anyone will care about the 'rappers', but the hometown singer will be a sore spot for Americans, since I think it's a common American staple/dream that may no longer be possible.

*Insert Witty Signature here*

System Config: https://au.pcpartpicker.com/list/Tncs9N

 


31 minutes ago, Mark Kaine said:

I said it in another "AI" topic, but it fits here even better... a lot of these Konami DX, DDR, what-have-you songs were "completely" AI generated. As in, sure, there was a "composer" that fed it samples, algorithms and such, but how the songs actually played out was completely "computer generated", though not really random...

Fractal music has been a thing for at least a decade. There have also been something like a hundred ML music projects since 2017.

 

Basically, music isn't low-effort, but if we focus just on melodies and beats, an AI is more than capable of generating pleasant-sounding music by looking at what people have rated or hated. However, picking the right instrument, lyrics, or even a singing voice is something that an AI will pretty much never be able to do. ChatGPT ain't gonna write a good song, let alone a chart-topper.

 

Pretty much, music requires a certain level of skill to know what sounds nice and appropriate. What is suitable for an acoustic guitar is not appropriate for a violin, and is even less appropriate for an 8-bit sawtooth sound. When you find an 8-bit "mix" of a popular song, you've either found an actual 8-bit-designed track, or you've found someone who ran Spotify's Basic Pitch on the track and set it to 8-bit instruments. The latter never sounds particularly good, because it's done without much thought to how it should sound.

 

 


That sample "relaxing jazz"... yeah, I'm not relaxed. It's just a mess of notes.

 

Edit: Mods help... can't remove the embed... Ctrl + A + Del does not get rid of it...

 

Intel® Core™ i7-12700 | GIGABYTE B660 AORUS MASTER DDR4 | Gigabyte Radeon™ RX 6650 XT Gaming OC | 32GB Corsair Vengeance® RGB Pro SL DDR4 | Samsung 990 Pro 1TB | WD Green 1.5TB | Windows 11 Pro | NZXT H510 Flow White
Sony MDR-V250 | GNT-500 | Logitech G610 Orion Brown | Logitech G402 | Samsung C27JG5 | ASUS ProArt PA238QR
iPhone 12 Mini (iOS 17.2.1) | iPhone XR (iOS 17.2.1) | iPad Mini (iOS 9.3.5) | KZ AZ09 Pro x KZ ZSN Pro X | Sennheiser HD450bt
Intel® Core™ i7-1265U | Kioxia KBG50ZNV512G | 16GB DDR4 | Windows 11 Enterprise | HP EliteBook 650 G9
Intel® Core™ i5-8520U | WD Blue M.2 250GB | 1TB Seagate FireCuda | 16GB DDR4 | Windows 11 Home | ASUS Vivobook 15 
Intel® Core™ i7-3520M | GT 630M | 16 GB Corsair Vengeance® DDR3 |
Samsung 850 EVO 250GB | macOS Catalina | Lenovo IdeaPad P580


Anyone else feel that these are all quite noisy? I'm listening to a few of them and there's an audible hiss-like sound playing over everything.

It's like they took the dreamy/LSD-induced warping found in some AI-generated image art and transposed it into sound.

AMD Ryzen R7 1700 (3.8ghz) w/ NH-D14, EVGA RTX 2080 XC (stock), 4*4GB DDR4 3000MT/s RAM, Gigabyte AB350-Gaming-3 MB, CX750M PSU, 1.5TB SDD + 7TB HDD, Phanteks enthoo pro case


Ngl, when we get AI-generated music on the level of what Stable Diffusion can do for art, I might prefer that to human-made music. 

 

It would also be cool to see an AI that could extend/shorten pieces. 

 


Can't wait for music awards to be filled with robots. 

| Intel i7-3770@4.2Ghz | Asus Z77-V | Zotac 980 Ti Amp! Omega | DDR3 1800mhz 4GB x4 | 300GB Intel DC S3500 SSD | 512GB Plextor M5 Pro | 2x 1TB WD Blue HDD |
 | Enermax NAXN82+ 650W 80Plus Bronze | Fiio E07K | Grado SR80i | Cooler Master XB HAF EVO | Logitech G27 | Logitech G600 | CM Storm Quickfire TK | DualShock 4 |


4 hours ago, Coaxialgamer said:

Anyone else feel that these are all quite noisy? I'm listening to a few of them and there's an audible hiss-like sound playing over everything.

It's like they took the dreamy/LSD-induced warping found in some AI-generated image art and transposed it into sound.

That's because it uses 24 kHz samples. CDs are 44.1 kHz.
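To put those numbers in context: by the Nyquist theorem, a recording can only represent frequencies up to half its sample rate, so 24 kHz audio tops out at 12 kHz, well short of the bandwidth CD audio covers. A trivial sketch of the arithmetic:

```python
def nyquist_khz(sample_rate_hz):
    """Highest frequency (in kHz) a given sample rate can represent."""
    return sample_rate_hz / 2 / 1000

print(nyquist_khz(24_000))   # 12.0  -- MusicLM's output rate
print(nyquist_khz(44_100))   # 22.05 -- CD audio
```

Everything above 12 kHz (cymbal shimmer, breathiness, "air") simply cannot exist in the model's output, which is consistent with the dull, hissy quality people describe.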


It's better than most pop and R&B and whatever other crap.

| Ryzen 7 7800X3D | AM5 B650 Aorus Elite AX | G.Skill Trident Z5 Neo RGB DDR5 32GB 6000MHz C30 | Sapphire PULSE Radeon RX 7900 XTX | Samsung 990 PRO 1TB with heatsink | Arctic Liquid Freezer II 360 | Seasonic Focus GX-850 | Lian Li Lanccool III | Mousepad: Skypad 3.0 XL / Zowie GTF-X | Mouse: Zowie S1-C | Keyboard: Ducky One 3 TKL (Cherry MX-Speed-Silver)Beyerdynamic MMX 300 (2nd Gen) | Acer XV272U | OS: Windows 11 |


So it seems to be about late-high-school / early Certificate II/III quality stuff. I am sure that with more refining it won't be long before the majority of the music-consuming population wouldn't be able to tell. What will be interesting to see is whether AI will ever be able to master human quirks: the traits that stem from personal preference, hearing traits, and emotional/situational influence. Those are the things that give music a unique signature.

 

 

8 hours ago, Coaxialgamer said:

Anyone else feel that these are all quite noisy? I'm listening to a few of them and there's an audible hiss-like sound playing over everything.

It's like they took the dreamy/LSD-induced warping found in some AI-generated image art and transposed it into sound.

 

I heard a bit of it; most likely it's just artifacts from compression etc.

Grammar and spelling is not indicative of intelligence/knowledge.  Not having the same opinion does not always mean lack of understanding.  


6 minutes ago, mr moose said:

So it seems to be about late-high-school / early Certificate II/III quality stuff. I am sure that with more refining it won't be long before the majority of the music-consuming population wouldn't be able to tell. What will be interesting to see is whether AI will ever be able to master human quirks: the traits that stem from personal preference, hearing traits, and emotional/situational influence. Those are the things that give music a unique signature.

 

 

 

I heard a  bit of it, most likely just artifacts from compression etc.  

The source audio is all YouTube videos, sampled down to 24 kHz; that's why it sounds the way it does.

 

Even if they didn't resample it, the compression artifacts would still be in it. This is actually a typical thing that happens when the training data is lossy-compressed (even with Opus): the audio isn't fullband either. You hear it in pretty much ALL neural TTS (e.g. AWS Polly, Google, Microsoft). They are not trained on CD-quality audio, because these projects aren't intended to produce commercially usable audio. It's proof of concept more than it is viable.

 

In ML TTS, an 8 GB GPU can do inference on about 140 characters at 22 kHz. To do 48 kHz, you need 20 GB for the same length, and you get at most about 10 seconds of audio. So using a full A100 (80 GB) you might get 30-40 seconds. I presume music generation has similar issues: either you increase the quality but lower the length, or you get a longer length by cutting the sample rate of the training data in half.
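The length-vs-fidelity tradeoff described above follows from simple arithmetic: the number of raw samples a model has to produce scales with both clip length and sample rate. A minimal sketch (my own illustration, not from any paper; it assumes memory scales roughly linearly with the sample count):

```python
def samples(seconds, rate_hz):
    """Total audio samples a model must generate for a clip."""
    return int(seconds * rate_hz)

# At a fixed budget of generated samples, halving the sample rate
# doubles the clip length you can afford:
assert samples(10, 48_000) == samples(20, 24_000)  # both 480,000 samples
```

Which is exactly the shape of the tradeoff you see in practice: half the sample rate for roughly double the duration.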

 

Quote

MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes. 

 

Important to point out though...

Quote

The closest to our approach among these works is DALL·E 2 (Ramesh et al., 2022). In particular, similarly to the way DALL·E 2 relies on CLIP (Radford et al., 2021) for text encoding, we also use a joint music-text embedding model for the same purpose. In contrast to DALL·E 2, which uses a diffusion model as a decoder, our decoder is based on AudioLM. Furthermore, we also omit the prior model mapping text embeddings to music embeddings, such that the AudioLM-based decoder can be trained on an audio-only dataset and the music embedding is simply replaced during inference by the text embedding.

...

whereas the tokenizers and the autoregressive models for the semantic and acoustic modeling stages are trained on a dataset containing five million audio clips, amounting to 280k hours of music at 24 kHz.
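A quick sanity check on the scale quoted there (my own arithmetic, not from the paper): five million clips totalling 280k hours works out to about three and a half minutes per clip.

```python
clips = 5_000_000
hours = 280_000

# Average clip length implied by the quoted dataset size
avg_seconds = hours * 3600 / clips
print(avg_seconds)  # 201.6 seconds, i.e. roughly 3.4 minutes per clip
```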

AudioLM can be found here https://google-research.github.io/seanet/audiolm/examples/

 

So its comparison to DALL-E is pretty much accurate, at least on the training side; the paper itself says so. 

 

I'd hazard a guess that if it has not been trained on something, it would literally be unable to produce anything not found in MusicCaps. So learning to do something "new" would be impossible without retraining it.

 


8 hours ago, Shreyas1 said:

Ngl when we get AI generated music on the level of what stable diffusion can do for art I might prefer that to human made music 

Like this? As long as it doesn't involve vocals, it shouldn't be too far off. 

I watched/listened to one of their performances back in 2017 that paired an AI and a DJ. 

 

AMD Ryzen 5 3600 | AsRock B450M-Pro4 | Zotac GTX 3070 Ti

Shure SRH840A | Sennheiser Momentum 2 AEBT | LG C9 55"


11 hours ago, BlueChinchillaEatingDorito said:

That sample "relaxing jazz"... yea I'm not relaxed. It's just a mess of notes.

That's literally what jazz is!

 

 


12 hours ago, Kisai said:

an AI is more than capable of generating pleasant-sounding music by looking at what people have rated or hated. However, picking the right instrument, lyrics, or even a singing voice is something that an AI will pretty much never be able to do. ChatGPT ain't gonna write a good song, let alone a chart-topper.

The thing is, what I'm talking about isn't the same kind of "AI"; it was more like something specifically designed to make music in certain genres. Like, it didn't really pick instruments etc., it basically just made the "composition", and I bet most of it sounded crap, but when it didn't, boy did it sound good!

 

If I can find it I'll post some examples, but the problem is I can't be too sure it's really that thing, because I'm not too knowledgeable about this sort of thing; these Konami DX songs often sound similar, whether computer generated or not. 

 

12 hours ago, Kisai said:

Pretty much, music requires a certain level of skill to know what sounds nice and appropriate. What is suitable for an acoustic guitar is not appropriate for a violin, and is even less appropriate for an 8-bit sawtooth sound.

Yeah, pretty much like other "AI art", the results will most likely just be soulless, boring and sort of outdated, because everything is just recycled. 

 

Edit: yeah, I can't even find that version; this seems to be a remix, so I'm not sure, but it's the "style" that counts... it's like Aphex Twin actually made something good for once! :D

 

 

 

Mind you, I like "some" Aphex Twin stuff (like his very first "song", for example).


12 hours ago, Mark Kaine said:

I said it in another "AI" topic, but it fits here even better... a lot of these Konami DX, DDR, what-have-you songs were "completely" AI generated. As in, sure, there was a "composer" that fed it samples, algorithms and such, but how the songs actually played out was completely "computer generated", though not really random... and while most songs are short (around 2 minutes), some of them are among the best ever made. I'm almost sure other video game music was done in a similar fashion. 

Got a reference or specific examples where DDR music was "AI" generated? To me, DDR was at its height around the PS2 era, and I'm not even sure usable "AI" was around then, at least not in any form we'd recognise today.

Main system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, Corsair Vengeance Pro 3200 3x 16GB 2R, RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


37 minutes ago, Mark Kaine said:

 

Edit: yeah, I can't even find that version; this seems to be a remix, so I'm not sure, but it's the "style" that counts... it's like Aphex Twin actually made something good for once! :D

 

 

 

Mind you, I like "some" Aphex Twin stuff (like his very first "song", for example).

Aphex Twin makes good music! It's just not always gentle on the ears. "Windowlicker" is a classic, especially for the video.

 

At any rate, I haven't heard of him using AI for music generation. You'd want to look to artists like Holly Herndon for that.


Great, I look forward to becoming a musician too 👍

i5 2400 | ASUS RTX 4090 TUF OC | Seasonic 1200W Prime Gold | WD Green 120gb | WD Blue 1tb | some ram | a random case

 


45 minutes ago, Kinda Bottlenecked said:

Great, I look forward to becoming a musician too 👍

Just an FYI, because this irrationally annoys me: this would make you a composer/producer, not a musician.


11 minutes ago, Lightwreather JfromN said:

Just an FYI, because this irrationally annoys me: this would make you a composer/producer, not a musician.

 

Don't worry, when the tech comes I'll be a composer/producer and a musician too 


2 hours ago, Commodus said:

At any rate, I haven't heard of him using AI for music generation. You'd want to look to artists like Holly Herndon for that.

No, no, that's not what I meant, just that that beatmania song sounds like Aphex Twin... and I really wish I could find the article I read about these "AI generated" beatmania songs; it's pretty cool!

 

 

and ok, thanks, I'll check that out.

 

 

As for Aphex Twin, as I said, I like some songs; most are kinda trying too hard though, imo...

 

I have a limited album from him (different name, iirc; 500 copies). That one is good (it has a "1000 bpm song" or something, lol).

 

PS: here, that's the other song I meant, Fresher and Cleaner. That's my kinda thing, good old pure acid; it's not trying to do anything else... 

 

 

 

 

 


17 hours ago, Mark Kaine said:

That's literally what jazz is!

 

 


20 hours ago, Mark Kaine said:

That's literally what jazz is!

 

 

And as much as I hate jazz, there is a growing body of evidence that says it's the best music to play to your growing baby if you want them to have good musical ears.

(Something to do with the discordance and lack of flow/repetition causing exponential growth in the auditory-processing parts of the brain.)

 

18 hours ago, Lightwreather JfromN said:

Just an FYI, because this irrationally annoys me: this would make you a composer/producer, not a musician.

No, it would make him a coder or programmer. A composer or musician knows where to place each note, or when to play each note, to get the music to sound a specific way; telling an ML program to mimic 280k hours of samples is not the same. I can guarantee that not an ounce of music theory was used by the programmers beyond getting musicians to create MusicCaps, which is basically a dataset of captions that go with music so the AI can associate words with samples. 

 

What irrationally (maybe not even irrationally) annoys me is people who know nothing about music or the music industry yet are absolutely sure they know what is "easy" and what isn't. The number of times I've heard people say "electronic music isn't real music"... I wish I had a buck for every one, because I'd be fucking richy rich by now.

