Artificial intelligence research laboratory OpenAI today debuted Jukebox, a new generative model that can make music. It’s technologically impressive, even if the results sound like mushy versions of songs that might feel familiar. According to the post on OpenAI’s blog, the researchers chose to work on music because it’s hard. And even if the output isn’t exactly what I’d call music, there are recognizable chords and melodies and words (sometimes).
The way OpenAI did it is also fascinating. The researchers trained the model on raw audio, and it spits out raw audio in return, instead of using “symbolic music” of the kind player pianos use, because symbolic music doesn’t include voices. To get their results, they first used convolutional neural networks to encode and compress the raw audio, then used what they call a transformer to generate new compressed audio, and finally upsampled that output to turn it back into raw audio. Have a chart!
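For readers who prefer code to diagrams, the three steps (compress, generate, upsample) can be approximated with a toy sketch. This is not OpenAI’s code or method: block averaging stands in for the convolutional encoder, a simple "what usually comes next" lookup stands in for the transformer, and sample repetition stands in for the upsampler. The function names `encode`, `generate`, and `decode`, and the parameters `hop` and `levels`, are made up for illustration.

```python
import math
import random
from collections import defaultdict


def encode(samples, hop=8, levels=16):
    """Compress: average each block of `hop` samples and quantize it to an integer code.

    Assumption: a crude stand-in for a learned convolutional encoder.
    """
    codes = []
    for start in range(0, len(samples) - hop + 1, hop):
        block = samples[start:start + hop]
        mean = sum(block) / hop  # samples are assumed to lie in [-1, 1]
        code = round((mean + 1) / 2 * (levels - 1))
        codes.append(min(levels - 1, max(0, code)))
    return codes


def generate(codes, n_new, seed=0):
    """Generate new codes one at a time, each chosen from the codes that
    followed the previous code in the training sequence.

    Assumption: a first-order lookup table, far simpler than a transformer.
    """
    rng = random.Random(seed)
    followers = defaultdict(list)
    for prev, nxt in zip(codes, codes[1:]):
        followers[prev].append(nxt)
    out = [codes[0]]
    while len(out) < n_new:
        options = followers.get(out[-1]) or codes  # fall back if a code has no follower
        out.append(rng.choice(options))
    return out


def decode(codes, hop=8, levels=16):
    """Upsample: turn each code back into an amplitude and repeat it `hop` times."""
    samples = []
    for code in codes:
        amplitude = code / (levels - 1) * 2 - 1
        samples.extend([amplitude] * hop)
    return samples


# One second of a 5 Hz sine wave at a (hypothetical) 800 samples per second.
sample_rate = 800
audio = [math.sin(2 * math.pi * 5 * t / sample_rate) for t in range(sample_rate)]

codes = encode(audio)                  # compressed representation, 8x shorter
new_codes = generate(codes, n_new=50)  # "new music" in compressed form
new_audio = decode(new_codes)          # back to raw samples
```

The point of the detour through compressed codes is length: the generator only has to predict one code for every eight raw samples here, which is the same reason the researchers compress audio before handing it to the transformer.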
The approach is similar to how OpenAI developed a prior music-making AI called MuseNet, but Jukebox goes a step further by producing sung vocals. The lyrics are given to the model as input rather than generated by it; OpenAI used the word “co-written” for lyrics its researchers wrote with the help of a language model. Unlike MuseNet, which used MIDI data, Jukebox’s models were trained on raw audio from a dataset of 1.2 million songs (600,000 in English), paired with metadata and lyrics scraped from LyricWiki. (Artist and genre data were included to improve the model’s output.) Even so, as the researchers write, there are limitations.
“While Jukebox represents a step forward in musical quality, coherence, length of audio sample, and ability to condition on artist, genre, and lyrics, there is a significant gap between these generations and human-created music,” they write. “For example, while the generated songs show local musical coherence, follow traditional chord patterns, and can even feature impressive solos, we do not hear familiar larger musical structures such as choruses that repeat.”
There are also other problems with the experiment. As the writer and podcaster Cherie Hu pointed out on Twitter, Jukebox is potentially a copyright disaster. (It’s worth noting that just this week, Jay-Z attempted to use copyright strikes to take down synthesized audio of himself from YouTube.)
Did Kanye West, Katy Perry, Lupe Fiasco and the estates of Aretha Franklin, Frank Sinatra and Elvis Presley give OpenAI permission to use their audio recordings as training material for a voice-synthesis/musical-composition/lyric-writing algorithm? My guess is no.
— Cherie Hu (@cheriehu42) April 30, 2020
All of that said, Jukebox is a pretty fascinating achievement that pushes the boundaries of what’s possible, even if the musicians OpenAI showed it to thought it needed some work. Go listen for yourself!
Correction: An earlier version of this story’s headline implied that Jukebox generated lyrics alongside its music, which is incorrect. The model only generates music.