I'm working on audio AI, both academic research and as founder of Spooky Labs. We don't have a webpage yet, but we do have clients. We are using deep learning to create rich new synthesizers that sound like they were designed by aliens, as well as novel vocal manipulation techniques.
We've spoken to musicians and producers who are excited about new tools, new sounds, and assistants that automate boring parts of the workflow.
But when the problem is framed as "music composition", it just leaves me scratching my head. Like, who's clamoring for that? I'm unaware of any automatically generated music in the history of music that hasn't been seen as an oddity. Even if techniques improve, it's not a very sexy sell. People simply want to listen to music created by people, even if AI music were perfect. Only in commercial applications like stock music or jingles is AI composition in demand.
I understand that you can move the goalposts and say: "This isn't about total AI composition, it's about co-composition!" But honestly, I think it's just framing the problem wrong to talk about composition, and it's led to some really strange solution-in-search-of-a-problem research agendas. People should think about it through the lens of: How do you use AI to create tools that musicians want?
Other than for research purposes (extending the capabilities of AI) — I 100% agree with your sentiment regarding the relative worthlessness of AI composition.
As an aside, it is my understanding that there is no measure of something like consonance/dissonance in complex musical forms. There are models of dissonance in 2- and 3-note chords, but even these have gaps. I'm suggesting that the research on "what sounds good to people" is surprisingly immature in contemporary science. This is surprising because that question played a major role in the history of science. For instance, many of the very first experiments conducted at the Royal Society (c. 1660s) investigated harmony — and arguably the first scientific experiment was designed to evaluate a mathematical model of music (viz., the 5th-century-BC Pythagoreans demonstrating their integer-ratio theory of harmony by casting bronze chimes at those ratios).
> You won't get a grasp of either by throwing a corpus into a bucket and fishing things out of it with statistics.
We can agree to disagree on that. Or, in any case, it's an empirical question. If we had a large corpus of music annotated with "feels", I think we'd learn an immense amount about how music evokes feelings. (I'm not sure I feel comfortable with the term "semantics" applied to music.)
Regarding the measures of dissonance, I'm only familiar with things like measures of roughness and harmonic entropy. If you know of others, I'd appreciate you sharing them.
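For anyone curious what I mean by a roughness measure, here's a minimal sketch of the Plomp-Levelt/Sethares sensory dissonance curve applied pairwise to partials. The constants are the commonly cited ones and the amplitude weighting is just one common choice; treat it as illustrative, not a reference implementation, and note it only scores an isolated sonority, not dissonance in context.

```python
import math

def pair_roughness(f1, f2, a1, a2):
    """Sensory roughness contributed by a pair of sine partials (Sethares-style)."""
    f_lo, f_hi = sorted((f1, f2))
    s = 0.24 / (0.0207 * f_lo + 18.96)        # critical-bandwidth scaling
    x = s * (f_hi - f_lo)
    # amplitude weighting: a1*a2 is one common choice, min(a1, a2) another
    return a1 * a2 * (math.exp(-3.5 * x) - math.exp(-5.75 * x))

def total_roughness(freqs, amps):
    """Sum pairwise roughness over every pair of partials in a sound."""
    return sum(pair_roughness(freqs[i], freqs[j], amps[i], amps[j])
               for i in range(len(freqs)) for j in range(i + 1, len(freqs)))

# Pure-tone comparison: a major third comes out much smoother than a minor second.
print(total_roughness([261.6, 329.6], [1.0, 1.0]))  # C4 + E4
print(total_roughness([261.6, 277.2], [1.0, 1.0]))  # C4 + C#4 (rougher)
```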
We don't have a large corpus of music labelled "feels" because "feels" and "evocations" are not atomic objects.
It's not obvious they're objects at all.
Debussy's La Mer is an excellent evocation of the sea, but you're not going to learn anything useful by throwing it into a bucket with a sea shanty. Or with Britten's Four Sea Interludes.
The absolute best you'll get from this approach is a list of dismally on-the-nose reified cliches - like the music editor who dubs on some accordion music when a thriller has a scene in Paris.
It's also why concepts like harmonic entropy don't really help. You can't parse "dissonance" scientifically in that way, because the measure isn't the amount of dissonance in a chord on some arbitrary scale, even if that measure happens to have multiple dimensions.
It's how the dissonance is used in the context in which it appears. There are certainly loose mappings to information density - too little is bad, too much is also bad - but it's not a very well explored area, and composers work inside it intuitively.
So there is no fitness/winning function you can train your dataset on. Superficial similarity to a corpus is exactly that, and misses the point of what composition is.
Music has invariant temporal forms that reliably communicate feelings, based on the context. A musical cadence versus none, a change in rhythm, lingering on a note... The common nature of these forms lends itself to common feelings about them. When two people are open to music, with similar experience, they roughly feel the same thing. Perhaps not exactly, but music as a technology for exchanging non-verbal experiences, feels, is surprisingly consistent — why else does film music have such a common effect on the emotional vibe of a scene?
In the future, if we could gather and annotate people's feelings in response to musical forms (including but going far beyond consonance and dissonance), I'm sure this would enable an AI-based model of the emotional resonances of various musical elements (and their multi-level representations in the neural network). Then, compositional models could be trained using real-time aesthetic rating devices (e.g., reporting on pleasure/discomfort and interestingness/boringness).
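To make that concrete, here's a hand-wavy sketch of the first stage in PyTorch. Everything in it is a placeholder of my own invention (the two rating axes, the feature dimension, the random stand-in data); the point is just the shape of the idea: fit a "feel" predictor on annotated clips, then reuse it as a reward signal for a generator.

```python
import torch
import torch.nn as nn

class FeelPredictor(nn.Module):
    """Maps a clip embedding to two ratings: (pleasure, interestingness)."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, clip_embedding):
        return self.net(clip_embedding)

# Stage 1: supervised fit on (clip embedding, human rating) pairs.
model = FeelPredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

clip_embeddings = torch.randn(256, 128)   # placeholder for real audio features
human_ratings = torch.rand(256, 2)        # placeholder annotated "feels"

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(clip_embeddings), human_ratings)
    loss.backward()
    opt.step()

# Stage 2 (hypothetical): a compositional model is then trained to maximise
# the predictor's output, i.e. the predictor becomes the reward function.
```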
Now, this system would hypothetically be able to manipulate emotions, at least to the extent that a composer can now.
Is that useful in anything but a creepy way? Well, maybe you could add filters to existing compositions to change their vibe... like, the "humanize" button in Logic Pro gives a looser feel. You might be able to apply filters that could make a song feel more longing or hopeful.
I think apart from the layman's resistance to the idea of nonhumans being creative, the other problem is that the basic theories of composition musicians use are pretty well understood and based on applying some surprisingly simple maths to instruments whose timbres are very complex (and "what works" is a moving target bound to cultural familiarity with instruments, intonation, musical phrasing and comfort with dissonance; not an AlphaGo Zero situation where the ML process can expose that the wealth of existing human theory is objectively inferior at achieving a win condition).
Sure, a Deep Learning process with a suitably curated dataset can rediscover the basic principles of Western music theory... but why? Humans understood these things when creating the music that makes up the corpus, and non-AI software can already turn those patterns into musical structures which can be played by accurate synthesis, which only lacks a bit of nuance.
The human musician looking for AI-powered shortcuts doesn't want a lot of computation thrown at learning how to approximate a 12-bar blues or arpeggiate chord tones; they want it thrown at a post-production process (or better still, a pedal) that makes their mechanical playing of simple licks approximate B.B. King, or an autotune that makes their average, lower-pitched voice sound like Whitney Houston.
>I think apart from the layman's resistance to the idea of nonhumans being creative,
What I think is funny is the resistance to the idea that a lot of what people do isn't all that special.
It all makes me think of the evolution of supermarket cashiers. It used to be a fairly skilled profession, now people are just used because of their abilities in object manipulation.
I disagree that there isn't already generative music that is enjoyed just for its aesthetic qualities. I'm a huge fan of Brian Eno's work, which is highly generative (and he's even published apps that will simply generate ambient music for you 24/7).
"Since I have always preferred making plans to executing them, I have gravitated towards situations and systems that, once set into operation, could create music with little or no intervention on my part. That is to say, I tend towards the roles of planner and programmer, and then become an audience to the results" - Brian Eno
I think generative music is particularly effective as ambient background or as part of a larger art installation. Also, if you're working on developing new synthesizers I'd highly recommend you get to know Eno's work.
As a musician, I'd love automated composition of human level music. Especially if you could navigate and blend and transform the space of styles, artists, and instruments. Co-composition has a place.
The reason such tools have historically been viewed as oddities is because they were and are objectively bad. 3 year old prodigies can exceed even the best output of software. In recent years, the models have improved, but the problem space is huge and the algorithms aren't human level yet.
When the tools reach human performance levels, people won't care if it's human or software. They'll only care if it's enjoyable.
All creative endeavors are at risk of being automated by ai in the next few decades. Software will exceed the best of human creativity, and that's a good thing.
No, it isn't. It'll make people stop making music. Even if all you care about is one particular way to evaluate the end product, it means there will be no new inputs, resulting in the death of creativity. And once that's been reached, it's almost impossible to revive.
Agreed. Moreover, if a genuinely sentient AI (or MI, as Minsky called it) comes to pass, then it would be fascinating to listen to it if it cared to compose music; in its absence, musical composition is always the result of someone's human experience of being in this universe - necessarily as a human organism. And this is true even if the music is a research exploration of evocative (to a human, of course) sound patterns.
> if only because the physical act of playing an instrument can never be replaced by software.
As an instrumentalist, I would love to believe this but more and more this does not appear to be the case.
I have talked to many people of different ages who really don't see any difference between someone actually playing an instrument in front of you and listening to pre-recorded music.
It's funny because it's in the context of me liking to play live music, and they never seem to realize they're saying, "What you do is meaningless."
I think it's that they have no personal experience with playing an instrument, combined with the fact that the dominant forms of pop music now don't have much if any place for instrumentalists.
Nobody is talking about playing music. And physical playing was replaced by software quite a while ago. Not if you play for your own fun, but almost nobody can distinguish a computer rendering from the real thing on a recording.
Computers can already generate convincing telephone calls and news articles. Yet people haven't stopped answering the phone or believing the news. /s
Actually, I have stopped answering the phone.
In my view there's already enough perfect classical music that most people could listen endlessly and it would always seem new to them. Especially if hearing some familiar material once in a while is part of the enjoyment.
Yet musicians and audiences got violently sick of perfect classical music, and started creating "bad" music, just for the experience of trying something new. And this has happened over and over in music and other art forms.
I won't say computers won't ever be able to do that, but it suggests a level of AI beyond simply curve fitting the existing sequences of notes in recorded or written tunes. At the same time, what music evolves into with the use of AI may open up new areas for exploration of new human music.
As a kid, I played music that my mom didn't recognize as music. Eventually the AI will be motivated to create music that humans don't like, but that the computers prefer. That will be the singularity. ;-)
Chess is a great analogy. Now, because of great computers, folks can choose to play against a “master” chess player whenever they like, whether they have an internet connection and a proper rank or not.
Being able to jam with an AI group of your favorite jazz bests seems like a great stand-in for when you can’t get a real trio or quartet together, and if sufficiently good at attending to the ideas of the “live player”, would probably “raise all boats,” making better composers overall.
I agree that what you propose would be better than nothing, but it will still be lacking.
Assuming the software can get there, we also need to add more sensors. The head nods, eye contact, subtle facial gestures, and other body language that are such an important part of a collaborative jazz ensemble will have to be sensed. However, even if the computer can be enhanced with sensors, now you've got to communicate from the computer(s) back to the other players. So, either advanced robotics, or ensembles adopt other signaling schemes that the computer player(s) can engage with. It is not a simple problem; even if AI can be made to be "creative" and "musical," that is not the whole story.
I don’t think “equal or better than human” is the bar we’re trying to beat.
I was more responding to both the claim that “AI will kill human composition” (it won’t) and “talking about computer-based music composition is framing the question wrong, since it doesn’t provide value” (it does)
Just because it doesn't subscribe to the utopic "computers will create the best music ever" fantasy?
> computers hasn't killed chess
Because there's very little monetary value in seeing computers play chess. That's not the case in music: the entire infrastructure, from tools to conservatoires, comes from selling music. If that chain crumbles, music will be dead.
The infrastructure isn't about the music, though. It's about people, whether they are pop stars, ferociously good classical virtuosi, temperamental maestros, introverted but sympathetic innovators, or even just amateur bands/ensembles that enjoy the social aspects of making music.
That's where the big money is, not elevator music generated by computers.
The big money is where mass audiences are. Music's infrastructure is for a large part dependent on it. It encompasses everything from instrument manufacturers to piano teachers to concert organizers to masterclasses. If --as assumed in the post I responded to-- software starts producing better music than people, or does so cheaper, or gets more clicks, etc., the infrastructure will disappear. It'll take a few decades, but that can kill off teaching and affordable instruments, taking everything except amateur music for the wealthy with it.
>> People simply want to listen to music created by people, even if AI music were perfect. Only in commercial applications like stock music or jingles is AI composition in demand. I understand that you can move the goalposts
With new tech, especially automation, moving goalposts isn't cheating. If successful, it almost always gets used in totally different ways than the human equivalent.
A common pattern is imagining automation through "robots," like in The Jetsons. An automated hotel has a robot desk clerk, robot maid, etc. IRL, an automated hotel may have no check-in, rooms designed to be self-cleaning, and guests must make their own beds. Maybe it's not even a hotel anymore, but a room or pod here and there. Robotic maids act as a stand-in for things that are hard to imagine.
>> AI to create tools that musicians want?
So... this is a good compass for advanced stages, "productizing." Describe your goals in terms of what it can do for users. For earlier, research oriented stages, the more abstract definition is good. Composition works for that:
Music composition (or generation) is the process of creating or writing a new piece of music, defined as a succession of pitches or rhythms, or both, in some definite patterns.
How that is ultimately used/sold/productized is certainly up for debate, but that definition sends you on a path that's different from "tools for musicians."
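Just to pin that definition down in the most literal way possible, here's a toy encoding (the Note class and the example pattern are obviously my own placeholders): a composition as a succession of pitch/duration events arranged in definite patterns.

```python
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int       # MIDI note number, e.g. 60 = middle C
    duration: float  # length in beats

# A one-bar pattern in C major, repeated to form a (trivial) composition.
pattern = [Note(60, 1.0), Note(64, 1.0), Note(67, 1.0), Note(72, 1.0)]
composition = pattern * 4  # "definite patterns": literal repetition here
```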
I made https://stevenwaterman.uk/musetree which is a custom frontend for OpenAI's MuseNet. I've used that to generate music that I use on my Twitch streams, games, and other things. I'm not kidding myself that it's as good as human-written music, nor is it truly ai-written music (as there's a lot of human selection in the process), but people frequently comment on the music and enjoy it, without realising it's AI-written
No one is clamoring for it because it isn't good enough yet, same reason no one is asking for AI designed houses, or business logos, or food. But that doesn't mean it can't get there eventually. It's also a very interesting scientific question, wrapped up with big questions about the nature of creativity and the capabilities of machines.
I remember Milli Vanilli (who I quite liked, to be honest), and a thousand manufactured bands since who at least make enough of an effort not to get caught out like that again. If that part of the industry (which seems like a large chunk) could get rid of the inconvenience of having to write/find good songs, I expect they would, and get back to exploiting pretty young things with not enough talent.
In short, those who value money ahead of music. On the other hand, if it improved the rubbish in the pop chart then I might embrace it!
As a music-theory-unsophisticate, I foresee a lot of potential for deep learning in creating musical arrangements for existing compositions using all kinds of available instruments. How might a Beethoven symphony be arranged for a typical 4 part rock band? What if you added a piano? What other instruments might add further value, esp. toward certain ends, like when learning to play an instrument? Could a deep net learn to accompany adaptively, as only a capable human can today? Could it offer instructional suggestions for improvement, especially subtleties in timing or dynamics?
I also see value in transferring musical style for a piece from one genre to another (let's say, from jazz to rock & roll). Or from one set of instruments (or voices) to others. If done well, that could provide a lot of potentially entertaining interpretations or mashups, or at worst some novel medleys. Sinatra raps. Nat King Cole does bluegrass.
Because examples of uses like these aren't likely to be plentiful, I suspect these won't be so much supervised learning tasks as reinforcement learning (RL) tasks. The trick then will be to somehow construct useful (and insightful) reward functions for each new use -- basically aesthetic meta-learning for RL.
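For the genre-transfer case, one of those reward functions might look something like the sketch below. `style_classifier` and `content_similarity` are hypothetical learned components, not any real library; the only point is that the reward has to trade off "sounds like the target genre" against "is still recognisably the original piece".

```python
def style_transfer_reward(candidate, original, target_genre,
                          style_classifier, content_similarity,
                          style_weight=0.6):
    # Probability the candidate passes as the target genre (e.g. rock & roll).
    style_score = style_classifier(candidate)[target_genre]
    # How much of the original melody/harmony survived the transfer.
    content_score = content_similarity(candidate, original)
    # Weighted blend: all style and no content is a different piece entirely.
    return style_weight * style_score + (1 - style_weight) * content_score
```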
If RL reward functions can learn to write winning advertising jingles, it's hard to imagine what creative doors they won't open, eventually.
> People simply want to listen to music created by people
I don't think people care who composed the music; I believe people want to see music played and sung by people. Going to a concert where you're just looking at a computer generating sound won't inspire the crowd, but if the band plays something composed by an AI, who cares? Even nowadays music has multiple collaborators on a composition, and I almost never pay attention to who they all are.
While I agree the term "music composition" is not marketable, I disagree on all other points. Actually, I think the only thing pulling for AI is the idea that people would prefer to listen to something generated by a computer. People no longer view human sources as trustworthy; they no longer experience natural things as authentic. They want computer-certified facts, machine-performed actions.
You don't even need an AI to make the kind of music people like. You just need to dumb down the music--which is what has happened gradually since the gilded age. Now the charts are topped with monotone "robot" melodies and little to no chord structure, synths using standard basic wave forms, all rhythms quantized to a grid and all voices auto-tuned. You don't need an AI to generate this type of music by machine; ping pong balls in a tumbler would suffice.
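Almost literally: here's a sketch of the tumbler (the scale and grid size are arbitrary choices of mine). Pick notes uniformly at random from a pentatonic scale and snap every event to an eighth-note grid. No AI involved.

```python
import random

A_MINOR_PENTATONIC = [57, 60, 62, 64, 67]   # MIDI pitches: A3 C4 D4 E4 G4
GRID = 0.5                                  # eighth-note grid, in beats

def tumbler_melody(bars=4, beats_per_bar=4):
    """Random 'robot' melody: uniform pitch choice, fully quantized rhythm."""
    melody = []
    t = 0.0
    while t < bars * beats_per_bar:
        melody.append((t, random.choice(A_MINOR_PENTATONIC), GRID))
        t += GRID
    return melody  # list of (onset_beat, midi_pitch, duration_beats)

print(tumbler_melody(bars=1))
```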
The role of AI, then, is not that you need it to make the music, but you need an AI to claim authenticity. In the minds of today's people, the computer-made music must be better, must be smarter, than human-made music.
The band Yacht used AI to write an album called Chain Tripping that's really good IMO. They transcribed their old music to MIDI, fed it to an AI, had it generate short snippets of new MIDI, and then cut and paste those bits to get a score, which they performed.
I suspect the drums and bass are human-written, as they seem more coherent and "whole song-aware" than the other parts, but I have no way to confirm that.
I believe one of the problems is that we don't know what the goalposts are. People experience music differently, ranging from barely tolerating it, to pursuing it as an academic study, to making massive sacrifices in order to be immersed in it. The conscious lifestyle choice of many full time musicians is bewildering to many techies. This suggests we don't know the "customer" or the "market" at all, or at the very least, that they are fragmented.
I can imagine a future where AI can come up with new stylistic concepts, maybe even "predicting" how some music genre will evolve, which is certainly something I'd be interested in seeing explored. Imagine an AI that was able to come up with something remotely close to rock music after being fed 1930s-1940s blues music. It'd be pretty cool to see what it came up with after being fed music from the 2000s-2010s.
It might be centuries before AI is writing good novels, but music seems a much easier nut to crack. Hell, I've seen people sit down at a drum machine and, in literally seconds, make something enjoyable for minutes. Music doesn't have to be complicated, let alone deeply meaningful, to sound good.
AI already makes cool art -- e.g. the Google Deep Dream videos.
One use I can see is assist software for someone churning out commercial music. Maybe a standalone music generator (composition plus synth) would be useful for content generators avoiding copyright.
As both copyright violation policing gets easier and heavier and programatically generated music becomes easier I have to wonder how those two things will collide.
Current techniques are pretty successful at improvisation or 'noodling', not quite what most folks would call "proper" composition but a good inspiration for it. The paper is not very comprehensive, there's plenty of interesting stuff that it doesn't mention.
I don't know, I'd pay for something that gave me music I loved, 100% of the time (or, hell, even 2% of the time). Beats sifting through the infinite trove of music just so I can stumble upon something I like by chance. My success rate there is more like 0.1%.