Allegro

“It all sounds the same”

AI & the bland prospect of 'vanilla' music

Volume 124, No. 6, June 2024

Dr. Hamid R. Ekbia


Official notice regarding the formation of Local 802’s AI Committee

In recognition of the potential existential threat unregulated self-generative AI poses to the livelihoods of ALL professional musicians, Local 802 has formed an AI Committee. The committee’s goals involve (but are not limited to) the creation and accumulation of informative and educational data for the enlightenment of the membership. We will also be developing and advising 802 leadership on strategies for protecting our rights and careers from this rapidly developing threat. At this early stage of our efforts to create an AI committee, Local 802 has chosen William (Bill) Meade as the interim chairperson of this committee. Membership in the AI committee is open to ALL Local 802 members in good standing. We especially encourage those members with expertise in AI to participate. Once a stable participating membership emerges, we will implement standard democratic procedures for nominating and electing the leadership and structure of the Local 802 AI committee. To join, learn about, or contribute to the committee, we ask that you contact Local 802 Chief of Staff Dan Point. You are also encouraged to check out Local 802’s AI Resource Page.


This month’s column in our AI series is by Dr. Hamid Ekbia, a professor at Syracuse University’s Maxwell School of Citizenship and Public Affairs, where he also serves as director of the university’s Autonomous Systems Policy Institute. Dr. Ekbia is the founding director of the Academic Alliance for AI Policy.


By Dr. Hamid Ekbia

Current concerns among artists, musicians, and other creative professionals about the threats of generative AI are valid, but somewhat misplaced. They are valid because there is a real risk that the products of human creativity will be misused by those who do not seem to have much respect for human labor of any kind — whether it is the creative labor of artists and musicians, the productive labor of farm and factory workers, or the affective labor of housewives and other caregivers. The concerns are misplaced, however, because they are often framed as AI taking the fruits of our labor away from us. This type of thinking gives too much agency to machines, as if they have their own minds and agendas. As a matter of fact, it is not AI systems that are doing this; rather, it is those who currently have monopolistic control over data and computing resources.

Recently, in these very pages, William Meade quoted Scott Galloway, who said, “the real threat is not studios using AI, but studios being replaced by AI.” The threat, in other words, comes from Big Tech platforms inserting themselves between producers and consumers of music. I agree with the overall thrust of this argument, although I would phrase it somewhat differently, using a term that I coined a decade ago: heteromation. The idea behind heteromation, in contrast to automation, is that computing technologies such as AI do not work on their own; rather they largely work by benefiting from human labor and creativity, not because they are smarter than us but because their proprietors get a free ride from all of us in the form of heteromated labor — that is, a kind of labor that doesn’t get recognized or rewarded because it is not considered a job, work, or employment.

As anthropologist Bonnie Nardi and I showed in our book, heteromated labor takes different forms — all the way from the work that you do when you check yourself in or out at the airport, hotel, or supermarket to the content that you post on social media, the messages that you exchange with your family and friends, or even the data you generate when walking with your cellphone in your pocket or listening to the radio on a smart speaker such as Alexa. In all these cases, you are unknowingly providing training data for visible and invisible computing systems. Generative AI is the most recent incarnation of this, benefiting from thousands of years of human thought, language, and creativity, as well as any content that we currently produce in the form of music, art, writing, and day-to-day communication. This much, I hope, is clear and undeniable, despite the rhetoric of Big Tech, which attributes these feats to the marvels of intelligent machines and algorithms.

When it comes to music, however, there is an even bigger threat — a double whammy, if you will, in the form of “heteromated homogeny.” We often see a derivative of the word homogeny — “homogenized” — on the labels of modern dairy products such as milk and yogurt, where diverse elements are blended into a mixture that is the same throughout. I want to show that the emergence of generative AI creates a real possibility for our musical creations to also become homogenized in this sense, giving rise in the long run to what we might call “vanilla music.” Our modern palates have developed a liking for homogenized dairy products. The question is, are we going to develop a taste for homogenized musical products too?

To approach this question, we need to step back and take a quick look at the history of computer music. An early model of computer composition was David Cope’s Experiments in Musical Intelligence (EMI), nicknamed Emmy. Imagine feeding Beethoven’s nine symphonies into a machine and having it come out with Beethoven’s Tenth. That was almost what Emmy was all about. Initially conceived in 1981, Emmy went on to produce music in the style of such revered composers as Bach, Bartok, Brahms, Chopin, Gershwin, Joplin, Mozart, Prokofiev, Rachmaninov, Stravinsky, and David Cope himself — thousands of pieces, and with a quality that the most sophisticated musicians found impressive and “deceitful.” In a set of auditions in the 1990s, for instance, students and faculty at reputable music schools such as Eastman, Indiana, and Juilliard listened to music composed either by these great composers or by Emmy in their style, and they were not able to guess its provenance with better than chance probability.

Baffled by these observations, music aficionados tried hard to make sense of what was happening. “I was truly shaken” is how my mentor and good friend Douglas Hofstadter described his first encounter with an Emmy mazurka in the style of Chopin:

“I was impressed, for the piece seemed to express something. . . . It was nostalgic, had a bit of Polish feeling to it, and it did not seem in any way plagiarized. It was new, it was unmistakably Chopin-like in spirit, and it was not emotionally empty… How could emotional music be coming out of a program that had never heard a note, never lived a moment of life, never had any emotions whatsoever?”

One doesn’t need to be a musician to recognize the formidable nature of these questions. Cope, an accomplished composer and musician himself, provides the beginnings of an answer. Very briefly, the basic idea behind Emmy is what Cope calls “recombinant music” — finding recurrent structures of various kinds in a composer’s music, and reusing those structures in new arrangements so as to construct a new piece in the same style. Given a set of input pieces (usually by a single composer and belonging to the same general form, such as the mazurka), Emmy chops them up and reassembles them in a principled and coherent way. The guiding principles in this process are very similar to those followed by someone solving a jigsaw puzzle — namely, to simultaneously observe the local fit of each piece with its neighbors and the global pattern of the whole picture. Hofstadter calls these “syntactic and semantic meshing,” as they deal, respectively, with form and content. In addition, in each composition Emmy incorporates signatures — a characteristic intervallic pattern that recurs throughout a composer’s oeuvre — as well as another sophisticated mechanism that captures and manipulates repeated motifs at different levels of the input pieces.
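Cope’s actual system is far more elaborate than any short description can convey, but a toy sketch of my own (not Emmy’s code) may help make the jigsaw analogy concrete. Here, invented phrase fragments are labeled with the harmonic context they begin and end on, and a new “piece” is assembled by chaining fragments whose local contexts fit — the “syntactic” half of the meshing, with the “semantic” half only gestured at in a comment:

```python
import random

# Toy stand-ins for chopped-up phrases: (label, starts_on, ends_on).
# The labels and harmonic contexts are hypothetical, for illustration only.
corpus = [
    ("phrase A", "C", "G"),
    ("phrase B", "G", "Am"),
    ("phrase C", "G", "C"),
    ("phrase D", "Am", "C"),
    ("phrase E", "C", "C"),
]

def recombine(corpus, start="C", length=4, seed=None):
    """Chain fragments so each one begins where the previous one ended
    (local, 'syntactic' fit). A real system would also enforce a global
    plan ('semantic' meshing) and a composer's recurring signatures."""
    rng = random.Random(seed)
    piece, context = [], start
    for _ in range(length):
        options = [f for f in corpus if f[1] == context]
        if not options:
            break
        fragment = rng.choice(options)
        piece.append(fragment[0])
        context = fragment[2]
    return piece

print(recombine(corpus, seed=1))  # e.g., ['phrase A', 'phrase B', 'phrase D', 'phrase E']
```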

We don’t exactly know how generative AI systems produce music, or how similar they are to Emmy, but we do know the general algorithms and techniques used by the deep neural networks behind these systems. Such techniques are generic and universal, regardless of the type of content generated — prose, poetry, music, video, or any domain that lends itself to syntactic and semantic meshing. The fundamental principle is for the algorithm to produce a “reasonable extension” of whatever output it has generated so far. The keyword here is “reasonable,” which in our case means what we should expect a specific composer to do, given what we know about all the music that they have produced in the past. Rather than doing this literally, however, the algorithm generates a list of “tokens” ranked according to the probability of their occurrence. A token can be as short as a single note or as long as a whole line of music. There is an extra twist, however, that partly explains the elusive “magic” of these systems: rather than selecting the token with the highest probability every time and risking the production of a “mechanical” replica of earlier pieces, the algorithm throws in an element of randomness, choosing lower-ranked tokens some of the time. How random, and how often, this happens can be tweaked by a parameter called “temperature,” which determines how “wildly” the algorithm will deviate from the original input.
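To make the role of temperature concrete, here is a minimal sketch of temperature-based sampling — a generic illustration of my own, not the code of any particular music system. The candidate tokens, their probabilities, and the note names are all hypothetical:

```python
import math
import random

def sample_next_token(candidates, temperature=1.0):
    """Pick the next musical 'token' (a note, here) from a ranked list of
    (token, probability) pairs.

    temperature < 1.0 -> sharper distribution, safer, more 'mechanical' choices
    temperature > 1.0 -> flatter distribution, 'wilder' deviations
    """
    tokens, probs = zip(*candidates)
    # Rescale log-probabilities by the temperature, then re-normalize (softmax).
    logits = [math.log(p) / temperature for p in probs]
    max_logit = max(logits)
    weights = [math.exp(l - max_logit) for l in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    return random.choices(tokens, weights=weights, k=1)[0]

# Hypothetical ranked candidates for the next note after a phrase.
candidates = [("G4", 0.55), ("E4", 0.25), ("C5", 0.15), ("F#4", 0.05)]
print(sample_next_token(candidates, temperature=0.5))  # usually picks "G4"
print(sample_next_token(candidates, temperature=1.5))  # more frequent surprises
```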

This is how generative AI systems get around the issue of “style,” capturing it in statistical patterns hidden within the structure of music (or language, for that matter). The situation, one might think, is not dissimilar to the NPR musical show “Piano Puzzler,” where a popular piece is played in the “style” of a classical composer, challenging the contestant to identify both the piece and the composer. There is a key difference, however: in recognizing the style of the piece, the contestants draw not only on the structure (syntax) of the music, but also on a vast repertoire of knowledge and understanding of the historical period, the music of the earlier eras and regions that inspired the composer in question, the trajectory of the development of their oeuvre, the particular moments and experiences in the composer’s life (a war, a romance, the loss of a child or a beloved, etc.), and so on and so forth. These are the kinds of “information” that are lost, by and large, in AI systems. How much of this is, or can potentially be, captured in databases is an open empirical question that we cannot answer at the moment, because we still don’t have a widely agreed-upon theory of style. To the best of our understanding, there is a qualitative and irreducible difference between these kinds of knowledge and the types of data that current computing systems can capture and encode.

This brings us back to the receiving end of music — namely, the audiences, whose taste and response to the changing musical scene are largely a product of habits, life experiences, enculturation, commercialization, and other parameters beyond pure style. There is, therefore, a threat that these tastes and habits will be shaped by a kind of algorithmic music that continues to repeat, albeit with some randomness, what human composers have produced in the past. Human composers, it might be contended, also repeat their own work or that of others. “Good composers borrow,” Stravinsky famously said, “great composers steal.” The question is, what exactly are AI systems doing: borrowing, stealing, or perhaps something totally different? This is where we get very different answers from those who think about such questions. David Cope is said to believe strongly that computers can be creative, while Douglas Hofstadter takes comfort in realizing that computers do not generate styles of their own; rather, they depend on mimicking prior composers. But, as he hastens to add, “that is still not much comfort, [because] to what extent is music composed of ‘riffs,’ as jazz people say?”

Again, we are left with a formidable question. Unfortunately, we don’t have a good theory of taste either. In the absence of one, the safest approach would be to minimize the risks by creating a social environment that cultivates human creativity while benefiting from advances in technology. Such an environment would recognize the cultural and economic value of the creative products of human beings, provide legal protection for them, and offer the resources to experiment with the capabilities of modern technology for the benefit of creators and the consuming public, rather than big corporations. To go back to the dairy analogy, recent trends demonstrate a growing demand for non-homogenized products — for instance, yogurt with the cream at the top. Perhaps there is a lesson here for the music industry, where the cream of the crop should be allowed to rise to the surface in order to counter the push by Big Tech toward heteromated homogeny.

E-mail any feedback about Local 802’s AI series to allegro@local802afm.org.


MORE ARTICLES IN THIS SERIES:

“Artful” Intelligence

“How are you going to stop AI from stealing our jobs?”

Unleashing Creativity