I came across a jaw-dropping project today: Riffusion. It is currently trending at #1 on Hacker News.
It uses AI image generation to create a spectrogram from a text description of music, then feeds that image into a tool that reads the spectrogram and plays it back as audio.
The idea is simple:
Think of music you want to hear and describe it in words.
Use one of the recent AI image generation tools to generate a spectrogram based on that description.
Feed the image into a tool that reads a spectrogram from an image and plays the music.
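To make that last step a little less magical, here is a minimal sketch of how a grayscale spectrogram image could be turned back into sound. It assumes the image is a 2-D array of pixel values scaled to [0, 1], with rows as frequency bands (high frequencies at the top), columns as time frames, and linearly spaced frequencies. The function name `image_to_audio` and all parameter values are hypothetical choices for illustration. It uses plain additive synthesis (one sine wave per row, its loudness following that row's brightness), which is not how Riffusion itself does it: a real converter also has to handle frequency scaling and the phase information that an image does not store.

```python
import numpy as np


def image_to_audio(image, sample_rate=22050, frame_duration=0.05,
                   f_min=50.0, f_max=8000.0):
    """Turn a spectrogram-like image into audio with additive synthesis.

    Hypothetical assumptions, for illustration only:
      * image is a 2-D array with values in [0, 1]
      * rows are frequency bands, row 0 = top of image = highest frequency
      * columns are consecutive time frames of `frame_duration` seconds
      * frequencies are spaced linearly between f_min and f_max
    """
    image = np.asarray(image, dtype=float)
    n_rows, n_cols = image.shape
    samples_per_frame = int(sample_rate * frame_duration)
    n_samples = n_cols * samples_per_frame
    sample_idx = np.arange(n_samples)
    t = sample_idx / sample_rate

    # Flip vertically so that index 0 corresponds to the lowest frequency.
    amplitudes = image[::-1, :]
    freqs = np.linspace(f_min, f_max, n_rows)

    # Centre of each time frame, in samples; used to interpolate a smooth
    # loudness envelope between columns (avoids clicks at frame edges).
    frame_centers = (np.arange(n_cols) + 0.5) * samples_per_frame

    audio = np.zeros(n_samples)
    for amp_row, freq in zip(amplitudes, freqs):
        envelope = np.interp(sample_idx, frame_centers, amp_row)
        audio += envelope * np.sin(2 * np.pi * freq * t)

    # Normalise to the usual [-1, 1] range for floating-point audio.
    peak = np.max(np.abs(audio))
    if peak > 0:
        audio /= peak
    return audio


# Usage: a 64-band image where only the top row is lit should give a
# steady tone at f_max (8000 Hz with the defaults).
image = np.zeros((64, 20))
image[0, :] = 1.0
audio = image_to_audio(image)
```

Each bright pixel simply switches on a sine wave at its row's frequency, so a picture of a rising diagonal line would play as a rising sweep.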
The idea sounds almost too simple, so the natural first reaction is:
There’s no way it will work!
But let the naysayers go to their graves. It works!
What is a spectrogram of an audio file?
A spectrogram is a visual representation of the spectrum of frequencies in an audio signal as it varies over time. Time runs along the horizontal axis, frequency along the vertical axis, and the brightness or color of each point shows how strong that frequency is at that moment. Spectrograms can be used to identify different sounds and to analyze the characteristics of a sound.
[Figure: an example spectrogram]
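For the curious, here is a minimal NumPy sketch of how a spectrogram is computed from audio: slice the signal into short overlapping frames, apply a window, take the Fourier transform of each frame, and keep the magnitudes. The frame size, hop length, and the decibel-to-grayscale mapping in the hypothetical `to_image` helper are illustrative assumptions, not the settings Riffusion uses; in practice you would usually reach for a library function such as `scipy.signal.spectrogram` or `librosa.stft`.

```python
import numpy as np


def spectrogram(signal, n_fft=1024, hop=256):
    """Magnitude spectrogram via a short-time Fourier transform (STFT).

    Returns an array of shape (n_fft // 2 + 1, n_frames): each row is a
    frequency bin and each column is one time frame.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Each column holds one windowed, overlapping slice of the signal.
    frames = np.stack(
        [signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, axis=0))


def to_image(mag, floor_db=-80.0):
    """Map magnitudes to an 8-bit grayscale image (louder = brighter)."""
    db = 20 * np.log10(mag / (mag.max() + 1e-12) + 1e-12)
    db = np.clip(db, floor_db, 0.0)
    pixels = (db - floor_db) / -floor_db * 255
    # Flip so that low frequencies end up at the bottom of the picture.
    return np.flipud(pixels).astype(np.uint8)


# Usage: half a second of A4 (440 Hz) followed by half a second of A5 (880 Hz).
sr = 8000
t = np.arange(sr // 2) / sr
audio = np.concatenate([np.sin(2 * np.pi * 440 * t),
                        np.sin(2 * np.pi * 880 * t)])
mag = spectrogram(audio)
freqs = np.fft.rfftfreq(1024, d=1 / sr)
# Loudest frequency in the first and last frames: close to 440 Hz and
# 880 Hz respectively (within one frequency bin).
print(freqs[mag[:, 0].argmax()], freqs[mag[:, -1].argmax()])
img = to_image(mag)
```

In the resulting image the bright horizontal line jumps up halfway across, which is exactly the "frequency over time" picture described above.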
AI-based image generation tools have improved dramatically over the last year, with big jumps in the quality of the images they produce. This is due to greater available computing power as well as new techniques, such as diffusion models, that make better use of their training data. They are now good enough to create realistic images that would have been difficult or impossible to make with traditional software tools. This has opened up a whole new range of possibilities (and problems?) for content creators and businesses, who can now produce high-quality images at a fraction of the cost and time previously required.
The ripple effects of technology
Riffusion is a prime example of how a rapid advance in one technology can have far-reaching implications beyond what we initially thought possible. Automated image generation, for instance, has surged in quality and ease of use within a matter of months (literally!), enabling anyone to create stunning visuals with minimal effort. Now the same technology is being used to generate music, turning a short text description into an audio clip with much the same ease.
What domains will be disrupted next?