[Memo] Technical English 1: Using AI for text-to-music apps (feat. seq2seq)

August 24, 2023, 20:37

With the rise in popularity of large language models (LLMs) and generative AI tools like ChatGPT, developers have found ways to mold text for use cases ranging from writing emails to summarizing articles. Now, they are looking to help you generate bits of music by just typing some words.

Brett Bauman, the developer of PlayListAI (previously LineupSupply), launched a new app called Songburst on the App Store this week. The app doesn’t have a steep learning curve. You just have to type in a prompt like “Calming piano music to listen to while studying” or “Funky beats for a podcast intro” to let the app generate a music clip.

If you can’t think of a prompt, the app offers sample prompts in different categories, including video, lo-fi, podcast, gaming, meditation and sample.

UI for Songburst
Image Credits: Songburst

Bauman told TechCrunch that he built the back end of the app using Vercel and that the music is generated through Leap. Currently, generated clips are limited to 30 seconds, and some output might not be of great quality. Bauman said that over time he will look to increase the length of the generated clips and improve their quality.

Songburst is free to try but it offers a subscription at $9.99 per month or $79.99 per year. The subscription gives you 20 song credits per month and the ability to download tracks in the mp3 format. Users can also buy additional credits in packs of five ($7.99), 10 ($11.99) or 20 ($15.99).

Bauman said he built the app because there are few simple, mobile-native text-to-music solutions that do not rely on spammy tactics to draw subscription money.

He’s not alone in trying to make a neat text-to-music app, however. Akhil Tolani, who has made apps like the music collaboration app Rapchat, has launched CassetteAI, which is available on both the web and the App Store.

At the input level, CassetteAI works similarly to other apps. You type in a prompt for music and it churns out a track. However, it can generate a sample up to three minutes long. The app maker said this is because the app runs on a custom model based on a hierarchical seq2seq architecture, trained on a specialized dataset to generate copyright-free music.

Music generation interface for Cassette AI
Image Credits: Cassette AI

The tool also provides an interface for users to create different versions of the generated tracks and edit and mix them to make a new track. These tools are pretty basic, so don’t expect to create a multilayered master track out of this just yet.

Cassette AI interface for mixing tracks. Image Credits: Cassette AI

Tolani said that the tool had been operating on a waitlist basis, but he is now opening it up to more people. He told TechCrunch that he also expects to offer a Cassette AI pro subscription priced at $4.99 per month, which will give users unlimited song generation and access to better-quality AI models.

The developer claimed that Cassette AI is better than other music generators such as Mubert and Beatbot because it generates better-quality music with a quicker turnaround time. He added that with Cassette AI, he wants to respect the ethical boundaries of the music industry.

“We want people to see AI as a tool for music creation, not a replacement for creators: Calculators did not replace mathematicians, they just made it easier to calculate things. We want to make music production accessible to everyone for any use case,” he said.

These tools are mainly targeting creators, who can use copyright-free music in their videos or podcasts. The developers are also hoping that musicians notice their tools and blend them into their sample or song-making process.

Apart from indie developers, major tech companies are also taking a crack at the text-to-music generation problem. Google made its MusicLM tool public during the Google I/O developer conference in May. In June, Meta open sourced its own AI-powered music generator called MusicGen.

While models are improving when it comes to the quality of the generated tracks, there are concerns regarding the training data they use to create music. To avoid legal troubles, OpenAI has only partly open sourced its Jukebox model and has banned users from creating music for commercial use cases. Then there are some AI-forward musicians like Grimes, who in April invited fans to make songs with her voice and split royalties with her.

https://techcrunch.com/2023/08/21/developers-are-now-using-ai-for-text-to-music-apps/

seq2seq?

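"Sequence to sequence" (seq2seq) is a type of model architecture used in machine learning and natural language processing. It is mainly used for tasks that transform one sequence (a sentence, a series of words, time-series data, and so on) into another sequence.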

A seq2seq model consists of two main components: an encoder and a decoder.

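Encoder:
The encoder converts the input sequence into a compressed representation. It turns the given input sequence into a fixed-size vector, or hidden state, that summarizes the information in that sequence. For example, it can take a sentence as input and convert it into a vector that captures the sentence's meaning.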
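
As a rough illustration of the encoder idea, here is a minimal sketch of a recurrent encoder in plain Python. The sizes, the random untrained weights, and the names `encode`, `make_matrix`, and `HIDDEN_SIZE` are all assumptions made up for this example; a real model would learn its weights and would normally be built with a deep learning library. The point is only that inputs of different lengths are folded into a vector of the same fixed size.

```python
import math
import random

HIDDEN_SIZE = 4  # illustrative size of the fixed-size summary vector


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random (untrained) weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def encode(token_ids, embeddings, w_in, w_rec):
    """Fold a sequence of token ids into one fixed-size hidden state."""
    hidden = [0.0] * HIDDEN_SIZE
    for token_id in token_ids:
        x = embeddings[token_id]
        # New state mixes the current input with the previous state.
        hidden = [
            math.tanh(
                sum(w_in[i][j] * x[j] for j in range(len(x)))
                + sum(w_rec[i][j] * hidden[j] for j in range(HIDDEN_SIZE))
            )
            for i in range(HIDDEN_SIZE)
        ]
    return hidden


rng = random.Random(0)  # fixed seed so the sketch is repeatable
VOCAB_SIZE, EMBED_SIZE = 10, 3
embeddings = make_matrix(VOCAB_SIZE, EMBED_SIZE, rng)
w_in = make_matrix(HIDDEN_SIZE, EMBED_SIZE, rng)
w_rec = make_matrix(HIDDEN_SIZE, HIDDEN_SIZE, rng)

short_vec = encode([1, 2], embeddings, w_in, w_rec)
long_vec = encode([1, 2, 3, 4, 5, 6], embeddings, w_in, w_rec)
print(len(short_vec), len(long_vec))  # both equal HIDDEN_SIZE
```

The two-token input and the six-token input both come out as four numbers, which is the "compressed representation" handed to the decoder.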

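Decoder:
The decoder uses the compressed information obtained from the encoder to generate the target output sequence. Typically, the decoder takes a start token and the information from the encoder as input, then repeatedly generates the next element of the sequence. Repeating this step produces the output sequence.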
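
The decoding loop described above can be sketched as follows. This is a toy under stated assumptions: the `TOY_SCORES` table is invented and stands in for a trained decoder network, the context is a plain string rather than an encoder vector, and the next token depends only on the context and the previous token. Picking the highest-scoring token at each step is usually called greedy decoding; the note names are purely illustrative.

```python
START, END = "<s>", "</s>"

# Hypothetical next-token scores standing in for a trained decoder network.
# Keys are (context, previous token); values map candidate tokens to scores.
TOY_SCORES = {
    ("calm piano", START): {"C4": 0.7, "G4": 0.2, END: 0.1},
    ("calm piano", "C4"): {"E4": 0.6, "C4": 0.3, END: 0.1},
    ("calm piano", "E4"): {"G4": 0.8, END: 0.2},
    ("calm piano", "G4"): {END: 0.9, "C4": 0.1},
}


def next_token_scores(context, prev_token):
    """Look up scores for the next token; unknown cases just end the sequence."""
    return TOY_SCORES.get((context, prev_token), {END: 1.0})


def greedy_decode(context, max_len=16):
    """Start from START and keep emitting the best-scoring next token."""
    output = []
    prev = START
    for _ in range(max_len):
        scores = next_token_scores(context, prev)
        prev = max(scores, key=scores.get)
        if prev == END:
            break
        output.append(prev)
    return output


print(greedy_decode("calm piano"))  # ['C4', 'E4', 'G4']
```

The `max_len` cap matters in practice: a decoder that never emits the end token would otherwise loop forever.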

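seq2seq models can be used for a wide range of tasks, mostly in natural language processing, such as translation, chatbots, text summarization, and music generation. Variants that extend the model with hierarchical structures or attention mechanisms are also widely used.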
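
To make the attention mechanism mentioned above a little more concrete, here is a minimal sketch of dot-product attention in plain Python. The vectors and the function names `softmax` and `attend` are assumptions chosen for illustration, not taken from any of the apps discussed. Instead of relying on a single fixed-size vector, the decoder scores every encoder state against its current query and takes a weighted average of them.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """Return attention weights and the weighted sum of encoder states."""
    scores = [
        sum(q * h for q, h in zip(query, state)) for state in encoder_states
    ]
    weights = softmax(scores)
    size = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(size)
    ]
    return weights, context


# Three illustrative encoder states and one decoder query.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([2.0, 0.0], encoder_states)
print(weights)
print(context)
```

With this query, the first and third encoder states get equal, larger weights because they match the query's first dimension, while the second state contributes very little.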

Engineer for a living
Son-san
