What was supposed to be an “Easy Saturday” of modifying a script—from “News to Image” to “News to Music”—quickly turned into a challenge. The plan was simple: use Riffusion for music generation and build a smooth, automated workflow, much like the news-image pipeline I covered in a previous post. However, things didn’t go as smoothly as expected.

The Riffusion Roadblock

At first, Riffusion seemed like the perfect fit for generating music from text prompts. I dove into the existing GitHub package, only to find it hadn’t been updated since 2022 and wasn’t actively maintained. After spending hours updating, downgrading, and trying different versions of NVIDIA drivers and dependencies, it became clear that Riffusion just wasn’t going to work as hoped. I ultimately set the tool—and the whole “News to Music” idea—aside.

Testing MusicGen

With Riffusion off the table, I turned to MusicGen, part of Meta’s AudioCraft library. It worked, but not quite as expected. MusicGen generated some interesting snippets from my prompts, but they were limited to 10-second clips. While these clips were creative, they didn’t meet the goal of producing full, cohesive tracks.

What I Gained

Despite not achieving the original goal, today was a productive learning experience. I spent time refining my understanding of:

  • Python and pip environment management.
  • Installing and configuring NVIDIA drivers and CUDA.
  • Navigating dependency conflicts.
  • Digging deeper into the world of AI-based music generation.
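On the dependency-wrangling front, one habit that saved me repeated guesswork was printing the installed versions of every package in play before debugging anything else. This is just a stdlib sketch; the package names are illustrative examples, not a statement of what Riffusion or MusicGen actually require.

```python
from importlib.metadata import version, PackageNotFoundError

# Example package list -- substitute whatever your project depends on.
packages = ["pip", "torch", "numpy", "transformers"]

for name in packages:
    try:
        print(f"{name}: {version(name)}")
    except PackageNotFoundError:
        print(f"{name}: not installed")
```

Comparing this output against a package’s pinned requirements usually reveals the mismatch faster than rerunning the failing install.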

Though things didn’t go according to plan, each challenge pushed me to refine my skills and broaden my knowledge.

What’s Next?

Even though I had to put the “News to Music” script aside for now, the quest for a reliable tool that turns text into full-length music continues. I’ll keep experimenting with different AI music generation options and work towards finding a solution that integrates seamlessly with CUDA. But for now, I think I’ll enjoy the rest of my Saturday off.