PROMPT SPACE
$10.00 · developer-tools · Universal

describe-rename-sound-files

SoundTag AI: Automatically describe and batch-rename audio files based on their actual sound using local ML or Gemini AI.

skill install https://www.promptspace.in/skills/describe-rename-sound-files

SoundTag AI: Listens & Renames Your Sound Files

What it does

This skill fixes messy, auto-generated audio filenames such as audio_track_v2_final_99.wav. It analyzes the actual content of each sound file and renames it with a human-readable, descriptive title. For example, ElevenLabs_2024-07-21T15_43_56_George_pre_s50_sb75_se0_b_m2.mp3 becomes Elevenlabs_George_Voice_Speech.mp3; other typical results are Bright_Trumpet_Fanfare.wav and Large_Crowd_Cheering.mp3.

Supported tools

  • Local ML (AST): Uses the MIT Audio Spectrogram Transformer to classify sounds into 527 categories (Speech, Music, Explosion, etc.) entirely offline.
  • Google Gemini API: Leverages advanced multimodal AI for nuanced descriptions of cinematic SFX, moods, and complex textures.
  • Batch Processing: Supports .wav, .mp3, .ogg, .flac, .aac, .m4a, and more.
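
A minimal sketch of how the local path could look, assuming the Hugging Face transformers library and the public checkpoint MIT/ast-finetuned-audioset-10-10-0.4593. The function names, the checkpoint choice, and the folder handling are illustrative assumptions, not the skill's actual code. Decoding compressed formats such as .mp3 through the pipeline typically also requires ffmpeg on the system.

```python
from functools import lru_cache
from pathlib import Path
from typing import Dict, List

# Extensions named in the listing; the "and more" part is not specified there.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac", ".aac", ".m4a"}


def find_audio_files(folder: str) -> List[Path]:
    """Return the audio files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


@lru_cache(maxsize=1)
def _load_classifier():
    """Build the classifier once; imported lazily so discovery works without torch."""
    from transformers import pipeline

    return pipeline(
        "audio-classification",
        model="MIT/ast-finetuned-audioset-10-10-0.4593",  # assumed checkpoint
    )


def classify_with_ast(path: Path, top_k: int = 3) -> List[Dict]:
    """Return the top_k predictions as dicts with "label" and "score" keys."""
    return _load_classifier()(str(path), top_k=top_k)
```

Calling find_audio_files("Rename") and passing each result to classify_with_ast would give the raw labels that a later renaming step can work from.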

Why use this skill

Rather than relying on a single prompt, this skill implements a two-step workflow. It first attempts a fast local classification, which avoids API costs and keeps audio on your machine. For ambiguous sounds, it offers a structured "improvement pass" using Gemini. It combines the ML labels with hints from the original filename so that context such as a speaker name is not lost. It also pins specific dependency versions (Torch/Transformers) so the environment fits within sandboxed resource limits.
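
The two-step workflow can be sketched roughly as follows. This is an illustration under stated assumptions: the confidence threshold, the hint-extraction rule, and the callable signatures are hypothetical, since the listing does not document how the skill decides that a sound is ambiguous. The local classifier and the Gemini step are passed in as plain functions so the control flow can be read on its own.

```python
import os
import re
from typing import Callable, List, Optional, Tuple

Label = Tuple[str, float]  # (label, confidence score), best first

# Hypothetical cutoff below which a sound counts as "ambiguous".
CONFIDENCE_THRESHOLD = 0.5


def filename_hints(stem: str) -> List[str]:
    """Keep alphabetic words of 4+ letters; drops timestamps and short codes."""
    tokens = re.split(r"[^A-Za-z]+", stem)
    return [t for t in tokens if len(t) >= 4]


def describe(
    path: str,
    local_classify: Callable[[str], List[Label]],
    improve: Optional[Callable[[str, List[Label], List[str]], str]] = None,
) -> str:
    """Return a description from the local model, or from `improve` if ambiguous."""
    stem = os.path.splitext(os.path.basename(path))[0]
    hints = filename_hints(stem)
    labels = local_classify(path)
    top_label, top_score = labels[0] if labels else ("Unknown", 0.0)
    # Step 2 runs only when the local result is weak and a remote step exists.
    if top_score < CONFIDENCE_THRESHOLD and improve is not None:
        return improve(path, labels, hints)
    # Step 1 result: filename hints first, then the best local label.
    return " ".join(hints + [top_label])
```

With a confident local result, the ElevenLabs filename from the example above keeps its "ElevenLabs" and "George" hints and gains the classifier label, without any API call.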

Output

The result is a clean, organized directory where every sound file follows a consistent Title_Case_With_Underscores naming convention, making your sample libraries and field recordings instantly searchable.
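
As a rough illustration of the Title_Case_With_Underscores convention, the sketch below turns a free-text description into a filename and avoids overwriting an existing file. Both helper names and the numeric-suffix collision rule are assumptions made for this example; the listing does not say how the skill handles duplicate names.

```python
import re
from pathlib import Path


def to_title_case_name(description: str, extension: str) -> str:
    """Build Title_Case_With_Underscores plus a lowercased extension."""
    words = re.findall(r"[A-Za-z0-9]+", description)
    stem = "_".join(w.capitalize() for w in words) or "Unnamed"
    return stem + extension.lower()


def unique_target(folder: Path, name: str) -> Path:
    """Append _2, _3, ... until the target does not exist (assumed rule)."""
    candidate = folder / name
    stem, suffix = candidate.stem, candidate.suffix
    counter = 2
    while candidate.exists():
        candidate = folder / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate
```

Because str.capitalize lowercases everything after the first letter, "ElevenLabs" comes out as "Elevenlabs", which matches the casing in the renamed example shown earlier on this page.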

Use cases

  • Identify specific instruments to rename generic music project tracks
  • Convert cryptic field recording names into descriptive environmental labels
  • Organize voiceover exports by speaker name and performance style
  • Batch-process sound effect libraries using AI-generated content tags

Example

Prompt

Batch rename the audio files in my Rename folder based on their sound using the local AST model.


Known limitations

Requires Python 3.8 or later. The model download is about 350 MB on first run. Works best on clearly identifiable sounds; abstract or cinematic SFX may need the optional Gemini enhancement step. Only the first 10 seconds of each file are processed.
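
To make the 10-second limit concrete, here is a small sketch that reads only the leading portion of an uncompressed WAV file with Python's standard wave module. It is an illustration of the idea, not the skill's loader: the function name is hypothetical, and the real skill presumably uses an audio library that also handles compressed formats.

```python
import wave
from typing import Tuple

MAX_SECONDS = 10  # limit stated in the listing


def read_leading_frames(path: str, max_seconds: float = MAX_SECONDS) -> Tuple[bytes, int]:
    """Return (raw PCM bytes, sample rate) for at most `max_seconds` of audio."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        # Never request more frames than the file holds.
        n_frames = min(wav.getnframes(), int(rate * max_seconds))
        return wav.readframes(n_frames), rate
```

A practical consequence is that a sound which only becomes recognizable after the 10-second mark will be labeled from its opening alone.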

Frequently asked questions

The skill primarily uses a local Audio Spectrogram Transformer (AST) model that runs entirely within your sandbox environment. In this local mode, your audio files never leave your system. An optional Gemini API mode is available for more complex descriptions if you provide a key.