Sunday, 7 December 2025

deep-learning-audio

 A deep learning pipeline for classifying audio samples using TensorFlow and Keras.

Deep Learning Audio Classification

This project is a deep learning pipeline for classifying audio samples (e.g., clap, hat, kick, snare) using TensorFlow and Keras. It includes scripts for data loading, preprocessing, training, evaluation, prediction, and visualization.

[screenshot: prediction output]

Project Structure

  • data_loader.py: Loads and prepares audio datasets for training and validation.
  • preprocessing.py: Preprocesses audio data for model input.
  • model.py: Defines the neural network architecture.
  • train.py: Trains the model on the dataset.
  • evaluate.py: Evaluates model performance.
  • predict.py: Makes predictions on new audio samples.
  • view_spectrograms.py: Visualizes audio spectrograms.
  • requirements.txt: Lists required Python packages.
  • dataset/: Contains labeled audio files for training.
  • new_sounds/: Contains new audio samples for prediction.

Getting Started

Installation

Make sure you have Python 3.12 (e.g., 3.12.8) installed, as TensorFlow requires it for compatibility. You can install it with pyenv (Mac: brew install pyenv, Ubuntu: curl https://pyenv.run | bash):

    # Install pyenv, then:
    pyenv install 3.12.8
    pyenv global 3.12.8

(Recommended) Create and activate a virtual environment:

python3.12 -m venv venv
source venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Dataset Structure

  • Prepare your dataset in the dataset/ folder (organized by class) or use the provided sample dataset in this repository.

  • Tip: Folder names will be used as the classification labels

  • Make sure the files are in .wav format: 16-bit, 44.1 kHz, mono

    Tip: You can ask an AI tool like ChatGPT or Copilot for a script that batch converts your audio files to the required format using ffmpeg or similar tools (that's what I did). A minimal sketch follows the tree below.

  • The dataset should be structured as follows:

    dataset/
        ├── clap/
        │   ├── clap_001.wav
        │   ├── clap_002.wav
        │   └── ...
        ├── hat/
        │   ├── hat_001.wav
        │   └── ...
        ├── kick/
        │   ├── kick_001.wav
        │   └── ...
        └── snare/
            ├── snare_001.wav
            └── ...
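Here is a minimal sketch of such a conversion script. It is illustrative, not part of the repository: the raw_audio/ source folder is hypothetical, and it assumes ffmpeg is installed and on your PATH.

    # batch-convert audio files to 16-bit, 44.1 kHz, mono WAV via ffmpeg
    import subprocess
    from pathlib import Path

    SRC = Path("raw_audio")  # hypothetical folder holding unconverted files
    DST = Path("dataset")    # converted files land here, subfolders preserved

    for src_file in SRC.rglob("*"):
        if src_file.suffix.lower() not in {".wav", ".mp3", ".flac", ".aiff"}:
            continue
        dst_file = (DST / src_file.relative_to(SRC)).with_suffix(".wav")
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(src_file),
             "-ac", "1",              # mono
             "-ar", "44100",          # 44.1 kHz
             "-acodec", "pcm_s16le",  # 16-bit PCM
             str(dst_file)],
            check=True)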

Training the Model

  • Train the model:

    python train.py

    [screenshots: training output]

  • This will create a saved_model.keras file in the project directory.

  • You can specify the number of epochs and batch size by modifying the train.py and data_loader.py files, respectively.

    Tip: Higher batch sizes can speed up training but may require more memory. Adjust based on your hardware capabilities. A rough sketch of a training script follows this list.
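For orientation, here is a rough sketch of what a training script like this can look like. It is not the repository's actual train.py: the epoch and batch-size values and the tiny 1D-CNN on raw waveforms are illustrative assumptions.

    import tensorflow as tf

    BATCH_SIZE = 32  # illustrative; the project sets this in data_loader.py
    EPOCHS = 20      # illustrative; the project sets this in train.py

    # Folder names under dataset/ become the labels; clips are padded or
    # trimmed to one second at 44.1 kHz.
    train_ds = tf.keras.utils.audio_dataset_from_directory(
        "dataset", validation_split=0.2, subset="training", seed=42,
        batch_size=BATCH_SIZE, output_sequence_length=44100)
    val_ds = tf.keras.utils.audio_dataset_from_directory(
        "dataset", validation_split=0.2, subset="validation", seed=42,
        batch_size=BATCH_SIZE, output_sequence_length=44100)
    num_classes = len(train_ds.class_names)

    # Tiny illustrative model: raw waveform -> 1D convolutions -> class logits
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(44100, 1)),
        tf.keras.layers.Conv1D(16, 9, strides=4, activation="relu"),
        tf.keras.layers.MaxPool1D(4),
        tf.keras.layers.Conv1D(32, 9, strides=4, activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(num_classes),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"])
    model.fit(train_ds, validation_data=val_ds, epochs=EPOCHS)
    model.save("saved_model.keras")  # the file the training step produces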

Predictions

  • Make predictions on new audio samples:

    python predict.py /path/to/audio.wav
    # e.g. python predict.py new_sounds/clap.wav

    [screenshot: prediction output]

    Tip: I have provided a new_sounds/ folder with some example audio files for testing. A sketch of what such a script can look like follows this list.
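As a hedged sketch (the project's real logic lives in preprocessing.py and predict.py), prediction boils down to loading saved_model.keras, decoding the WAV, and taking the argmax of the logits. The CLASS_NAMES list and the raw-waveform input format are assumptions carried over from the training sketch above.

    import sys
    import tensorflow as tf

    CLASS_NAMES = ["clap", "hat", "kick", "snare"]  # must match training order

    model = tf.keras.models.load_model("saved_model.keras")

    # Decode a 16-bit mono WAV and pad/trim it to the training length
    audio_bytes = tf.io.read_file(sys.argv[1])
    waveform, _ = tf.audio.decode_wav(
        audio_bytes, desired_channels=1, desired_samples=44100)

    logits = model(tf.expand_dims(waveform, axis=0))  # add a batch dimension
    print("Predicted class:", CLASS_NAMES[int(tf.argmax(logits, axis=-1)[0])])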

Evaluation

  • Evaluate the model's performance on the validation set:

    python evaluate.py

  • This will print the accuracy and loss of the model on the validation set (a sketch follows this list).
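A minimal sketch of such an evaluation step, assuming the same validation split and seed as in the training sketch above so the split is reproducible:

    import tensorflow as tf

    # Rebuild the validation split used during training
    val_ds = tf.keras.utils.audio_dataset_from_directory(
        "dataset", validation_split=0.2, subset="validation", seed=42,
        batch_size=32, output_sequence_length=44100)

    model = tf.keras.models.load_model("saved_model.keras")
    loss, accuracy = model.evaluate(val_ds)
    print(f"Validation loss: {loss:.4f}, accuracy: {accuracy:.4f}")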

Visualizing Spectrograms

  • For fun, I added a script to visualize the spectrograms of the audio samples:

    python view_spectrograms.py path/to/audio.wav
    # e.g. python view_spectrograms.py dataset/clap/clap_001.wav

    [screenshot: spectrogram output]

  • This will display the waveform and spectrogram of the audio sample using matplotlib (a sketch follows this list).
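Here is a sketch of what such a viewer can look like using matplotlib's built-in specgram. The repository's actual implementation is view_spectrograms.py; this version additionally assumes scipy is available for reading the WAV file.

    import sys
    import matplotlib.pyplot as plt
    from scipy.io import wavfile

    sample_rate, samples = wavfile.read(sys.argv[1])  # 16-bit mono WAV

    fig, (ax_wave, ax_spec) = plt.subplots(2, 1, figsize=(8, 6))

    ax_wave.plot(samples)
    ax_wave.set_title("Waveform")
    ax_wave.set_xlabel("Sample")

    ax_spec.specgram(samples, Fs=sample_rate)
    ax_spec.set_title("Spectrogram")
    ax_spec.set_xlabel("Time [s]")
    ax_spec.set_ylabel("Frequency [Hz]")

    plt.tight_layout()
    plt.show()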

    from  https://github.com/emanuelefavero/deep-learning-audio
