# Deep Learning Audio Classification
This project is a deep learning pipeline for classifying audio samples (e.g., clap, hat, kick, snare) using TensorFlow and Keras. It includes scripts for data loading, preprocessing, training, evaluation, prediction, and visualization.
Project files:

- `data_loader.py`: Loads and prepares audio datasets for training and validation.
- `preprocessing.py`: Preprocesses audio data for model input.
- `model.py`: Defines the neural network architecture (a rough sketch follows this list).
- `train.py`: Trains the model on the dataset.
- `evaluate.py`: Evaluates model performance.
- `predict.py`: Makes predictions on new audio samples.
- `view_spectrograms.py`: Visualizes audio spectrograms.
- `requirements.txt`: Lists the required Python packages.
- `dataset/`: Contains labeled audio files for training.
- `new_sounds/`: Contains new audio samples for prediction.
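For a rough sense of what `model.py` might contain, here is a minimal sketch of a small CNN spectrogram classifier. The `build_model` name, input shape, and layer sizes are illustrative assumptions, not the repository's actual code:

```python
import tensorflow as tf

# Minimal sketch (not the repo's actual model.py): a small CNN that maps a
# spectrogram of shape (time, frequency, 1) to one of four class scores.
def build_model(input_shape=(343, 129, 1), num_classes=4) -> tf.keras.Model:
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
```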
(Recommended) Create and activate a virtual environment:

```bash
python3.12 -m venv venv
source venv/bin/activate
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Tip: Folder names will be used as the labels for classification.
Notes on the steps below:

- Make sure the audio files are in `.wav` format, 16-bit, 44.1 kHz, and mono channel. Tip: You can ask an AI tool like ChatGPT or Copilot for a script that batch converts your audio files to the required format using `ffmpeg` or similar tools (that's what I did); a sketch of such a script follows this list.
- Training creates a `saved_model.keras` file in the project directory.
- You can specify the number of epochs and the batch size by modifying the `train.py` and `data_loader.py` files, respectively. Tip: Higher batch sizes can speed up training but may require more memory; adjust based on your hardware capabilities.
- Evaluation prints the accuracy and loss of the model on the validation set.
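For reference, here is a minimal sketch of such a batch-conversion script. The directory names are placeholders, and it assumes `ffmpeg` is installed and on your `PATH`:

```python
import subprocess
from pathlib import Path

AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".aiff", ".ogg"}

def convert_all(src_dir: str, dst_dir: str) -> None:
    """Convert every audio file under src_dir to 16-bit, 44.1 kHz mono .wav."""
    src, dst = Path(src_dir), Path(dst_dir)
    for path in src.rglob("*"):
        if path.suffix.lower() not in AUDIO_EXTENSIONS:
            continue
        out = (dst / path.relative_to(src)).with_suffix(".wav")
        out.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(path),
             "-ac", "1",              # mono
             "-ar", "44100",          # 44.1 kHz sample rate
             "-acodec", "pcm_s16le",  # 16-bit PCM
             str(out)],
            check=True,
        )

if __name__ == "__main__":
    convert_all("raw_sounds", "dataset/clap")  # placeholder paths
```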
Make sure you have Python 3.12 (e.g., 3.12.8) installed, as the TensorFlow version used here requires Python 3.12 for compatibility. You can install it with pyenv (Mac: `brew install pyenv`, Ubuntu: `curl https://pyenv.run | bash`):
```bash
# Install pyenv, then:
pyenv install 3.12.8
pyenv global 3.12.8
```

Prepare your dataset in the `dataset/` folder (organized by class) or use the provided sample dataset in this repository.
The dataset should be structured as follows:
```text
dataset/
├── clap/
│   ├── clap_001.wav
│   ├── clap_002.wav
│   └── ...
├── hat/
│   ├── hat_001.wav
│   └── ...
├── kick/
│   ├── kick_001.wav
│   └── ...
└── snare/
    ├── snare_001.wav
    └── ...
```
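The repository's `data_loader.py` handles loading; as a minimal sketch of how folder names become class labels, here is the stock `tf.keras.utils.audio_dataset_from_directory` helper with assumed parameter values (the actual loader may differ):

```python
import tensorflow as tf

# Minimal sketch (the actual data_loader.py may differ): subfolder names
# under dataset/ are inferred as the class labels.
train_ds = tf.keras.utils.audio_dataset_from_directory(
    "dataset",
    validation_split=0.2,          # assumed: hold out 20% for validation
    subset="training",
    seed=42,
    output_sequence_length=44100,  # pad/trim every clip to 1 s at 44.1 kHz
    batch_size=16,
)
print(train_ds.class_names)        # e.g. ['clap', 'hat', 'kick', 'snare']
```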
Train the model:

```bash
python train.py
```

Make predictions on new audio samples:
```bash
python predict.py /path/to/audio.wav
# e.g. python predict.py new_sounds/clap.wav
```
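Under the hood, `predict.py` presumably loads `saved_model.keras` and preprocesses the input the same way as training. A minimal sketch, assuming a 1-second waveform-to-spectrogram pipeline; the class list, STFT parameters, and preprocessing are illustrative assumptions, not the repository's actual code:

```python
import sys
import tensorflow as tf

CLASS_NAMES = ["clap", "hat", "kick", "snare"]  # assumed label order

def predict(path: str) -> str:
    # Load the model produced by train.py.
    model = tf.keras.models.load_model("saved_model.keras")
    # Decode a 16-bit mono .wav and keep at most 1 s of samples.
    waveform, _ = tf.audio.decode_wav(tf.io.read_file(path), desired_channels=1)
    waveform = tf.squeeze(waveform, axis=-1)[:44100]
    # Assumed preprocessing: magnitude spectrogram plus batch/channel dims.
    spec = tf.abs(tf.signal.stft(waveform, frame_length=255, frame_step=128))
    spec = spec[tf.newaxis, ..., tf.newaxis]
    probs = model(spec)
    return CLASS_NAMES[int(tf.argmax(probs, axis=-1)[0])]

if __name__ == "__main__":
    print(predict(sys.argv[1]))
```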
Evaluate the model's performance on the validation set:

```bash
python evaluate.py
```

For fun, I added a script to visualize the spectrograms of the audio samples:
```bash
python view_spectrograms.py path/to/audio.wav
# e.g. python view_spectrograms.py dataset/clap/clap_001.wav
```
This will display the waveform and spectrogram of the audio sample using matplotlib.
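A minimal sketch of what such a script can look like, assuming SciPy is available for reading the `.wav` file (the repository's `view_spectrograms.py` may be implemented differently):

```python
import sys
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile

def show(path: str) -> None:
    # Minimal sketch: plot the raw waveform and its spectrogram.
    rate, samples = wavfile.read(path)  # expects a 16-bit mono .wav
    t = np.arange(len(samples)) / rate
    fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
    ax1.plot(t, samples)
    ax1.set_ylabel("Amplitude")
    ax2.specgram(samples, Fs=rate)      # STFT magnitude on a dB color scale
    ax2.set_xlabel("Time (s)")
    ax2.set_ylabel("Frequency (Hz)")
    plt.show()

if __name__ == "__main__":
    show(sys.argv[1])
```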
Source: https://github.com/emanuelefavero/deep-learning-audio
