Sunday, 7 December 2025

deep-learning-audio

 A deep learning pipeline for classifying audio samples using TensorFlow and Keras.

Deep Learning Audio Classification

This project is a deep learning pipeline for classifying audio samples (e.g., clap, hat, kick, snare) using TensorFlow and Keras. It includes scripts for data loading, preprocessing, training, evaluation, prediction, and visualization.

[screenshot: prediction output]

Project Structure

  • data_loader.py: Loads and prepares audio datasets for training and validation.
  • preprocessing.py: Preprocesses audio data for model input.
  • model.py: Defines the neural network architecture.
  • train.py: Trains the model on the dataset.
  • evaluate.py: Evaluates model performance.
  • predict.py: Makes predictions on new audio samples.
  • view_spectrograms.py: Visualizes audio spectrograms.
  • requirements.txt: Lists required Python packages.
  • dataset/: Contains labeled audio files for training.
  • new_sounds/: Contains new audio samples for prediction.

Getting Started

Installation

Make sure you have Python 3.12 (e.g., 3.12.8) installed, as TensorFlow requires it for compatibility. You can install it with pyenv (Mac: brew install pyenv, Ubuntu: curl https://pyenv.run | bash):

    # Install pyenv, then:
    pyenv install 3.12.8
    pyenv global 3.12.8

(Recommended) Create and activate a virtual environment:

python3.12 -m venv venv
source venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Dataset Structure

  • Prepare your dataset in the dataset/ folder (organized by class) or use the provided sample dataset in this repository.

  • Tip: Folder names will be used as the classification labels

  • Make sure the files are in .wav format: 16-bit, 44.1 kHz, mono

    Tip: You can ask an AI tool like ChatGPT or Copilot for a script that batch converts your audio files to the required format using ffmpeg or similar tools (that's what I did). A minimal sketch follows the tree below.

  • The dataset should be structured as follows:

    dataset/
        ├── clap/
        │   ├── clap_001.wav
        │   ├── clap_002.wav
        │   └── ...
        ├── hat/
        │   ├── hat_001.wav
        │   └── ...
        ├── kick/
        │   ├── kick_001.wav
        │   └── ...
        └── snare/
            ├── snare_001.wav
            └── ...
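Here is a minimal sketch of such a conversion script. It is illustrative, not part of the repository: the raw_audio/ source folder is hypothetical, and it assumes ffmpeg is installed and on your PATH.

    # batch-convert audio files to 16-bit, 44.1 kHz, mono WAV via ffmpeg
    import subprocess
    from pathlib import Path

    SRC = Path("raw_audio")  # hypothetical folder holding unconverted files
    DST = Path("dataset")    # converted files land here, subfolders preserved

    for src_file in SRC.rglob("*"):
        if src_file.suffix.lower() not in {".wav", ".mp3", ".flac", ".aiff"}:
            continue
        dst_file = (DST / src_file.relative_to(SRC)).with_suffix(".wav")
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(src_file),
             "-ac", "1",              # mono
             "-ar", "44100",          # 44.1 kHz
             "-acodec", "pcm_s16le",  # 16-bit PCM
             str(dst_file)],
            check=True)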

Training the Model

  • Train the model:

    python train.py

    [screenshots: training output]

  • This will create a saved_model.keras file in the project directory.

  • You can specify the number of epochs and batch size by modifying the train.py and data_loader.py files, respectively.

    Tip: Higher batch sizes can speed up training but may require more memory. Adjust based on your hardware capabilities. A rough sketch of a training script follows this list.
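For orientation, here is a rough sketch of what a training script like this can look like. It is not the repository's actual train.py: the epoch and batch-size values and the tiny 1D-CNN on raw waveforms are illustrative assumptions.

    import tensorflow as tf

    BATCH_SIZE = 32  # illustrative; the project sets this in data_loader.py
    EPOCHS = 20      # illustrative; the project sets this in train.py

    # Folder names under dataset/ become the labels; clips are padded or
    # trimmed to one second at 44.1 kHz.
    train_ds = tf.keras.utils.audio_dataset_from_directory(
        "dataset", validation_split=0.2, subset="training", seed=42,
        batch_size=BATCH_SIZE, output_sequence_length=44100)
    val_ds = tf.keras.utils.audio_dataset_from_directory(
        "dataset", validation_split=0.2, subset="validation", seed=42,
        batch_size=BATCH_SIZE, output_sequence_length=44100)
    num_classes = len(train_ds.class_names)

    # Tiny illustrative model: raw waveform -> 1D convolutions -> class logits
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(44100, 1)),
        tf.keras.layers.Conv1D(16, 9, strides=4, activation="relu"),
        tf.keras.layers.MaxPool1D(4),
        tf.keras.layers.Conv1D(32, 9, strides=4, activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(num_classes),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"])
    model.fit(train_ds, validation_data=val_ds, epochs=EPOCHS)
    model.save("saved_model.keras")  # the file the training step produces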

Predictions

  • Make predictions on new audio samples:

    python predict.py /path/to/audio.wav
    # e.g. python predict.py new_sounds/clap.wav

    [screenshot: prediction output]

    Tip: I have provided a new_sounds/ folder with some example audio files for testing. A sketch of what such a script can look like follows this list.
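As a hedged sketch (the project's real logic lives in preprocessing.py and predict.py), prediction boils down to loading saved_model.keras, decoding the WAV, and taking the argmax of the logits. The CLASS_NAMES list and the raw-waveform input format are assumptions carried over from the training sketch above.

    import sys
    import tensorflow as tf

    CLASS_NAMES = ["clap", "hat", "kick", "snare"]  # must match training order

    model = tf.keras.models.load_model("saved_model.keras")

    # Decode a 16-bit mono WAV and pad/trim it to the training length
    audio_bytes = tf.io.read_file(sys.argv[1])
    waveform, _ = tf.audio.decode_wav(
        audio_bytes, desired_channels=1, desired_samples=44100)

    logits = model(tf.expand_dims(waveform, axis=0))  # add a batch dimension
    print("Predicted class:", CLASS_NAMES[int(tf.argmax(logits, axis=-1)[0])])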

Evaluation

  • Evaluate the model's performance on the validation set:

    python evaluate.py

  • This will print the accuracy and loss of the model on the validation set (a sketch follows this list).
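A minimal sketch of such an evaluation step, assuming the same validation split and seed as in the training sketch above so the split is reproducible:

    import tensorflow as tf

    # Rebuild the validation split used during training
    val_ds = tf.keras.utils.audio_dataset_from_directory(
        "dataset", validation_split=0.2, subset="validation", seed=42,
        batch_size=32, output_sequence_length=44100)

    model = tf.keras.models.load_model("saved_model.keras")
    loss, accuracy = model.evaluate(val_ds)
    print(f"Validation loss: {loss:.4f}, accuracy: {accuracy:.4f}")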

Visualizing Spectrograms

  • For fun, I added a script to visualize the spectrograms of the audio samples:

    python view_spectrograms.py path/to/audio.wav
    # e.g. python view_spectrograms.py dataset/clap/clap_001.wav

    [screenshot: spectrogram output]

  • This will display the waveform and spectrogram of the audio sample using matplotlib (a sketch follows this list).
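Here is a sketch of what such a viewer can look like using matplotlib's built-in specgram. The repository's actual implementation is view_spectrograms.py; this version additionally assumes scipy is available for reading the WAV file.

    import sys
    import matplotlib.pyplot as plt
    from scipy.io import wavfile

    sample_rate, samples = wavfile.read(sys.argv[1])  # 16-bit mono WAV

    fig, (ax_wave, ax_spec) = plt.subplots(2, 1, figsize=(8, 6))

    ax_wave.plot(samples)
    ax_wave.set_title("Waveform")
    ax_wave.set_xlabel("Sample")

    ax_spec.specgram(samples, Fs=sample_rate)
    ax_spec.set_title("Spectrogram")
    ax_spec.set_xlabel("Time [s]")
    ax_spec.set_ylabel("Frequency [Hz]")

    plt.tight_layout()
    plt.show()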

    from  https://github.com/emanuelefavero/deep-learning-audio
