Machine Learning

• Get raw audio data as a NumPy array.

import numpy as np
samples = np.array(audio.get_array_of_samples())
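For stereo or multi-channel audio, `get_array_of_samples()` returns the channels interleaved (L, R, L, R, ...). The sketch below assumes 16-bit samples. It splits them into one row per channel and scales them to floats in [-1, 1). `to_channels` is a hypothetical helper, and the `array.array` stands in for real pydub output.

```python
from array import array

import numpy as np


def to_channels(samples, channels, sample_width=2):
    """Split interleaved PCM samples into shape (channels, n_frames) floats in [-1, 1)."""
    x = np.asarray(samples)
    full_scale = float(2 ** (8 * sample_width - 1))  # 32768 for 16-bit audio
    # reshape groups each frame's samples; .T puts each channel in its own row
    return x.reshape(-1, channels).T / full_scale


# Stand-in for audio.get_array_of_samples() on a stereo 16-bit segment
interleaved = array("h", [1000, -1000, 2000, -2000, 16384, -16384])
stereo = to_channels(interleaved, channels=2)
print(stereo.shape)  # (2, 3)
```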

• Create a Pydub segment from a NumPy array.

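from pydub import AudioSegment
# samples must keep the integer dtype that matches sample_width (int16 for 16-bit audio)
new_audio = AudioSegment(
    samples.tobytes(),
    frame_rate=audio.frame_rate,
    sample_width=audio.sample_width,
    channels=audio.channels
)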
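`tobytes()` writes out whatever dtype the array currently holds. An array that went through float processing therefore has to be scaled and cast back to the integer width given as `sample_width`. Below is a minimal sketch for 16-bit audio. `float_to_pcm16_bytes` is a hypothetical helper, and clipping is just one way to handle values that overshoot.

```python
import numpy as np


def float_to_pcm16_bytes(x):
    """Convert float samples in [-1, 1] to little-endian 16-bit PCM bytes."""
    clipped = np.clip(x, -1.0, 1.0)  # avoid integer wrap-around on overshoot
    pcm = np.round(clipped * 32767).astype("<i2")  # little-endian int16, as in WAV
    return pcm.tobytes()


processed = np.array([0.0, 0.5, -1.0, 1.7])  # 1.7 is out of range and gets clipped
raw = float_to_pcm16_bytes(processed)
print(len(raw))  # 8: two bytes per sample
```

The resulting bytes can then go to `AudioSegment(...)` with `sample_width=2`.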

• Read a WAV file directly into a NumPy array.

from scipy.io.wavfile import read
rate, data = read("sound.wav")

• Write a NumPy array to a WAV file.

from scipy.io.wavfile import write
write("new_sound.wav", rate, data)  # the array's dtype sets the sample format (int16 -> 16-bit PCM)
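`write` infers the WAV sample format from the array's dtype, so a float64 array becomes a 64-bit float WAV, which some players do not support. A round trip through a temporary file is a quick way to confirm that nothing was converted. In this small sketch the file name and the 440 Hz test tone are arbitrary choices.

```python
import os
import tempfile

import numpy as np
from scipy.io.wavfile import read, write

rate = 16000
tone = (0.3 * 32767 * np.sin(2 * np.pi * 440 * np.arange(rate) / rate)).astype(np.int16)

path = os.path.join(tempfile.mkdtemp(), "roundtrip.wav")
write(path, rate, tone)  # int16 array -> 16-bit PCM WAV
rate_back, data_back = read(path)

print(rate_back, data_back.dtype)  # 16000 int16
```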

• Generate a sine wave.

import numpy as np
sample_rate = 44100
frequency = 440  # A4 note
duration = 5  # seconds
t = np.linspace(0., duration, int(sample_rate * duration), endpoint=False)
amplitude = np.iinfo(np.int16).max * 0.5
data = (amplitude * np.sin(2. * np.pi * frequency * t)).astype(np.int16)
# data is now int16 and can be written as 16-bit PCM, e.g. write("tone.wav", sample_rate, data)
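The tone is the discrete sinusoid x[n] = A·sin(2π·f·n / fs). One way to confirm that a generated array has the intended pitch is to locate its FFT peak. This sketch uses exactly one second of audio, so each FFT bin is fs / N = 1 Hz wide and 440 Hz falls exactly on a bin.

```python
import numpy as np

sample_rate = 44100
frequency = 440
t = np.arange(sample_rate) / sample_rate  # one second, endpoint excluded
tone = np.sin(2 * np.pi * frequency * t)

spectrum = np.abs(np.fft.rfft(tone))
freqs = np.fft.rfftfreq(len(tone), d=1 / sample_rate)
peak_hz = freqs[np.argmax(spectrum)]
print(round(peak_hz, 3))  # 440.0
```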

VIII. Audio Analysis with Librosa

• Load audio with Librosa.

import librosa
y, sr = librosa.load("sound.mp3")  # resampled to 22050 Hz mono by default; sr=None keeps the native rate

• Estimate tempo (Beats Per Minute).

tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)

• Get beat event times in seconds.

beat_times = librosa.frames_to_time(beat_frames, sr=sr)
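Beat and onset positions come back as STFT frame indices, and converting them to seconds only needs the hop length. The sketch below assumes librosa's defaults: `hop_length=512`, and no centering offset because `n_fft` is not passed. The frame indices are made up. The BPM line is a rough cross-check, not the method `beat_track` itself uses to estimate tempo.

```python
import numpy as np


def frames_to_seconds(frames, sr, hop_length=512):
    """Hand-rolled equivalent of librosa.frames_to_time with default arguments."""
    return np.asarray(frames) * hop_length / sr


beat_frames = np.array([0, 43, 86, 129])  # illustrative frame indices
beat_times = frames_to_seconds(beat_frames, sr=22050)
# Tempo from the median spacing between beats (seconds per beat -> beats per minute)
bpm = 60.0 / np.median(np.diff(beat_times))
print(np.round(beat_times, 3))
print(round(bpm, 1))
```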

• Decompose into harmonic and percussive components.

y_harmonic, y_percussive = librosa.effects.hpss(y)
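`hpss` separates the two parts by median-filtering the magnitude spectrogram. Steady tones form horizontal ridges that are smooth along time, while drum hits form vertical ridges that are smooth along frequency. The toy sketch below applies that idea with hard masks to a synthetic 64×64 spectrogram, using `scipy.ndimage.median_filter`. librosa's own implementation uses soft masks and a different kernel size by default.

```python
import numpy as np
from scipy.ndimage import median_filter

S = np.zeros((64, 64))  # rows = frequency bins, columns = time frames
S[20, :] = 1.0  # sustained tone: horizontal line
S[:, 40] = 1.0  # click: vertical line

kernel = 17
H = median_filter(S, size=(1, kernel))  # smooth along time -> keeps horizontal lines
P = median_filter(S, size=(kernel, 1))  # smooth along frequency -> keeps vertical lines

harmonic_mask = H > P
percussive_mask = P > H
S_harmonic = S * harmonic_mask
S_percussive = S * percussive_mask
```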

• Compute a spectrogram.

import numpy as np
D = librosa.stft(y)
S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)
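With `ref=np.max`, the loudest bin maps to 0 dB and every other bin is negative. librosa also floors the result 80 dB below the peak by default (`top_db=80`). The NumPy sketch below mirrors that formula rather than calling librosa, and it assumes the defaults `amin=1e-5` and `top_db=80.0`.

```python
import numpy as np


def amplitude_to_db_sketch(S, amin=1e-5, top_db=80.0):
    """Approximates librosa.amplitude_to_db(S, ref=np.max) for a magnitude array."""
    ref = np.max(S)
    db = 20.0 * np.log10(np.maximum(amin, S)) - 20.0 * np.log10(max(amin, ref))
    return np.maximum(db, db.max() - top_db)  # floor everything top_db below the peak


S = np.array([[1.0, 0.1, 0.0]])
print(amplitude_to_db_sketch(S))
```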

• Compute Mel-Frequency Cepstral Coefficients (MFCCs).

mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
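MFCCs are the leading coefficients of a type-II DCT taken along the frequency axis of a log-power mel spectrogram. Keeping only `n_mfcc=13` of them summarizes the spectral envelope. The NumPy sketch below shows that final step with an orthonormal DCT matrix and a made-up 40-band log-mel input. librosa builds the mel spectrogram itself and, by default, applies `scipy.fft.dct` with `norm='ortho'`.

```python
import numpy as np


def dct2_matrix(n):
    """Orthonormal DCT-II basis: row k holds cos(pi * (i + 0.5) * k / n) over bins i."""
    i = np.arange(n)
    k = i[:, None]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    C[0] /= np.sqrt(2.0)  # scale the DC row so the matrix is orthonormal
    return C


def mfcc_from_log_mel(log_mel, n_mfcc=13):
    """log_mel has shape (n_mels, n_frames); returns (n_mfcc, n_frames)."""
    return dct2_matrix(log_mel.shape[0])[:n_mfcc] @ log_mel


log_mel = np.full((40, 5), -30.0)  # flat spectral envelope in every frame
coeffs = mfcc_from_log_mel(log_mel)
print(coeffs.shape)  # (13, 5)
```

A flat envelope puts all of its energy into coefficient 0, which is why the higher MFCCs are read as describing the envelope's shape.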

• Compute Chroma features (related to musical pitch).

chroma = librosa.feature.chroma_stft(y=y, sr=sr)
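Chroma folds every frequency onto one of 12 pitch classes, so A3, A4 and A5 all land in the same bin. The sketch below folds a handful of spectral peaks. It assumes equal temperament with A4 = 440 Hz and C as pitch class 0; librosa's tuning estimate can shift the reference. `fold_to_chroma` is a hypothetical helper, not librosa's filter-bank implementation.

```python
import numpy as np


def fold_to_chroma(freqs_hz, magnitudes):
    """Sum magnitudes into 12 pitch-class bins (0 = C, 9 = A)."""
    midi = 69 + 12 * np.log2(np.asarray(freqs_hz) / 440.0)  # 69 is MIDI A4
    pitch_class = np.round(midi).astype(int) % 12
    chroma = np.zeros(12)
    np.add.at(chroma, pitch_class, magnitudes)  # accumulates repeated bins correctly
    return chroma


chroma = fold_to_chroma([220.0, 440.0, 880.0, 261.63], [1.0, 1.0, 1.0, 0.5])
print(chroma.argmax())  # 9: the A bin collects all three A's
```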

• Detect onset events (the start of notes).

onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
onset_times = librosa.frames_to_time(onset_frames, sr=sr)
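Onset detectors generally look for sudden increases in spectral energy. The simplified spectral-flux sketch below sums the positive frame-to-frame magnitude increases and takes the largest peak. The synthetic spectrogram and the single-peak pick are assumptions for illustration. `librosa.onset.onset_strength` works on a log-power mel spectrogram, and `onset_detect` uses a more careful peak picker.

```python
import numpy as np

S = np.zeros((32, 10))  # magnitude spectrogram: 32 bins x 10 frames
S[:, 5:] = 1.0          # a note starts at frame 5 and sustains

rise = np.maximum(0.0, np.diff(S, axis=1))  # keep only energy increases
flux = np.concatenate([[0.0], rise.sum(axis=0)])  # pad so index = frame number
onset_frame = int(np.argmax(flux))
print(onset_frame)  # 5
```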

• Pitch shifting.

y_pitched = librosa.effects.pitch_shift(y, sr=sr, n_steps=4) # Shift up 4 semitones
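In equal temperament each semitone multiplies frequency by 2^(1/12), so `n_steps=4` scales every frequency by about 1.26. Applied to an A4 tone, the arithmetic works out as follows:

```python
n_steps = 4
ratio = 2 ** (n_steps / 12)  # equal-tempered frequency multiplier
shifted_hz = 440.0 * ratio   # A4 shifted up four semitones lands on C#5
print(round(ratio, 4), round(shifted_hz, 2))  # 1.2599 554.37
```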

• Time stretching (change speed without changing pitch).

y_fast = librosa.effects.time_stretch(y, rate=2.0) # Double speed
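Simply dropping samples would also raise the pitch, which is why `time_stretch` is needed. The NumPy sketch below keeps every other sample of a 440 Hz tone and plays it back at the same rate, and the tone becomes 880 Hz. librosa's `time_stretch` instead uses a phase vocoder, which shortens the audio while keeping its frequency content.

```python
import numpy as np

sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)  # one second at 440 Hz

naive_fast = tone[::2]  # half as many samples, played back at the same rate


def peak_frequency(x, sr):
    """Frequency of the strongest FFT bin."""
    spectrum = np.abs(np.fft.rfft(x))
    return np.fft.rfftfreq(len(x), d=1 / sr)[np.argmax(spectrum)]


print(peak_frequency(tone, sr), peak_frequency(naive_fast, sr))
```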

IX. More Utilities

• Detect leading silence.

from pydub.silence import detect_leading_silence
trim_ms = detect_leading_silence(audio)
trimmed_audio = audio[trim_ms:]
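`detect_leading_silence` steps through the segment in small chunks (10 ms by default) and stops at the first chunk louder than the threshold (-50 dBFS by default). It returns that offset in milliseconds. The NumPy version of the loop below works on 16-bit mono samples and is meant only as an illustration. `leading_silence_ms` and `chunk_dbfs` are hypothetical names.

```python
import numpy as np


def chunk_dbfs(chunk, full_scale=32768.0):
    """RMS level of an int16 chunk in dBFS (-inf for digital silence)."""
    rms = np.sqrt(np.mean(chunk.astype(np.float64) ** 2))
    return 20 * np.log10(rms / full_scale) if rms > 0 else float("-inf")


def leading_silence_ms(samples, sr, silence_threshold=-50.0, chunk_ms=10):
    """Milliseconds of leading audio quieter than silence_threshold."""
    samples_per_ms = sr / 1000.0
    total_ms = int(len(samples) / samples_per_ms)
    trim_ms = 0
    while trim_ms < total_ms:
        start = int(trim_ms * samples_per_ms)
        end = int((trim_ms + chunk_ms) * samples_per_ms)
        if chunk_dbfs(samples[start:end]) >= silence_threshold:
            break
        trim_ms += chunk_ms
    return min(trim_ms, total_ms)


sr = 1000  # one sample per millisecond keeps the numbers easy to read
audio_samples = np.concatenate([np.zeros(120, np.int16), np.full(200, 10000, np.int16)])
print(leading_silence_ms(audio_samples, sr))  # 120
```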

• Get the root mean square (RMS) amplitude.

rms = audio.rms

• Get the maximum possible amplitude for the audio's sample width.

max_possible_amp = audio.max_possible_amplitude
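The two values are usually combined into a level in dBFS, 20·log10(rms / max_possible_amplitude), which is what pydub's `dBFS` property reports. For 16-bit audio `max_possible_amplitude` is 32768. The NumPy sketch below applies the same arithmetic to a synthetic square wave at half of full scale:

```python
import numpy as np

max_possible_amplitude = 2 ** (16 - 1)  # 32768 for 16-bit samples
square = np.tile(np.array([16384, -16384], dtype=np.int16), 500)

rms = np.sqrt(np.mean(square.astype(np.float64) ** 2))  # cast first to avoid int16 overflow
dbfs = 20 * np.log10(rms / max_possible_amplitude)
print(rms, round(dbfs, 2))  # 16384.0 -6.02
```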

• Strip silent stretches and normalize what remains.

from pydub.effects import normalize
# drops stretches quieter than -32 dBFS lasting at least 1000 ms, then normalizes the peak level
non_silent = normalize(audio.strip_silence(silence_len=1000, silence_thresh=-32))
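A direct way to locate the loudest stretch of audio is to slide a fixed-length window over the samples and keep the window with the highest energy. The NumPy sketch below works on mono samples and assumes the signal is at least one window long. `loudest_window` is a hypothetical helper. Its sample offsets can be divided by `sr / 1000` to get milliseconds for slicing an `AudioSegment`.

```python
import numpy as np


def loudest_window(samples, sr, window_s=1.0):
    """Return (start, end) sample indices of the window with the highest RMS."""
    x = samples.astype(np.float64)
    win = int(sr * window_s)
    # Prefix sums of squared samples give every window's energy in O(n)
    csum = np.concatenate([[0.0], np.cumsum(x ** 2)])
    energy = csum[win:] - csum[:-win]
    start = int(np.argmax(energy))
    return start, start + win


sr = 100
quiet = np.full(500, 100, dtype=np.int16)
loud = np.full(100, 20000, dtype=np.int16)
mix = np.concatenate([quiet, loud, quiet])
print(loudest_window(mix, sr))  # (500, 600)
```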

• Change the frame rate (resample).

resampled = audio.set_frame_rate(16000)
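Resampling changes the number of samples by the ratio of the two rates: going from 44.1 kHz to 16 kHz keeps about 36% of them. The NumPy sketch below uses plain linear interpolation, which is a simplification and not what pydub does internally. Because it has no anti-aliasing filter, it is only suitable for illustration.

```python
import numpy as np


def resample_linear(x, orig_sr, target_sr):
    """Naive resampling by linear interpolation (no anti-aliasing filter)."""
    n_out = int(round(len(x) * target_sr / orig_sr))
    t_in = np.arange(len(x)) / orig_sr
    t_out = np.arange(n_out) / target_sr
    return np.interp(t_out, t_in, x)


orig_sr, target_sr = 44100, 16000
x = np.sin(2 * np.pi * 440 * np.arange(orig_sr) / orig_sr)  # one second at 44.1 kHz
resampled_tone = resample_linear(x, orig_sr, target_sr)
print(len(resampled_tone))  # 16000
```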

• Create a simple band-pass filter.

from pydub.scipy_effects import band_pass_filter
filtered = band_pass_filter(audio, 400, 2000) # Pass between 400Hz and 2000Hz
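`band_pass_filter` in `pydub.scipy_effects` builds a Butterworth filter with SciPy. The standalone sketch below applies the same kind of filter to a NumPy array, so its effect can be checked numerically on test tones. The 4th order and `sosfilt` are choices made for this sketch, not pydub's defaults.

```python
import numpy as np
from scipy.signal import butter, sosfilt

sr = 16000
t = np.arange(sr) / sr  # one second


def band_pass(x, low_hz, high_hz, sr, order=4):
    """Butterworth band-pass in second-order sections (numerically stable)."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=sr, output="sos")
    return sosfilt(sos, x)


def rms(x):
    return np.sqrt(np.mean(x ** 2))


gains = {}
for freq in (100, 1000, 6000):
    tone = np.sin(2 * np.pi * freq * t)
    out = band_pass(tone, 400, 2000, sr)
    # Measure the second half so the filter's start-up transient is ignored
    gains[freq] = rms(out[sr // 2:]) / rms(tone[sr // 2:])
print(gains)
```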

• Convert file format in one line.

AudioSegment.from_file("music.ogg").export("music.mp3", format="mp3")

• Get the raw bytes of the audio data.

raw_data = audio.raw_data

• Get the peak amplitude (largest absolute sample value).

max_amp = audio.max

• Match the average (RMS) level of audio2 to audio1.

matched_audio2 = audio2.apply_gain(audio1.dBFS - audio2.dBFS)
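`apply_gain` takes decibels, so the difference between the two dBFS values is exactly the gain that brings audio2's RMS level up or down to audio1's. For example, if audio1 sits at -20 dBFS and audio2 at -26 dBFS, audio2 gets +6 dB, which roughly doubles its amplitude (10^(6/20) ≈ 2.0). The NumPy sketch below does the same arithmetic on float arrays. `match_rms` is a hypothetical helper, and it matches average level, not perceived loudness.

```python
import numpy as np


def db(ratio):
    return 20 * np.log10(ratio)


def match_rms(reference, target):
    """Scale target so its RMS level equals the reference's; also return the gain in dB."""
    rms_ref = np.sqrt(np.mean(reference ** 2))
    rms_tgt = np.sqrt(np.mean(target ** 2))
    gain_db = db(rms_ref) - db(rms_tgt)  # same idea as audio1.dBFS - audio2.dBFS
    return target * 10 ** (gain_db / 20), gain_db


t = np.arange(8000) / 8000
ref_tone = 0.5 * np.sin(2 * np.pi * 220 * t)
quiet_tone = 0.25 * np.sin(2 * np.pi * 330 * t)
matched, gain_db = match_rms(ref_tone, quiet_tone)
print(round(gain_db, 2))  # 6.02
```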

#Python #AudioProcessing #Pydub #Librosa #SignalProcessing

━━━━━━━━━━━━━━━
By: @DataScienceM ✨
