Normalizing audio signal
I want to reliably convert both recorded audio (captured through a microphone) and
processed audio (read from a WAV file) to the same discretized representation in
Python using specgram.
My process is as follows:
1. Get raw samples (read from a file or streamed from the microphone).
2. Perform some normalization (???).
3. Perform a windowed FFT to generate a spectrogram (frequency vs. time, with
amplitude peaks).
4. Discretize the peaks in the audio, then store them.
Basically, by the time I get to that last discretization step, I want to arrive
as reliably as possible at the same values in frequency/time/amplitude space for
the same song.
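As a rough sketch of the pipeline I have in mind (assuming scipy.io.wavfile for
reading, matplotlib.mlab.specgram for the windowed FFT, a placeholder file name,
and an untuned peak threshold):

import numpy as np
from scipy.io import wavfile
from matplotlib import mlab

# Step 1: get raw samples (a microphone capture would arrive the same way,
# as an array of samples plus a sample rate).
rate, samples = wavfile.read("song.wav")    # hypothetical file name
if samples.ndim > 1:
    samples = samples.mean(axis=1)          # mix down to mono
samples = samples.astype(np.float64)

# Step 2: normalization would go here (this is the open question).

# Step 3: windowed FFT -> magnitude spectrogram (frequency x time bins).
spec, freqs, times = mlab.specgram(samples, NFFT=4096, Fs=rate,
                                   window=mlab.window_hanning,
                                   noverlap=2048, mode='magnitude')

# Step 4: discretize the peaks, e.g. keep bins well above the average level.
# The factor of 10 is just a placeholder, not a tuned value.
peaks = spec > (spec.mean() * 10)
peak_freq_idx, peak_time_idx = np.nonzero(peaks)
print(len(peak_freq_idx), "candidate peaks")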
My problem is: how do I account for the volume (i.e., the amplitudes of the
samples) differing between the recorded audio and the WAV-read audio?
My options for normalization (maybe?), sketched in code after this list:
1. Divide all samples in the window by the mean before the FFT.
2. Detrend all samples in the window before the FFT.
3. Divide all samples in the window by the maximum-amplitude sample value
(sensitive to noise and outliers) before the FFT.
4. Divide all amplitudes in the spectrogram by the mean.
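To make the options concrete, here is a rough sketch of each candidate as a
standalone helper (the function names are mine, and none of these has been
tested against real recordings):

import numpy as np
from scipy.signal import detrend

def normalize_by_mean(window):
    # Option 1: divide samples in the window by their mean
    # (likely unstable, since audio samples tend to be near zero-mean).
    return window / np.mean(window)

def detrend_window(window):
    # Option 2: remove a linear trend from the window before the FFT.
    return detrend(window)

def normalize_by_max(window):
    # Option 3: divide by the peak absolute sample value
    # (sensitive to noise and outliers, as noted above).
    return window / np.max(np.abs(window))

def normalize_spectrogram(spec):
    # Option 4: divide all spectrogram amplitudes by their mean,
    # i.e. normalize after the FFT rather than before.
    return spec / np.mean(spec)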
How should I tackle this problem? I have almost no signal processing
knowledge or experience.