⚙️AlgorithmIntermediate

Short-Time Fourier Transform (STFT)

Key Points

•
The Short-Time Fourier Transform (STFT) breaks a signal into small overlapping windows and computes a Fourier transform on each window to reveal how frequencies evolve over time.
•
You control time–frequency trade-offs with the window length: longer windows give better frequency resolution but blur timing, and shorter windows do the opposite.
•
A window function (like the Hann window) reduces spectral leakage by tapering the edges of each frame before the FFT.
•
The hop size determines how much adjacent frames overlap and thus affects redundancy, reconstruction quality, and computational cost.
•
A spectrogram is the magnitude (often log-scaled) of the STFT across time and frequency bins.
•
The inverse STFT uses overlap-add with proper normalization (often with window-squared sums) to reconstruct the original signal.
•
Efficient STFT implementations rely on the FFT, giving roughly O(M K log K) time where M is the number of frames and K is the FFT size.
•
Using consistent windowing, hop size, and normalization avoids common reconstruction errors such as gain loss, gain boost, or audible artifacts.

Prerequisites

→Discrete Fourier Transform (DFT) and FFT — STFT uses the FFT per frame to compute spectra; understanding DFT bins, magnitude, and phase is essential.
→Complex numbers and Euler's formula — STFT coefficients are complex and rely on e^{j\theta} to represent sinusoids.
→Sampling theory — Relates discrete indices to physical frequencies and avoids aliasing by choosing K and Fs appropriately.
→Windowing and spectral leakage — Explains why and how windows reduce leakage and how they affect resolution.
→Convolution and overlap-add — Inverse STFT relies on overlap-add concepts and linearity to reconstruct signals.

Detailed Explanation

Tap terms for definitions

01Overview

The Short-Time Fourier Transform (STFT) is a technique to analyze how the frequency content of a signal changes over time. Instead of applying a single Fourier transform to the entire signal—which would only reveal which frequencies are present overall—the STFT slides a window across the signal, taking short snapshots (frames). For each frame, it multiplies the samples by a window function to reduce edge effects, then computes a Fourier transform. The result is a set of complex spectra indexed by time (frame index) and frequency bin, producing a time–frequency representation known as a spectrogram when magnitudes are visualized. The STFT is fundamental in audio, speech, and non-stationary signal processing because most real-world signals change over time. By tuning parameters such as window length, hop size (step between frames), FFT size (often a power of two), and window type (e.g., Hann, Hamming), one can balance frequency resolution, time resolution, and computational cost. Proper synthesis (inverse STFT) with overlap-add and normalization can reconstruct the original signal from its STFT, enabling transformative applications like noise reduction, pitch shifting, and time–frequency masking.

02Intuition & Analogies

Imagine listening to a song and pausing it every tiny fraction of a second to ask: “What notes are sounding right now?” If you looked at the entire song at once with a regular Fourier transform, you’d only learn which notes appear somewhere in the whole track, not when. The STFT is like using a small, sliding listening window that you place over the track step by step. At each position, you analyze just that slice to find its current mix of notes. Stitching all those analyses together gives a moving picture—like a musical score over time—called a spectrogram. The window length is the size of your listening snapshot: a long snapshot lets you tell apart very close pitches (good frequency resolution) but blurs quick events (poor time resolution). A short snapshot makes fast changes crisp (good time resolution) but lumps together nearby pitches (poor frequency resolution). The window function is like gently fading the volume at the edges of each snapshot; this avoids sharp cuts that would create spurious frequencies (spectral leakage). The hop size is how far you move the window each time—small hops mean more overlap and smoother time tracking but more computation. To get back the original sound, you reverse the process: convert each small spectrum back to time samples, fade them with the same window, and overlap-add them. If your fades and overlaps are balanced, the slices recombine seamlessly, just like layering translucent photos to recreate the full scene.

03Formal Definition

Given a discrete-time signal x[n], a window w[n] of length N, and hop size H, the discrete STFT is defined for frame index m and frequency bin k as: X[m, k] =

\sum_{n = 0}^{N - 1}

x[n + mH] w[n]

e^{- j 2 πkn / K}

, where K is the FFT size (typically K

\geq

N). Here,

k = 0

, 1,

\dots

, K-1, and

m = 0

, 1,

\dots

, M-1 with M chosen such that frames cover the signal (often using zero-padding beyond the signal bounds). The complex value X[m, k] encodes amplitude and phase of the k-th frequency at the m-th frame. The magnitude

∣ X [m, k] ∣

is used for spectrogram visualization; the phase

ar g

(X[m, k]) is crucial for accurate reconstruction and for phase-sensitive processing. A common window is the Hann window w[n] = 0.5 - 0.5

\cos\left

(

\frac{2\pi n}{N-1}\right

). For reconstruction, overlap-add synthesis with the same window and hop H uses the inverse DFT per frame and normalizes by the cumulative window-squared sum:

\overset{x}{^}

[n] =

\frac

{

\sum_{m}

w[n - mH]

\cdot

Re

iDFT

(X[m, :])\}[n - mH]}{

\sum_{m}

w^{2}

[n - mH]} (with appropriate indexing). Under standard conditions (e.g., appropriate window and hop),

\overset{x}{^}

[n] equals x[n] up to numerical error.

04When to Use

Use STFT whenever you need to understand or manipulate how frequency content evolves over time in non-stationary signals. Typical applications include speech analysis (formant tracking, voice activity detection), music information retrieval (onset detection, pitch estimation, beat tracking), audio effects (time–stretching, pitch shifting, spectral morphing), noise reduction and speech enhancement (spectral subtraction, masks), bio-signals (EEG/ECG event localization), and mechanical diagnostics (time–frequency fault signatures). Choose STFT over a single global Fourier transform when stationarity does not hold over the entire signal. Prefer longer windows when discerning closely spaced frequencies matters (e.g., harmonic analysis in steady notes) and shorter windows when timing precision is critical (e.g., transients and percussive onsets). Use smaller hop sizes for smoother spectrograms and better reconstruction at the cost of more computation, and larger hop sizes for efficiency when coarse time resolution suffices. If very different time scales coexist (e.g., chirps and transients), consider multi-resolution methods (wavelets) or multi-window STFTs, but the classic STFT is often the first, most practical tool.

⚠️Common Mistakes

• Using a rectangular window implicitly (i.e., no taper) and then being surprised by spectral leakage—use tapered windows like Hann or Hamming unless you need the narrowest main lobe and accept sidelobes. • Mismatch between FFT size K and window length N without proper zero-padding or indexing, causing frame misalignment or aliasing; ensure K \ge N and pad the windowed frame to length K. • Incorrect overlap-add normalization in inverse STFT: forgetting to divide by the accumulated window-squared sum leads to amplitude gain or loss and audible pulsation; always track a denominator array of \sum w^2. • Choosing hop size H too large for the chosen window, breaking perfect reconstruction (e.g., Hann with H = N/2 is common; H = N can fail). • Ignoring phase: manipulating magnitudes only and then reusing original or zero phases can cause artifacts; when possible, preserve or properly estimate phase (e.g., Griffin–Lim for phase reconstruction from magnitudes). • Not handling boundaries: failing to pad the signal so that frames at the end are truncated can produce clicks or zeroed tails; use zero-padding or reflect-padding as appropriate. • Misinterpreting frequency bins: forgetting that for real-valued signals, bins above K/2 are conjugate-symmetric; often only K/2+1 unique bins are needed. • Using linear magnitude for visualization can hide quiet details; log-power spectrograms (in dB) are more interpretable.

Key Formulas

Discrete STFT

X [m, k] = n = 0 \sum N - 1 x [n + m H] w [n] e^{- j 2 πkn / K}

Explanation: This sums a windowed slice of the signal multiplied by complex sinusoids to get the spectrum at frame m and bin k. K is the FFT size, typically a power of two and at least N.

Inverse STFT (overlap-add)

\overset{x}{^} [n] = \frac{\sum _{m} w [ n - m H ] Re { iDFT ( X [ m , : ])} [ n - m H ]}{\sum _{m} w ^{2} [ n - m H ]}

Explanation: Each frame is inverted back to time domain, re-windowed, and overlap-added. Dividing by the accumulated window-squared sum removes the effect of windowing to recover the original amplitude.

Hann window

w_{Hann} [n] = \frac{1}{2} - \frac{1}{2} cos (\frac{2 πn}{N - 1}), 0 \leq n < N

Explanation: The Hann window smoothly tapers frame edges to reduce leakage. It supports perfect reconstruction with hop sizes like H = N/2 under standard overlap-add normalization.

Frequency bin mapping

f_{k} = \frac{k}{K} F_{s}, 0 \leq k < K

Explanation: This maps discrete bin index k to physical frequency in Hertz given sampling rate $F_{s}$ . Adjacent bins are spaced by $F_{s}$ /K Hz.

Number of frames

M \approx ⌈ \frac{L - N}{H} ⌉ + 1

Explanation: For a signal of length L, window length N, and hop H, this estimates how many frames are computed. Zero-padding can adjust the effective M near boundaries.

Time complexity

T_{STFT} = O (M \cdot K lo g K)

Explanation: Each frame performs an FFT of size K, giving K log K per frame. Multiplying by M frames yields the total time complexity.

Time–frequency uncertainty

σ_{t} σ_{f} \geq \frac{1}{4 π}

Explanation: This inequality states a lower bound on the product of time and frequency spreads. It formalizes the trade-off between time and frequency resolution.

Log-magnitude spectrogram

S [m, k] = 20 lo g_{10} (∣ X [m, k] ∣ + ε)

Explanation: Converts linear magnitudes to decibels for better dynamic range visualization. A small epsilon avoids log of zero.

Complexity Analysis

Let L be the signal length, N the window length, H the hop size, and K the FFT size (

K \geq N

, often a power of two). The number of frames is M ≈ ⌈(L − N)/H⌉ + 1, assuming zero-padding at the tail if needed. Each frame requires O(N) time to window samples and O(K log K) time to compute the FFT (multiplications dominate). Thus, total time is O(M (N + K log K)). When K is on the order of N, the FFT term dominates and we typically write O(M K log K). If you store the whole spectrogram, memory is O(M · (K/2 + 1)) for real signals (only unique bins) plus O(K) temporary buffers for the FFT. In streaming settings, you can keep only the last frame and a small overlap buffer, reducing working memory to O(K). The inverse STFT carries similar costs: per frame, an inverse FFT costs O(K log K) and overlap-add takes O(N), with an additional O(N) to accumulate window-squared weights. Numerical constants matter: smaller hops increase M and therefore runtime; larger K improves frequency resolution but raises cost superlinearly due to the log factor. Cache-friendly FFT implementations (iterative, in-place) and precomputing window values help. If you compute only magnitude or only the first K/2+1 bins for real inputs, you can halve storage and some computation (with real-FFT variants). Parallelizing across frames or frequency bins can further reduce wall-clock time since frames are independent until synthesis.

Code Examples

Compute STFT Magnitude Spectrogram with Hann Window and FFT

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 // Iterative in-place radix-2 Cooley-Tukey FFT
5 // If invert=false: computes FFT; if invert=true: computes inverse FFT (scaled by 1)
6 // After inverse, we divide by n externally to get true inverse DFT.
7 void fft(vector<complex<double>>& a, bool invert) {
8     int n = (int)a.size();
9     static vector<int> rev;
10     static vector<complex<double>> roots{0, 1};
11 
12     if ((int)rev.size() != n) {
13         int k = __builtin_ctz(n);
14         rev.assign(n, 0);
15         for (int i = 0; i < n; i++) {
16             rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (k - 1));
17         }
18     }
19     if ((int)roots.size() < n) {
20         int k = __builtin_ctz(roots.size());
21         roots.resize(n);
22         while ((1 << k) < n) {
23             double ang = 2 * M_PI / (1 << (k + 1));
24             for (int i = 1 << (k - 1); i < (1 << k); i++) {
25                 roots[2 * i] = roots[i];
26                 double angle = ang * (2 * i + 1 - (1 << k));
27                 roots[2 * i + 1] = complex<double>(cos(angle), sin(angle));
28             }
29             k++;
30         }
31     }
32 
33     for (int i = 0; i < n; i++) if (i < rev[i]) swap(a[i], a[rev[i]]);
34 
35     for (int len = 1; len < n; len <<= 1) {
36         for (int i = 0; i < n; i += 2 * len) {
37             for (int j = 0; j < len; j++) {
38                 complex<double> u = a[i + j];
39                 complex<double> v = a[i + j + len] * roots[len + j];
40                 a[i + j] = u + v;
41                 a[i + j + len] = u - v;
42             }
43         }
44     }
45     if (invert) {
46         // Conjugate, run forward FFT, then conjugate and scale outside (common trick)
47         for (auto &x : a) x = conj(x);
48         fft(a, false);
49         for (auto &x : a) x = conj(x) / (double)n;
50     }
51 }
52 
53 vector<double> hann_window(int N) {
54     vector<double> w(N);
55     if (N == 1) { w[0] = 1.0; return w; }
56     for (int n = 0; n < N; ++n) {
57         w[n] = 0.5 - 0.5 * cos(2.0 * M_PI * n / (N - 1));
58     }
59     return w;
60 }
61 
62 // Compute magnitude spectrogram |X[m,k]| for k=0..K/2
63 vector<vector<double>> stft_magnitude(const vector<double>& x, int N, int H, int K) {
64     int L = (int)x.size();
65     if ((K & (K - 1)) != 0) throw runtime_error("K must be power of two");
66     if (N > K) throw runtime_error("Window length N must be <= K");
67 
68     vector<double> w = hann_window(N);
69     int M = (L <= N) ? 1 : (int)ceil((double)(L - N) / H) + 1;
70     vector<vector<double>> S; // M x (K/2+1)
71     S.reserve(M);
72 
73     vector<complex<double>> frame(K);
74     for (int m = 0; m < M; ++m) {
75         int start = m * H;
76         // Build windowed frame with zero-padding to K
77         for (int n = 0; n < K; ++n) {
78             double sample = 0.0;
79             if (n < N) {
80                 int idx = start + n;
81                 if (0 <= idx && idx < L) sample = x[idx];
82                 sample *= w[n];
83             }
84             frame[n] = complex<double>(sample, 0.0);
85         }
86         // Forward FFT
87         fft(frame, false);
88         // Magnitude for unique bins 0..K/2
89         vector<double> mag(K / 2 + 1);
90         for (int k = 0; k <= K / 2; ++k) mag[k] = abs(frame[k]);
91         S.push_back(move(mag));
92     }
93     return S;
94 }
95 
96 int main() {
97     // Example: analyze a 1-second 440 Hz + 880 Hz tone at Fs=8000 Hz
98     int Fs = 8000;
99     int L = Fs * 1; // 1 second
100     vector<double> x(L);
101     for (int n = 0; n < L; ++n) {
102         x[n] = 0.7 * sin(2 * M_PI * 440 * n / Fs) + 0.3 * sin(2 * M_PI * 880 * n / Fs);
103     }
104 
105     int N = 1024;   // window length
106     int H = 512;    // hop size (50% overlap)
107     int K = 1024;   // FFT size
108 
109     auto S = stft_magnitude(x, N, H, K);
110 
111     // Print a small portion of the spectrogram: first 5 frames, first 10 bins
112     cout << fixed << setprecision(3);
113     for (int m = 0; m < min((int)S.size(), 5); ++m) {
114         cout << "Frame " << m << ": ";
115         for (int k = 0; k < min((int)S[m].size(), 10); ++k) {
116             cout << S[m][k] << (k + 1 < 10 ? ", " : "\n");
117         }
118     }
119     return 0;
120 }
121

This program computes an STFT magnitude spectrogram using a Hann window and an in-place radix-2 FFT. It pads each windowed frame to size K, runs the FFT, and keeps only the unique bins (0..K/2) because the input is real. The printed values give a glimpse of the spectrogram; in practice, you would visualize S as an image, often on a log scale.

Time: O(M (N + K log K)) ≈ O(M K log K)Space: O(K + M (K/2+1))

Inverse STFT (ISTFT) with Overlap-Add and Proper Normalization

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 void fft(vector<complex<double>>& a, bool invert) {
5     int n = (int)a.size();
6     static vector<int> rev;
7     static vector<complex<double>> roots{0, 1};
8     if ((int)rev.size() != n) {
9         int k = __builtin_ctz(n);
10         rev.assign(n, 0);
11         for (int i = 0; i < n; i++) rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (k - 1));
12     }
13     if ((int)roots.size() < n) {
14         int k = __builtin_ctz(roots.size());
15         roots.resize(n);
16         while ((1 << k) < n) {
17             double ang = 2 * M_PI / (1 << (k + 1));
18             for (int i = 1 << (k - 1); i < (1 << k); i++) {
19                 roots[2 * i] = roots[i];
20                 double angle = ang * (2 * i + 1 - (1 << k));
21                 roots[2 * i + 1] = complex<double>(cos(angle), sin(angle));
22             }
23             k++;
24         }
25     }
26     for (int i = 0; i < n; i++) if (i < rev[i]) swap(a[i], a[rev[i]]);
27     for (int len = 1; len < n; len <<= 1) {
28         for (int i = 0; i < n; i += 2 * len) {
29             for (int j = 0; j < len; j++) {
30                 complex<double> u = a[i + j];
31                 complex<double> v = a[i + j + len] * roots[len + j];
32                 a[i + j] = u + v;
33                 a[i + j + len] = u - v;
34             }
35         }
36     }
37     if (invert) {
38         for (auto &x : a) x = conj(x);
39         fft(a, false);
40         for (auto &x : a) x = conj(x) / (double)n;
41     }
42 }
43 
44 vector<double> hann_window(int N) {
45     vector<double> w(N);
46     if (N == 1) { w[0] = 1.0; return w; }
47     for (int n = 0; n < N; ++n) w[n] = 0.5 - 0.5 * cos(2.0 * M_PI * n / (N - 1));
48     return w;
49 }
50 
51 // Build a test STFT by forward transform, then reconstruct via ISTFT
52 int main() {
53     int Fs = 16000;
54     int L = Fs * 1; // 1 second
55     vector<double> x(L);
56     for (int n = 0; n < L; ++n) {
57         // A chirp from 400 Hz to 2000 Hz
58         double t = (double)n / Fs;
59         double f = 400.0 + 1600.0 * t;
60         x[n] = 0.8 * sin(2 * M_PI * f * t);
61     }
62 
63     int N = 1024, H = 256, K = 1024;
64     if ((K & (K - 1)) != 0) throw runtime_error("K must be power of two");
65     vector<double> w = hann_window(N);
66 
67     // Forward STFT: store full complex spectrum per frame
68     int M = (L <= N) ? 1 : (int)ceil((double)(L - N) / H) + 1;
69     vector<vector<complex<double>>> X(M, vector<complex<double>>(K));
70     vector<complex<double>> buf(K);
71     for (int m = 0; m < M; ++m) {
72         int start = m * H;
73         for (int n = 0; n < K; ++n) {
74             double s = 0.0;
75             if (n < N) {
76                 int idx = start + n;
77                 if (0 <= idx && idx < L) s = x[idx];
78                 s *= w[n];
79             }
80             buf[n] = complex<double>(s, 0.0);
81         }
82         fft(buf, false);
83         X[m] = buf; // store complex spectrum
84     }
85 
86     // Inverse STFT via overlap-add with window-squared normalization
87     vector<double> y(L + K, 0.0);          // room for tail
88     vector<double> weight(L + K, 0.0);     // accumulate w^2
89 
90     for (int m = 0; m < M; ++m) {
91         // IFFT of frame m
92         buf = X[m];
93         fft(buf, true); // now buf[n] is time-domain frame of length K
94         for (int n = 0; n < N; ++n) {
95             int idx = m * H + n;
96             double val = buf[n].real();
97             double win = w[n];
98             y[idx] += val * win;       // re-window on synthesis
99             weight[idx] += win * win;  // accumulate window-squared
100         }
101     }
102 
103     // Normalize to undo windowing and overlap
104     for (size_t i = 0; i < y.size(); ++i) {
105         if (weight[i] > 1e-12) y[i] /= weight[i];
106     }
107 
108     // Trim to original length
109     y.resize(L);
110 
111     // Report reconstruction error (RMS)
112     double err2 = 0.0, x2 = 0.0;
113     for (int i = 0; i < L; ++i) {
114         err2 += (y[i] - x[i]) * (y[i] - x[i]);
115         x2 += x[i] * x[i];
116     }
117     double rms = sqrt(err2 / L);
118     double snr = 10 * log10(x2 / max(1e-20, err2));
119     cout << fixed << setprecision(6);
120     cout << "RMS error: " << rms << "\nSNR (dB): " << snr << "\n";
121     return 0;
122 }
123

This program builds an STFT (complex spectra per frame) and reconstructs the signal using inverse FFT per frame, windowing, and overlap-add. It accumulates the window-squared denominator to normalize the result, ensuring proper gain. The final RMS error and SNR indicate near-perfect reconstruction when parameters are consistent.

Time: O(M (N + K log K)) for forward + O(M (N + K log K)) for inverseSpace: O(MK) to store spectra, O(K) working, O(L) for output and weights

Streaming STFT: Process Incoming Samples with a Ring Buffer

1 #include <bits/stdc++.h>
2 using namespace std;
3 
4 void fft(vector<complex<double>>& a, bool invert) {
5     int n = (int)a.size();
6     static vector<int> rev;
7     static vector<complex<double>> roots{0, 1};
8     if ((int)rev.size() != n) {
9         int k = __builtin_ctz(n);
10         rev.assign(n, 0);
11         for (int i = 0; i < n; i++) rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (k - 1));
12     }
13     if ((int)roots.size() < n) {
14         int k = __builtin_ctz(roots.size());
15         roots.resize(n);
16         while ((1 << k) < n) {
17             double ang = 2 * M_PI / (1 << (k + 1));
18             for (int i = 1 << (k - 1); i < (1 << k); i++) {
19                 roots[2 * i] = roots[i];
20                 double angle = ang * (2 * i + 1 - (1 << k));
21                 roots[2 * i + 1] = complex<double>(cos(angle), sin(angle));
22             }
23             k++;
24         }
25     }
26     for (int i = 0; i < n; i++) if (i < rev[i]) swap(a[i], a[rev[i]]);
27     for (int len = 1; len < n; len <<= 1) {
28         for (int i = 0; i < n; i += 2 * len) {
29             for (int j = 0; j < len; j++) {
30                 complex<double> u = a[i + j];
31                 complex<double> v = a[i + j + len] * roots[len + j];
32                 a[i + j] = u + v;
33                 a[i + j + len] = u - v;
34             }
35         }
36     }
37     if (invert) {
38         for (auto &x : a) x = conj(x);
39         fft(a, false);
40         for (auto &x : a) x = conj(x) / (double)n;
41     }
42 }
43 
44 vector<double> hann_window(int N) {
45     vector<double> w(N);
46     if (N == 1) { w[0] = 1.0; return w; }
47     for (int n = 0; n < N; ++n) w[n] = 0.5 - 0.5 * cos(2.0 * M_PI * n / (N - 1));
48     return w;
49 }
50 
51 int main() {
52     int N = 512, H = 128, K = 512;
53     vector<double> w = hann_window(N);
54 
55     // Simulated input stream: a 1 kHz tone at Fs=8000 Hz
56     int Fs = 8000, L = Fs / 2; // 0.5 s
57     vector<double> input(L);
58     for (int n = 0; n < L; ++n) input[n] = sin(2 * M_PI * 1000 * n / Fs);
59 
60     // Ring buffer for latest N samples
61     vector<double> ring(N, 0.0);
62     int head = 0; // next position to write
63     int samples_until_frame = H; // trigger STFT every H samples
64 
65     vector<complex<double>> frame(K);
66 
67     for (int n = 0; n < L; ++n) {
68         // Push sample into ring buffer
69         ring[head] = input[n];
70         head = (head + 1) % N;
71         samples_until_frame--;
72 
73         if (samples_until_frame == 0) {
74             // Build frame in correct time order, apply window, zero-pad (K=N here)
75             for (int i = 0; i < N; ++i) {
76                 int idx = (head + i) % N; // oldest-to-newest
77                 double s = ring[idx] * w[i];
78                 frame[i] = complex<double>(s, 0.0);
79             }
80             for (int i = N; i < K; ++i) frame[i] = 0.0;
81             // FFT
82             fft(frame, false);
83             // Example: print energy in 1 kHz bin (nearest bin index)
84             int k = int(round(1000.0 * K / Fs));
85             double mag = abs(frame[k]);
86             cout << fixed << setprecision(3) << "Frame energy near 1 kHz: " << mag << "\n";
87             samples_until_frame = H; // reset for next frame
88         }
89     }
90     return 0;
91 }
92

This example simulates real-time STFT by maintaining a ring buffer of the most recent N samples and triggering an FFT every H samples. It applies a Hann window and computes the FFT without storing the full spectrogram, then prints the magnitude near a target frequency, which is typical in streaming analyzers.

Time: Per frame: O(N + K log K); streaming total scales with number of framesSpace: O(N + K) working memory; no spectrogram storage

1	#include <bits/stdc++.h>
2	using namespace std;
3
4	// Iterative in-place radix-2 Cooley-Tukey FFT
5	// If invert=false: computes FFT; if invert=true: computes inverse FFT (scaled by 1)
6	// After inverse, we divide by n externally to get true inverse DFT.
7	void fft(vector<complex<double>>& a, bool invert) {
8	int n = (int)a.size();
9	static vector<int> rev;
10	static vector<complex<double>> roots{0, 1};
11
12	if ((int)rev.size() != n) {
13	int k = __builtin_ctz(n);
14	rev.assign(n, 0);
15	for (int i = 0; i < n; i++) {
16	rev[i] = (rev[i >> 1] >> 1) \| ((i & 1) << (k - 1));
17	}
18	}
19	if ((int)roots.size() < n) {
20	int k = __builtin_ctz(roots.size());
21	roots.resize(n);
22	while ((1 << k) < n) {
23	double ang = 2 * M_PI / (1 << (k + 1));
24	for (int i = 1 << (k - 1); i < (1 << k); i++) {
25	roots[2 * i] = roots[i];
26	double angle = ang * (2 * i + 1 - (1 << k));
27	roots[2 * i + 1] = complex<double>(cos(angle), sin(angle));
28	}
29	k++;
30	}
31	}
32
33	for (int i = 0; i < n; i++) if (i < rev[i]) swap(a[i], a[rev[i]]);
34
35	for (int len = 1; len < n; len <<= 1) {
36	for (int i = 0; i < n; i += 2 * len) {
37	for (int j = 0; j < len; j++) {
38	complex<double> u = a[i + j];
39	complex<double> v = a[i + j + len] * roots[len + j];
40	a[i + j] = u + v;
41	a[i + j + len] = u - v;
42	}
43	}
44	}
45	if (invert) {
46	// Conjugate, run forward FFT, then conjugate and scale outside (common trick)
47	for (auto &x : a) x = conj(x);
48	fft(a, false);
49	for (auto &x : a) x = conj(x) / (double)n;
50	}
51	}
52
53	vector<double> hann_window(int N) {
54	vector<double> w(N);
55	if (N == 1) { w[0] = 1.0; return w; }
56	for (int n = 0; n < N; ++n) {
57	w[n] = 0.5 - 0.5 * cos(2.0 * M_PI * n / (N - 1));
58	}
59	return w;
60	}
61
62	// Compute magnitude spectrogram \|X[m,k]\| for k=0..K/2
63	vector<vector<double>> stft_magnitude(const vector<double>& x, int N, int H, int K) {
64	int L = (int)x.size();
65	if ((K & (K - 1)) != 0) throw runtime_error("K must be power of two");
66	if (N > K) throw runtime_error("Window length N must be <= K");
67
68	vector<double> w = hann_window(N);
69	int M = (L <= N) ? 1 : (int)ceil((double)(L - N) / H) + 1;
70	vector<vector<double>> S; // M x (K/2+1)
71	S.reserve(M);
72
73	vector<complex<double>> frame(K);
74	for (int m = 0; m < M; ++m) {
75	int start = m * H;
76	// Build windowed frame with zero-padding to K
77	for (int n = 0; n < K; ++n) {
78	double sample = 0.0;
79	if (n < N) {
80	int idx = start + n;
81	if (0 <= idx && idx < L) sample = x[idx];
82	sample *= w[n];
83	}
84	frame[n] = complex<double>(sample, 0.0);
85	}
86	// Forward FFT
87	fft(frame, false);
88	// Magnitude for unique bins 0..K/2
89	vector<double> mag(K / 2 + 1);
90	for (int k = 0; k <= K / 2; ++k) mag[k] = abs(frame[k]);
91	S.push_back(move(mag));
92	}
93	return S;
94	}
95
96	int main() {
97	// Example: analyze a 1-second 440 Hz + 880 Hz tone at Fs=8000 Hz
98	int Fs = 8000;
99	int L = Fs * 1; // 1 second
100	vector<double> x(L);
101	for (int n = 0; n < L; ++n) {
102	x[n] = 0.7 * sin(2 * M_PI * 440 * n / Fs) + 0.3 * sin(2 * M_PI * 880 * n / Fs);
103	}
104
105	int N = 1024; // window length
106	int H = 512; // hop size (50% overlap)
107	int K = 1024; // FFT size
108
109	auto S = stft_magnitude(x, N, H, K);
110
111	// Print a small portion of the spectrogram: first 5 frames, first 10 bins
112	cout << fixed << setprecision(3);
113	for (int m = 0; m < min((int)S.size(), 5); ++m) {
114	cout << "Frame " << m << ": ";
115	for (int k = 0; k < min((int)S[m].size(), 10); ++k) {
116	cout << S[m][k] << (k + 1 < 10 ? ", " : "\n");
117	}
118	}
119	return 0;
120	}
121