Here we apply time-frequency techniques from audio processing to design specialized neural networks for audio. The first type of network is based on the Fourier transform and involves complex-number operations; its complex derivatives are computed with Wirtinger calculus. To avoid the complexity of complex arithmetic and keep all computation in the real domain, we further introduce a discrete cosine transform architecture into the network design.
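For reference, the standard Wirtinger derivatives are sketched below. The symbols $z = x + iy$, $f$, $L$, $w$ and the step size $\eta$ are notation assumed here for illustration and do not come from the project itself.

$$
\frac{\partial f}{\partial z} = \frac{1}{2}\left(\frac{\partial f}{\partial x} - i\,\frac{\partial f}{\partial y}\right),
\qquad
\frac{\partial f}{\partial \bar{z}} = \frac{1}{2}\left(\frac{\partial f}{\partial x} + i\,\frac{\partial f}{\partial y}\right)
$$

For a real-valued loss $L$ of a complex weight $w$, the direction of steepest descent is given by the conjugate derivative, so a gradient step typically takes the form

$$
w \leftarrow w - \eta\,\frac{\partial L}{\partial \bar{w}}
$$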
Figure 1. Basic Structures of DFT and DCT neural networks

- DFT Layer
- IDFT Layer
- DCT Layer
- IDCT Layer
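As a rough illustration of the DCT and IDCT layers listed above, the following is a minimal NumPy sketch, not the project's implementation. It assumes each layer is a matrix whose rows are cosines at angular frequencies `theta` (DCT) or `phi` (IDCT), with the orthonormal DCT-II scaling; the function names `dct_basis`, `dct_layer`, and `idct_layer` are hypothetical.

```python
import numpy as np

def dct_basis(theta, n_samples):
    """Cosine weight matrix W[k, n] = s_k * cos(theta_k * (n + 1/2)).

    Assumption: orthonormal DCT-II scaling. In a trainable layer, theta
    would be the learned angular-frequency parameter vector.
    """
    n = np.arange(n_samples) + 0.5                  # half-sample shifted index
    w = np.cos(np.outer(theta, n))                  # shape (K, n_samples)
    scale = np.full(len(theta), np.sqrt(2.0 / n_samples))
    scale[np.isclose(theta, 0.0)] = np.sqrt(1.0 / n_samples)  # DC row
    return scale[:, None] * w

def dct_layer(x, theta):
    """Forward transform: project the signal onto the cosine basis."""
    return x @ dct_basis(theta, x.shape[-1]).T

def idct_layer(c, phi, n_samples):
    """Inverse transform: synthesize the signal from its coefficients."""
    return c @ dct_basis(phi, n_samples)

# With theta_k = pi * k / N the weights equal the exact DCT-II basis,
# and choosing phi = theta makes the IDCT layer invert the DCT layer.
N = 8
theta = np.pi * np.arange(N) / N
x = np.random.default_rng(0).standard_normal(N)
coeffs = dct_layer(x, theta)
x_rec = idct_layer(coeffs, theta, N)
```

Because the exact DCT-II matrix is orthogonal, the inverse layer here is simply the transpose of the forward layer; that shortcut holds only at the exact basis, not for arbitrary values of `theta`.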
The training converges to the DCT (or DFT) basis:
Figure 2. Training Result of Angular Frequency Parameters Theta (Phi = Theta)

Figure 3. Deep Discrete Fourier Transform Neural Network

- Input Layer: DFT Layer
- Hidden layers*: Fourier Transform Layer (stable under certain conditions)
- Output Layer: IDFT Layer
- Input Layer: DCT Layer
- Hidden layers*: Cosine Transform Layer
- Output Layer: IDCT Layer
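The input/output structure of the deep DFT network can be sketched in the same spirit. This is a minimal sketch under stated assumptions: the DFT layer is taken to be a complex matrix built from angular frequencies `theta` with unitary scaling, the IDFT layer is its conjugate transpose, and the hidden layers are left as caller-supplied placeholders because their exact form is not specified here. The names `dft_basis` and `deep_dft_forward` are hypothetical.

```python
import numpy as np

def dft_basis(theta, n_samples):
    """Complex weight matrix F[k, n] = exp(-i * theta_k * n) / sqrt(N)."""
    n = np.arange(n_samples)
    return np.exp(-1j * np.outer(theta, n)) / np.sqrt(n_samples)

def deep_dft_forward(x, theta, hidden=()):
    """DFT input layer, optional hidden layers, IDFT output layer."""
    F = dft_basis(theta, x.shape[-1])
    h = x @ F.T                  # input layer: DFT
    for layer in hidden:         # hidden layers: placeholders (assumption)
        h = layer(h)
    return h @ F.conj()          # output layer: IDFT (conjugate transpose)

# With theta_k = 2 * pi * k / N and no hidden layers, the network
# reduces to the identity map on the input signal.
N = 8
theta = 2.0 * np.pi * np.arange(N) / N
x = np.random.default_rng(0).standard_normal(N)
y = deep_dft_forward(x, theta)
spectrum = x @ dft_basis(theta, N).T
```

At these frequencies `spectrum` matches `np.fft.fft(x, norm="ortho")`, which gives a quick sanity check on the layer weights. The DCT network would follow the same pattern with real cosine weights in place of the complex exponentials.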