Since 0.9.0-RC1, `T.Resample` precomputes and caches the resampling kernel to improve performance (a 10x speedup).
The implementation in 0.8.0 computed the kernel on the fly, on the same `device`/`dtype` as the input Tensor,
whereas the newer version precomputes the kernel at construction time and initially caches it as `float32`.
This degrades quality when resampling in `float64`, because `sinc` values computed in `float32` are not precise enough for `float64` resampling.
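
To give a rough sense of the precision gap, here is a minimal standard-library sketch. It is not the torchaudio kernel: the sample points, the `sinc` helper, and the `to_float32` rounding helper are all assumptions made for illustration. It only rounds `float64` sinc values to `float32` and measures the error, which is a lower bound on what a kernel computed entirely in `float32` would lose.

```python
import math
import struct


def sinc(x: float) -> float:
    """Normalized sinc, sin(pi x) / (pi x), evaluated in float64."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical sample points; a real resampling kernel uses a different grid
# and also applies a window function.
points = [i / 7.0 for i in range(-42, 43)]

kernel64 = [sinc(p) for p in points]
kernel32 = [to_float32(v) for v in kernel64]

# Largest absolute error introduced by storing the kernel as float32.
max_err = max(abs(a - b) for a, b in zip(kernel64, kernel32))

float32_eps = 2.0 ** -23
float64_eps = 2.0 ** -52

print(f"max rounding error: {max_err:.3e}")
print(f"float32 eps:        {float32_eps:.3e}")
print(f"float64 eps:        {float64_eps:.3e}")
```

The error sits near `float32` machine epsilon, many orders of magnitude above `float64` epsilon, so casting such a cached kernel up to `float64` afterwards cannot recover the lost digits.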
We chose `float32` for the initial cache to keep UX disruption to a minimum, but that left no way to make it work for `float64`. This PR adds a `dtype` argument that can be used to override the cache precision.