Crossformer: Transformer Utilizing Cross-Dimension Dependency for Multivariate Time Series Forecasting (ICLR 2023)
This is the original PyTorch implementation of Crossformer: Transformer Utilizing Cross-Dimension Dependency for Multivariate Time Series Forecasting.
1. Dimension-Segment-Wise (DSW) Embedding
Figure 1. DSW embedding. Left: the embedding method of previous Transformer-based models, where data points in different dimensions at the same step are embedded into one vector; Right: DSW embedding of Crossformer, where in each dimension, nearby points over time form a segment for embedding.
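For intuition, the segment-then-project idea can be sketched in a few lines of PyTorch. This is a minimal sketch with our own names, not the repository's code; the paper's DSW embedding additionally adds a learnable position embedding, which is omitted here.

```python
import torch
import torch.nn as nn
from einops import rearrange

class DSWEmbedding(nn.Module):
    """Minimal sketch of DSW embedding: in each dimension, every seg_len
    consecutive points form a segment, and each segment is linearly
    projected to a d_model-dimensional vector."""

    def __init__(self, seg_len: int, d_model: int):
        super().__init__()
        self.seg_len = seg_len
        self.linear = nn.Linear(seg_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, T, D) with T divisible by seg_len
        x = rearrange(x, 'b (n s) d -> b d n s', s=self.seg_len)
        return self.linear(x)  # (batch, D, T // seg_len, d_model)

emb = DSWEmbedding(seg_len=6, d_model=256)
print(emb(torch.randn(32, 168, 7)).shape)  # ETTh1 has D = 7 -> (32, 7, 28, 256)
```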
2. Two-Stage Attention (TSA) Layer
Figure 2. TSA layer. Left: Overall structure: the 2D vector array goes through the Cross-Time Stage and the Cross-Dimension Stage to get the corresponding dependency; Middle: Directly using MSA in the Cross-Dimension Stage to build the D-to-D connection results in $O(D^2)$ complexity; Right: The router mechanism for the Cross-Dimension Stage: a small fixed number of routers gather information from all dimensions and then distribute it, reducing the complexity to $O(D)$.
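The router mechanism of the Cross-Dimension Stage can be sketched as follows. This is an illustrative simplification with our own names, not the repository's code (for instance, we share one set of routers across all segment positions, and we omit the residual/MLP sublayers):

```python
import torch
import torch.nn as nn

class RouterAttention(nn.Module):
    """Sketch of the Cross-Dimension Stage router: a small, fixed number c
    of learnable "routers" first gathers messages from all D dimension
    tokens and then distributes them back, so the cost grows as O(c * D)
    rather than the O(D^2) of all-pairs attention across dimensions."""

    def __init__(self, d_model: int, n_heads: int, c: int):
        super().__init__()
        self.routers = nn.Parameter(torch.randn(c, d_model))
        # nn.MultiheadAttention here takes (seq_len, batch, d_model) tensors
        self.gather = nn.MultiheadAttention(d_model, n_heads)
        self.scatter = nn.MultiheadAttention(d_model, n_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (D, batch, d_model) -- the dimension tokens at one segment index
        r = self.routers.unsqueeze(1).expand(-1, x.size(1), -1)  # (c, batch, d_model)
        buf, _ = self.gather(r, x, x)       # routers collect from the D tokens
        out, _ = self.scatter(x, buf, buf)  # tokens retrieve from the routers
        return out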
3. Hierarchical Encoder-Decoder (HED)
Figure 3. HED. The encoder (left) uses TSA layer and segment merging to capture dependency at different scales; the decoder (right) makes the final prediction by forecasting at each scale and adding them up.
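Segment merging between encoder scales can be sketched like this (again an illustrative simplification with our own names; the repository's layer may order the normalization and projection differently):

```python
import torch
import torch.nn as nn
from einops import rearrange

class SegmentMerging(nn.Module):
    """Sketch of segment merging in the HED encoder: win_size adjacent
    segment embeddings along the time axis are concatenated and projected
    back to d_model, producing a coarser scale for the next layer."""

    def __init__(self, d_model: int, win_size: int = 2):
        super().__init__()
        self.win_size = win_size
        self.norm = nn.LayerNorm(win_size * d_model)
        self.proj = nn.Linear(win_size * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, D, n_segments, d_model), n_segments divisible by win_size
        x = rearrange(x, 'b d (n w) m -> b d n (w m)', w=self.win_size)
        return self.proj(self.norm(x))  # (batch, D, n_segments // win_size, d_model)
```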
- Python 3.7.10
- numpy==1.20.3
- pandas==1.3.2
- torch==1.8.1
- einops==0.4.1
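The pinned versions above can be installed with pip, for example:

```
pip install numpy==1.20.3 pandas==1.3.2 torch==1.8.1 einops==0.4.1
```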
- Put the datasets used in the experiments into the folder `datasets/`. We have already put `ETTh1` and `ETTm1` into it. `WTH` and `ECL` can be downloaded from https://github.com/zhouhaoyi/Informer2020; `ILI` and `Traffic` can be downloaded from https://github.com/thuml/Autoformer. Note that the `WTH` we used in the paper is the one with 12 dimensions from Informer, not the one with 21 dimensions from Autoformer. (A folder layout sketch is shown after this list.)
- To get results of Crossformer with $T=168, \tau = 24, L_{seg} = 6$ on the ETTh1 dataset, run:

  ```
  python main_crossformer.py --data ETTh1 --in_len 168 --out_len 24 --seg_len 6 --itr 1
  ```

  The model will be automatically trained and tested. The trained model will be saved in the folder `checkpoints/` and the evaluation metrics will be saved in the folder `results/`.
- You can also evaluate a trained model by running:

  ```
  python eval_crossformer.py --checkpoint_root ./checkpoints --setting_name Crossformer_ETTh1_il168_ol24_sl6_win2_fa10_dm256_nh4_el3_itr0
  ```

  The setting name encodes the run's hyperparameters (here: `in_len` 168, `out_len` 24, `seg_len` 6, `win_size` 2, `factor` 10, `d_model` 256, `n_heads` 4, `e_layers` 3, iteration 0).
- To reproduce all results in the paper, run the following scripts:

  ```
  bash scripts/ETTh1.sh
  bash scripts/ETTm1.sh
  bash scripts/WTH.sh
  bash scripts/ECL.sh
  bash scripts/ILI.sh
  bash scripts/Traffic.sh
  ```
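As mentioned in the dataset item above, the data loader reads one CSV per dataset from `root_path`/`data_path` (documented defaults: `./datasets/` and `ETTh1.csv`), so the folder should look roughly like this; file names other than `ETTh1.csv` are assumptions, match them to `--data_path`:

```
datasets/
├── ETTh1.csv   # included in the repo
├── ETTm1.csv   # included in the repo
├── WTH.csv     # assumed name; set --data_path to the actual file name
└── ...
```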
TBD
`main_crossformer.py` is the entry point of our model. Here we describe its arguments in detail:
| Parameter name | Description of parameter |
| --- | --- |
| data | The dataset name |
| root_path | The root path of the data file (defaults to `./datasets/`) |
| data_split | Train/val/test split, either as ratios (e.g. `0.7,0.1,0.2`) or as numbers of samples (e.g. `16800,2880,2880`) (defaults to `0.7,0.1,0.2`) |
| data_path | The data file name (defaults to `ETTh1.csv`) |
| checkpoints | Location to store the trained model (defaults to `./checkpoints/`) |
| in_len | Length of the input/history sequence, i.e. $T$ |
| out_len | Length of the output/future sequence, i.e. $\tau$ |
| seg_len | Length of each segment in DSW embedding, i.e. $L_{seg}$ |
| win_size | How many adjacent segments are merged into one in the segment merging of HED (defaults to 2) |
| factor | Number of routers in the Cross-Dimension Stage of TSA, i.e. $c$ |
| data_dim | Number of dimensions of the MTS data, i.e. $D$ |
| d_model | Dimension of hidden states, i.e. $d_{model}$ |
| d_ff | Dimension of the MLP in MSA (defaults to 512) |
| n_heads | Number of heads in MSA (defaults to 4) |
| e_layers | Number of encoder layers, i.e. $N$ |
| dropout | Dropout probability (defaults to 0.2) |
| num_workers | num_workers of the DataLoader (defaults to 0) |
| batch_size | Batch size for training and testing (defaults to 32) |
| train_epochs | Number of training epochs (defaults to 20) |
| patience | Early-stopping patience (defaults to 3) |
| learning_rate | Initial learning rate for the optimizer (defaults to 1e-4) |
| lradj | Way to adjust the learning rate (defaults to `type1`) |
| itr | Number of experiment repetitions (defaults to 1) |
| save_pred | Whether to save the predicted results. If True, the predictions are saved in the folder `results` as numpy arrays; this costs a lot of time and memory for datasets with large $D$ (defaults to `False`) |
| use_gpu | Whether to use the GPU (defaults to `True`) |
| gpu | The GPU number used for training and inference (defaults to 0) |
| use_multi_gpu | Whether to use multiple GPUs (defaults to `False`) |
| devices | Device ids of multiple GPUs (defaults to `0,1,2,3`) |
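As a usage example, the documented arguments can be combined on the command line in the same style as the training command above (the values here are illustrative, not tuned settings from the paper):

```
python main_crossformer.py --data ETTh1 --in_len 336 --out_len 48 --seg_len 12 --d_model 256 --n_heads 4 --e_layers 3 --dropout 0.2 --itr 1
```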