format the reference (#225)
YanjieGao authored Sep 19, 2022
1 parent 9949293 · commit f07815e
Showing 16 changed files with 175 additions and 91 deletions.
@@ -224,7 +224,11 @@ On some tasks, network structures found by NAS can even rival those designed by human exper…


1. [A100](https://www.nvidia.com/en-us/data-center/a100/)

2. [TensorCore](https://developer.nvidia.com/blog/accelerating-ai-training-with-tf32-tensor-cores/)

3. Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. [Distilling the knowledge in a neural network.](https://arxiv.org/pdf/1503.02531.pdf) arXiv preprint arXiv:1503.02531 (2015).

4. Howard, Andrew G., et al. [Mobilenets: Efficient convolutional neural networks for mobile vision applications.](https://arxiv.org/pdf/1704.04861.pdf) arXiv preprint arXiv:1704.04861 (2017).

5. Deng, Lei, et al. [Model compression and hardware acceleration for neural networks: A comprehensive survey.](https://ieeexplore.ieee.org/abstract/document/9043731) Proceedings of the IEEE 108.4 (2020): 485-532.
@@ -167,16 +167,25 @@ $$APoZ^{(i)}_c = APoZ(O_c^{(i)}) = \frac{\sum_k^N \sum_j^M f(O^{(i)}_{c,j}(k)=0)}{N \times M}$$
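
To make the APoZ criterion above concrete, the following sketch computes the average percentage of zero post-ReLU activations per channel, as defined in Hu et al. (reference 6 below). The shapes and names are illustrative assumptions, not code from this repository.

```python
import numpy as np

def apoz_per_channel(activations: np.ndarray) -> np.ndarray:
    """Average Percentage of Zeros (APoZ) per channel.

    activations: post-ReLU outputs of shape (N, C, H, W), where N is the
    number of validation examples and H * W = M spatial positions.
    Returns a length-C vector; channels with high APoZ are pruning candidates.
    """
    n, c, h, w = activations.shape
    zeros = (activations == 0)                    # the indicator f(O = 0)
    return zeros.reshape(n, c, h * w).mean(axis=(0, 2))

# Toy usage: random post-ReLU activations for a 4-channel layer.
acts = np.maximum(np.random.randn(8, 4, 5, 5), 0)
print(apoz_per_channel(acts))                     # roughly 0.5 per channel
```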

## References

- Wright J, Yang A Y, Ganesh A, et al. Robust face recognition via sparse representation. IEEE transactions on pattern analysis and machine intelligence, 2008, 31(2): 210-227.
- Ji Rongrong, Lin Shaohui, Chao Fei, Wu Yongjian, Huang Feiyue. Survey of deep neural network compression and acceleration. Journal of Computer Research and Development, 2018, 55(09): 1871-1888.
- Hoefler T, Alistarh D, Ben-Nun T, et al. Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks. Journal of Machine Learning Research, 2021, 22(241): 1-124.
- Li H, Kadav A, Durdanovic I, et al. Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710, 2016.
- Liu Z, Li J, Shen Z, et al. Learning efficient convolutional networks through network slimming. Proceedings of the IEEE international conference on computer vision. 2017: 2736-2744.
- Hu H, Peng R, Tai Y W, et al. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250, 2016.
- Ren M, Pokrovsky A, Yang B, et al. Sbnet: Sparse blocks network for fast inference. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 8711-8720.
- Aji A F, Heafield K. Sparse communication for distributed gradient descent. arXiv preprint arXiv:1704.05021, 2017.
- Lin Y, Han S, Mao H, et al. Deep gradient compression: Reducing the communication bandwidth for distributed training. arXiv preprint arXiv:1712.01887, 2017.
- Deng L, Li G, Han S, et al. Model compression and hardware acceleration for neural networks: A comprehensive survey. Proceedings of the IEEE, 2020, 108(4): 485-532.
1. Wright J, Yang A Y, Ganesh A, et al. Robust face recognition via sparse representation. IEEE transactions on pattern analysis and machine intelligence, 2008, 31(2): 210-227.

2. Ji Rongrong, Lin Shaohui, Chao Fei, Wu Yongjian, Huang Feiyue. Survey of deep neural network compression and acceleration. Journal of Computer Research and Development, 2018, 55(09): 1871-1888.

3. Hoefler T, Alistarh D, Ben-Nun T, et al. Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks. Journal of Machine Learning Research, 2021, 22(241): 1-124.

4. Li H, Kadav A, Durdanovic I, et al. Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710, 2016.

5. Liu Z, Li J, Shen Z, et al. Learning efficient convolutional networks through network slimming. Proceedings of the IEEE international conference on computer vision. 2017: 2736-2744.

6. Hu H, Peng R, Tai Y W, et al. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250, 2016.

7. Ren M, Pokrovsky A, Yang B, et al. Sbnet: Sparse blocks network for fast inference. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 8711-8720.

8. Aji A F, Heafield K. Sparse communication for distributed gradient descent. arXiv preprint arXiv:1704.05021, 2017.

9. Lin Y, Han S, Mao H, et al. Deep gradient compression: Reducing the communication bandwidth for distributed training. arXiv preprint arXiv:1712.01887, 2017.

10. Deng L, Li G, Han S, et al. Model compression and hardware acceleration for neural networks: A comprehensive survey. Proceedings of the IEEE, 2020, 108(4): 485-532.



@@ -115,5 +115,6 @@ TPU inference chips used INT8 very early on, and later training chips also ado…

## References

- https://www.nvidia.com/en-us/data-center/a100/
- https://www.jiqizhixin.com/articles/2019-06-25-18
1. https://www.nvidia.com/en-us/data-center/a100/

2. https://www.jiqizhixin.com/articles/2019-06-25-18
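
The hunk context above notes that TPU inference chips adopted INT8 early on. As a framework-free illustration of what symmetric INT8 quantization does to a tensor, here is a minimal sketch; the per-tensor scheme and names are assumptions for exposition, not any particular chip's pipeline.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: x is approximated by scale * q."""
    scale = np.abs(x).max() / 127.0               # map the largest magnitude to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

x = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(x)
print(np.abs(x - dequantize_int8(q, s)).max())    # error bounded by about scale / 2
```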
@@ -26,5 +26,6 @@

## References

- Docker (software). [https://en.wikipedia.org/wiki/Docker_(software)](https://en.wikipedia.org/wiki/Docker_(software))
- Kubernetes. [https://en.wikipedia.org/wiki/Kubernetes](https://en.wikipedia.org/wiki/Kubernetes)
1. Docker (software). [https://en.wikipedia.org/wiki/Docker_(software)](https://en.wikipedia.org/wiki/Docker_(software))

2. Kubernetes. [https://en.wikipedia.org/wiki/Kubernetes](https://en.wikipedia.org/wiki/Kubernetes)
@@ -125,14 +125,24 @@ The architecture of Resource Central is shown in the figure above; it consists of an offline (Offline) component and a clie…

## References

- Jasper Snoek, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Md. Mostofa Ali Patwary, Prabhat Prabhat, and Ryan P. Adams. 2015. [*Scalable Bayesian optimization using deep neural networks*](https://dl.acm.org/doi/10.5555/3045118.3045349). In Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37 (ICML'15).
- Tianyin Xu, Long Jin, Xuepeng Fan, Yuanyuan Zhou, Shankar Pasupathy, and Rukma Talwadker. 2015. [*Hey, You Have Given Me Too Many Knobs!: Understanding and Dealing with Over-Designed Configuration in System Software*](https://dl.acm.org/doi/10.1145/2786805.2786852). In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE '15). Association for Computing Machinery.
- Hongzi Mao, Ravi Netravali, and Mohammad Alizadeh. 2017. [*Neural Adaptive Video Streaming with Pensieve*](https://dl.acm.org/doi/10.1145/3098822.3098843). In Proceedings of the Conference of the ACM Special Interest Group on Data Communication (SIGCOMM '17). Association for Computing Machinery.
- Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, and Bohan Zhang. 2017. [*Automatic Database Management System Tuning Through Large-scale Machine Learning*](https://dl.acm.org/doi/10.1145/3035918.3064029). In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD '17). Association for Computing Machinery.
- Omid Alipourfard, Hongqiang Harry Liu, Jianshu Chen, Shivaram Venkataraman, Minlan Yu, and Ming Zhang. 2017. [*CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics*](https://dl.acm.org/doi/10.5555/3154630.3154669). In Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association.
- Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. [*The Case for Learned Index Structures*](https://dl.acm.org/doi/10.1145/3183713.3196909). In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18). Association for Computing Machinery.
- Eli Cortez, Anand Bonde, Alexandre Muzio, Mark Russinovich, Marcus Fontoura, and Ricardo Bianchini. 2017. [*Resource Central: Understanding and Predicting Workloads for Improved Resource Management in Large Cloud Platforms*](https://dl.acm.org/doi/abs/10.1145/3132747.3132772). In Proceedings of the 26th Symposium on Operating Systems Principles (SOSP '17). Association for Computing Machinery.
- Zhao Lucis Li, Chieh-Jan Mike Liang, Wenjia He, Lianjie Zhu, Wenjun Dai, Jin Jiang, and Guangzhong Sun. 2018. [*Metis: Robustly Optimizing Tail Latencies of Cloud Systems*](https://dl.acm.org/doi/10.5555/3277355.3277449). In Proceedings of the 2018 USENIX Conference on Usenix Annual Technical Conference (ATC '18). USENIX Association.
- Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David Lomet, and Tim Kraska. 2020. [*ALEX: An Updatable Adaptive Learned Index*](https://doi.org/10.1145/3318464.3389711). In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD '20). Association for Computing Machinery.
- Mirhoseini, A., Goldie, A., Yazgan, M. et al. A graph placement methodology for fast chip design. Nature 594, 207–212 (2021). https://doi.org/10.1038/s41586-021-03544-w
- Baotong Lu, Jialin Ding, Eric Lo, Umar Farooq Minhas, and Tianzheng Wang. 2021. [*APEX: a high-performance learned index on persistent memory*](https://doi.org/10.14778/3494124.3494141). Proc. VLDB Endow.
1. Jasper Snoek, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Md. Mostofa Ali Patwary, Prabhat Prabhat, and Ryan P. Adams. 2015. [*Scalable Bayesian optimization using deep neural networks*](https://dl.acm.org/doi/10.5555/3045118.3045349). In Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37 (ICML'15).

2. Tianyin Xu, Long Jin, Xuepeng Fan, Yuanyuan Zhou, Shankar Pasupathy, and Rukma Talwadker. 2015. [*Hey, You Have Given Me Too Many Knobs!: Understanding and Dealing with Over-Designed Configuration in System Software*](https://dl.acm.org/doi/10.1145/2786805.2786852). In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE '15). Association for Computing Machinery.

3. Hongzi Mao, Ravi Netravali, and Mohammad Alizadeh. 2017. [*Neural Adaptive Video Streaming with Pensieve*](https://dl.acm.org/doi/10.1145/3098822.3098843). In Proceedings of the Conference of the ACM Special Interest Group on Data Communication (SIGCOMM '17). Association for Computing Machinery.

4. Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, and Bohan Zhang. 2017. [*Automatic Database Management System Tuning Through Large-scale Machine Learning*](https://dl.acm.org/doi/10.1145/3035918.3064029). In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD '17). Association for Computing Machinery.

5. Omid Alipourfard, Hongqiang Harry Liu, Jianshu Chen, Shivaram Venkataraman, Minlan Yu, and Ming Zhang. 2017. [*CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics*](https://dl.acm.org/doi/10.5555/3154630.3154669). In Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17). USENIX Association.

6. Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. [*The Case for Learned Index Structures*](https://dl.acm.org/doi/10.1145/3183713.3196909). In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18). Association for Computing Machinery.

7. Eli Cortez, Anand Bonde, Alexandre Muzio, Mark Russinovich, Marcus Fontoura, and Ricardo Bianchini. 2017. [*Resource Central: Understanding and Predicting Workloads for Improved Resource Management in Large Cloud Platforms*](https://dl.acm.org/doi/abs/10.1145/3132747.3132772). In Proceedings of the 26th Symposium on Operating Systems Principles (SOSP '17). Association for Computing Machinery.

8. Zhao Lucis Li, Chieh-Jan Mike Liang, Wenjia He, Lianjie Zhu, Wenjun Dai, Jin Jiang, and Guangzhong Sun. 2018. [*Metis: Robustly Optimizing Tail Latencies of Cloud Systems*](https://dl.acm.org/doi/10.5555/3277355.3277449). In Proceedings of the 2018 USENIX Conference on Usenix Annual Technical Conference (ATC '18). USENIX Association.

9. Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David Lomet, and Tim Kraska. 2020. [*ALEX: An Updatable Adaptive Learned Index*](https://doi.org/10.1145/3318464.3389711). In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data (SIGMOD '20). Association for Computing Machinery.

10. Mirhoseini, A., Goldie, A., Yazgan, M. et al. A graph placement methodology for fast chip design. Nature 594, 207–212 (2021). https://doi.org/10.1038/s41586-021-03544-w

11. Baotong Lu, Jialin Ding, Eric Lo, Umar Farooq Minhas, and Tianzheng Wang. 2021. [*APEX: a high-performance learned index on persistent memory*](https://doi.org/10.14778/3494124.3494141). Proc. VLDB Endow.
@@ -53,6 +53,8 @@

## References

- D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, and Dan Dennison. 2015. [*Hidden Technical Debt in Machine Learning Systems*](https://dl.acm.org/doi/10.5555/2969442.2969519). In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS'15). MIT Press.
- Zhao Lucis Li, Chieh-Jan Mike Liang, Wei Bai, Qiming Zheng, Yongqiang Xiong, and Guangzhong Sun. 2019. [*Accelerating Rule-matching Systems with Learned Rankers*](https://www.usenix.org/conference/atc19/presentation/li-zhao). In Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference (ATC '19). USENIX Association.
- Chieh-Jan Mike Liang, Hui Xue, Mao Yang, Lidong Zhou, Lifei Zhu, Zhao Lucis Li, Zibo Wang, Qi Chen, Quanlu Zhang, Chuanjie Liu, and Wenjun Dai. 2020. [*AutoSys: The Design and Operation of Learning-Augmented Systems*](https://dl.acm.org/doi/abs/10.5555/3489146.3489168). In Proceedings of the 2020 USENIX Conference on Usenix Annual Technical Conference (ATC '20). USENIX Association.
1. D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, and Dan Dennison. 2015. [*Hidden Technical Debt in Machine Learning Systems*](https://dl.acm.org/doi/10.5555/2969442.2969519). In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS'15). MIT Press.

2. Zhao Lucis Li, Chieh-Jan Mike Liang, Wei Bai, Qiming Zheng, Yongqiang Xiong, and Guangzhong Sun. 2019. [*Accelerating Rule-matching Systems with Learned Rankers*](https://www.usenix.org/conference/atc19/presentation/li-zhao). In Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference (ATC '19). USENIX Association.

3. Chieh-Jan Mike Liang, Hui Xue, Mao Yang, Lidong Zhou, Lifei Zhu, Zhao Lucis Li, Zibo Wang, Qi Chen, Quanlu Zhang, Chuanjie Liu, and Wenjun Dai. 2020. [*AutoSys: The Design and Operation of Learning-Augmented Systems*](https://dl.acm.org/doi/abs/10.5555/3489146.3489168). In Proceedings of the 2020 USENIX Conference on Usenix Annual Technical Conference (ATC '20). USENIX Association.
@@ -70,10 +70,17 @@
We invite the reader to consider: besides matrix multiplication, which other computational patterns or operators do you think are commonly used in deep learning models? Do these operators have good software-library support on different hardware platforms?
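
As one concrete answer, convolution, among the most common deep learning operators besides matrix multiplication, can itself be lowered to a matrix multiplication via the Toeplitz/im2col construction cited in the references below. The sketch that follows is a single-channel, "valid"-padding toy with illustrative names, not a production implementation.

```python
import numpy as np

def conv2d_im2col(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Valid 2-D convolution (cross-correlation, as in DL frameworks),
    lowered to one matrix multiplication via im2col."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    # Each output position contributes one column of unrolled patch values.
    cols = np.empty((kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    return (w.ravel() @ cols).reshape(oh, ow)   # a single GEMM does all the work

x = np.random.randn(6, 6)
w = np.random.randn(3, 3)
ref = np.array([[(x[i:i + 3, j:j + 3] * w).sum() for j in range(4)]
                for i in range(4)])
assert np.allclose(conv2d_im2col(x, w), ref)
```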

## References
- https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-mkl-for-dpcpp/top.html
- https://developer.nvidia.com/cublas
- https://en.wikipedia.org/wiki/Toeplitz_matrix#Discrete_convolution
- https://docs.nvidia.com/deeplearning/performance/dl-performance-convolutional/index.html
- https://en.wikipedia.org/wiki/Recurrent_neural_network
- [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
- [Attention Is All You Need](https://arxiv.org/abs/1706.03762)

1. https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-mkl-for-dpcpp/top.html

2. https://developer.nvidia.com/cublas

3. https://en.wikipedia.org/wiki/Toeplitz_matrix#Discrete_convolution

4. https://docs.nvidia.com/deeplearning/performance/dl-performance-convolutional/index.html

5. https://en.wikipedia.org/wiki/Recurrent_neural_network

6. [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)

7. [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
@@ -124,9 +124,15 @@ for (int i = 0; i < M; i++) {
Exercise: list some other CPU architecture features you can think of, and consider what benefits and effects those features have on matrix multiplication and other computations.
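
As one data point for this exercise, the sketch below contrasts a naive triple-loop matrix multiplication with a BLAS-backed `a @ b`, which benefits from SIMD (AVX/SSE) vectorization and cache blocking. Sizes and the timing harness are illustrative; absolute numbers vary by machine, and Python interpreter overhead exaggerates the gap.

```python
import time
import numpy as np

n = 128
a = np.random.randn(n, n)
b = np.random.randn(n, n)

def matmul_naive(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Triple loop: no vectorization, and the b[k, j] access strides poorly."""
    c = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i, k] * b[k, j]
            c[i, j] = s
    return c

t0 = time.perf_counter(); c1 = matmul_naive(a, b); t1 = time.perf_counter()
c2 = a @ b;                                          t2 = time.perf_counter()
print(f"naive: {t1 - t0:.3f}s  BLAS: {t2 - t1:.6f}s")  # typically orders of magnitude apart
assert np.allclose(c1, c2)
```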

## References
- https://en.wikipedia.org/wiki/Computer_architecture
- https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
- https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
- https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-mkl-for-dpcpp/top.html
- https://developer.nvidia.com/cublas
- https://pytorch.org/docs/master/notes/extending.html

1. https://en.wikipedia.org/wiki/Computer_architecture

2. https://en.wikipedia.org/wiki/Advanced_Vector_Extensions

3. https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions

4. https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-mkl-for-dpcpp/top.html

5. https://developer.nvidia.com/cublas

6. https://pytorch.org/docs/master/notes/extending.html
@@ -109,7 +109,10 @@ cd .. && python mnist_custom_linear_cuda.py

## References

- https://en.wikipedia.org/wiki/Graphics_processing_unit
- CUDA Programming model: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
- An Even Easier Introduction to CUDA: https://devblogs.nvidia.com/even-easier-introduction-cuda/
- Custom C++ and CUDA Extensions: https://pytorch.org/tutorials/advanced/cpp_extension.html
1. https://en.wikipedia.org/wiki/Graphics_processing_unit

2. CUDA Programming model: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html

3. An Even Easier Introduction to CUDA: https://devblogs.nvidia.com/even-easier-introduction-cuda/

4. Custom C++ and CUDA Extensions: https://pytorch.org/tutorials/advanced/cpp_extension.html
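
Building on reference 4, here is a hedged sketch of JIT-compiling a custom C++ operator from Python with `torch.utils.cpp_extension.load_inline`. The operator name and body are hypothetical and not taken from this repository's `mnist_custom_linear_cuda.py`; a CUDA version would additionally pass `cuda_sources`.

```python
import torch
from torch.utils.cpp_extension import load_inline

# load_inline prepends <torch/extension.h> and generates pybind11 bindings
# for the functions listed in `functions`.
cpp_source = """
torch::Tensor my_linear(torch::Tensor input, torch::Tensor weight) {
    // Equivalent to input @ weight.t(); a real extension would fuse more work.
    return torch::mm(input, weight.t());
}
"""

ext = load_inline(name="my_linear_ext",
                  cpp_sources=cpp_source,
                  functions=["my_linear"])

x = torch.randn(4, 8)
w = torch.randn(16, 8)
assert torch.allclose(ext.my_linear(x, w), x @ w.t())
```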
@@ -105,11 +105,18 @@ C = t.compute((m, n),

## References

- https://gcc.gnu.org/
- The LLVM Compiler Infrastructure: https://llvm.org/
- LLVM IR and Go: https://blog.gopheracademy.com/advent-2018/llvm-ir-and-go/
- TVM. https://tvm.apache.org/
- [Ansor: Generating High-Performance Tensor Programs for Deep Learning](https://arxiv.org/abs/2006.06762)
- https://github.com/facebookresearch/TensorComprehensions
- https://en.wikipedia.org/wiki/Graphics_processing_unit
- https://cloud.google.com/tpu
1. https://gcc.gnu.org/

2. The LLVM Compiler Infrastructure: https://llvm.org/

3. LLVM IR and Go: https://blog.gopheracademy.com/advent-2018/llvm-ir-and-go/

4. TVM. https://tvm.apache.org/

5. [Ansor: Generating High-Performance Tensor Programs for Deep Learning](https://arxiv.org/abs/2006.06762)

6. https://github.com/facebookresearch/TensorComprehensions

7. https://en.wikipedia.org/wiki/Graphics_processing_unit

8. https://cloud.google.com/tpu
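
The truncated context line above (`C = t.compute((m, n),`) comes from a tensor-expression declaration. Below is a self-contained sketch of the same pattern using TVM's older `te` API; the sizes, names, and the `llvm` target are assumptions for illustration, not this chapter's exact code.

```python
import tvm
from tvm import te

M = N = K = 128
A = te.placeholder((M, K), name="A")
B = te.placeholder((K, N), name="B")
k = te.reduce_axis((0, K), name="k")

# Declare *what* to compute (a matmul); the schedule decides *how*.
C = te.compute((M, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

s = te.create_schedule(C.op)                       # default schedule: naive triple loop
func = tvm.build(s, [A, B, C], target="llvm")
print(tvm.lower(s, [A, B, C], simple_mode=True))   # inspect the generated IR
```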
@@ -108,8 +108,12 @@

## References

- https://en.wikipedia.org/wiki/Optimizing_compiler
- https://en.wikipedia.org/wiki/Common_subexpression_elimination
- https://en.wikipedia.org/wiki/Constant_folding
- TensorFlow Graph Optimizations: https://www.tensorflow.org/guide/graph_optimization
- Graph Optimizations in ONNX Runtime: https://onnxruntime.ai/docs/performance/graph-optimizations.html
1. https://en.wikipedia.org/wiki/Optimizing_compiler

2. https://en.wikipedia.org/wiki/Common_subexpression_elimination

3. https://en.wikipedia.org/wiki/Constant_folding

4. TensorFlow Graph Optimizations: https://www.tensorflow.org/guide/graph_optimization

5. Graph Optimizations in ONNX Runtime: https://onnxruntime.ai/docs/performance/graph-optimizations.html
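
References 2 and 3 above describe classic scalar optimizations, and references 4 and 5 their graph-level counterparts in DL frameworks. As a toy illustration only (not TensorFlow's or ONNX Runtime's actual passes), the sketch below applies constant folding to a tiny expression tree.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Const:
    value: float

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

Expr = Union[Const, Var, Add]

def fold(e: Expr) -> Expr:
    """Recursively replace Add(Const, Const) subtrees with a single Const."""
    if isinstance(e, Add):
        l, r = fold(e.left), fold(e.right)
        if isinstance(l, Const) and isinstance(r, Const):
            return Const(l.value + r.value)      # the actual fold
        return Add(l, r)
    return e

# (x + (2 + 3))  ->  (x + 5)
print(fold(Add(Var("x"), Add(Const(2.0), Const(3.0)))))
```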