A collection of code for doing machine learning with PySpark. Its main parts are:
- Basic operations in PySpark
- Reading data
- Feature processing
- Model training
- Displaying feature importance
- Hyperparameter search
- Saving models
The main code is in MlDemo.
Main references:
- http://spark.apache.org/docs/2.1.2/api/python/pyspark.ml.html#
- https://www.jianshu.com/p/4d7003182398
- https://www.jianshu.com/p/20456b512fa7
- https://blog.csdn.net/sinat_26917383/article/details/80500349
- https://blog.csdn.net/qq_40587575/article/details/91170554
- https://www.pythonheidong.com/blog/article/51724/
- https://blog.csdn.net/sinat_36226553/article/details/104129182
- https://www.cnblogs.com/SoftwareBuilding/p/9492285.html
- https://blog.csdn.net/mergerly/article/details/77250098
- https://blog.csdn.net/yeshang_lady/article/details/89710914
- https://blog.csdn.net/yeshang_lady/article/details/87373280
- https://www.jianshu.com/p/70f8c78a3fc9
- https://stackoverflow.com/questions/38517808/create-a-dataframe-from-a-list-in-pyspark-sql