Commit

feat: change E2E AutoML dataset to use bank marketing data
* Change E2E AutoML dataset to use bank marketing data

* Remove vs code config
coolalexzb authored Oct 25, 2022
1 parent 2f5fe80 commit fbcf783
Showing 1 changed file with 20 additions and 60 deletions.
80 changes: 20 additions & 60 deletions notebooks/official/automl/automl_tabular_on_vertex_pipelines.ipynb
@@ -88,7 +88,8 @@
"source": [
"### Dataset\n",
"\n",
-"The dataset you will be using is the [Safe Driver Prediction](https://www.kaggle.com/competitions/porto-seguro-safe-driver-prediction/data?select=train.csv) dataset for predicting the probability of an auto insurance policy holder filing a claim for a given incident."
+"The dataset you will be using is [Bank Marketing](https://archive.ics.uci.edu/ml/datasets/bank+marketing).\n",
+"The data is for direct marketing campaigns (phone calls) of a Portuguese banking institution. The binary classification goal is to predict if a client will subscribe a term deposit. For this notebook, we randomly selected 90% of the rows in the original dataset and saved them in a train.csv file hosted on Cloud Storage. To download the file, click [here](https://storage.googleapis.com/cloud-samples-data/vertex-ai/tabular-workflows/datasets/bank-marketing/train.csv)."
]
},
{
@@ -638,9 +639,9 @@
"root_dir = os.path.join(BUCKET_URI, \"automl_tabular_pipeline\")\n",
"prediction_type = \"classification\"\n",
"optimization_objective = \"minimize-log-loss\"\n",
-"target_column = \"target\"\n",
+"target_column = \"deposit\"\n",
"data_source_csv_filenames = (\n",
-" \"gs://cloud-samples-data/vertex-ai/tabular-workflows/datasets/safe-driver/train.csv\"\n",
+" \"gs://cloud-samples-data/vertex-ai/tabular-workflows/datasets/bank-marketing/train.csv\"\n",
")\n",
"data_source_bigquery_table_path = None # format: bq://bq_project.bq_dataset.bq_table\n",
"\n",
@@ -659,63 +660,22 @@
"weight_column = None\n",
"\n",
"features = [\n",
-" \"ps_ind_01\",\n",
-" \"ps_ind_02_cat\",\n",
-" \"ps_ind_03\",\n",
-" \"ps_ind_04_cat\",\n",
-" \"ps_ind_05_cat\",\n",
-" \"ps_ind_06_bin\",\n",
-" \"ps_ind_07_bin\",\n",
-" \"ps_ind_08_bin\",\n",
-" \"ps_ind_09_bin\",\n",
-" \"ps_ind_10_bin\",\n",
-" \"ps_ind_11_bin\",\n",
-" \"ps_ind_12_bin\",\n",
-" \"ps_ind_13_bin\",\n",
-" \"ps_ind_14\",\n",
-" \"ps_ind_15\",\n",
-" \"ps_ind_16_bin\",\n",
-" \"ps_ind_17_bin\",\n",
-" \"ps_ind_18_bin\",\n",
-" \"ps_reg_01\",\n",
-" \"ps_reg_02\",\n",
-" \"ps_reg_03\",\n",
-" \"ps_car_01_cat\",\n",
-" \"ps_car_02_cat\",\n",
-" \"ps_car_03_cat\",\n",
-" \"ps_car_04_cat\",\n",
-" \"ps_car_05_cat\",\n",
-" \"ps_car_06_cat\",\n",
-" \"ps_car_07_cat\",\n",
-" \"ps_car_08_cat\",\n",
-" \"ps_car_09_cat\",\n",
-" \"ps_car_10_cat\",\n",
-" \"ps_car_11_cat\",\n",
-" \"ps_car_11\",\n",
-" \"ps_car_12\",\n",
-" \"ps_car_13\",\n",
-" \"ps_car_14\",\n",
-" \"ps_car_15\",\n",
-" \"ps_calc_01\",\n",
-" \"ps_calc_02\",\n",
-" \"ps_calc_03\",\n",
-" \"ps_calc_04\",\n",
-" \"ps_calc_05\",\n",
-" \"ps_calc_06\",\n",
-" \"ps_calc_07\",\n",
-" \"ps_calc_08\",\n",
-" \"ps_calc_09\",\n",
-" \"ps_calc_10\",\n",
-" \"ps_calc_11\",\n",
-" \"ps_calc_12\",\n",
-" \"ps_calc_13\",\n",
-" \"ps_calc_14\",\n",
-" \"ps_calc_15_bin\",\n",
-" \"ps_calc_16_bin\",\n",
-" \"ps_calc_17_bin\",\n",
-" \"ps_calc_18_bin\",\n",
-" \"ps_calc_19_bin\",\n",
-" \"ps_calc_20_bin\",\n",
+" \"age\",\n",
+" \"job\",\n",
+" \"marital\",\n",
+" \"education\",\n",
+" \"default\",\n",
+" \"balance\",\n",
+" \"housing\",\n",
+" \"loan\",\n",
+" \"contact\",\n",
+" \"day\",\n",
+" \"month\",\n",
+" \"duration\",\n",
+" \"campaign\",\n",
+" \"pdays\",\n",
+" \"previous\",\n",
+" \"poutcome\",\n",
"]\n",
"transformations = generate_auto_transformation(features)\n",
"transform_config_path = os.path.join(root_dir, f\"transform_config_{uuid.uuid4()}.json\")\n",
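
Review note: when a change swaps both the dataset and the column configuration, it can help to confirm that the new `features` list and `target_column` agree with the header of the new train.csv, and that the target is not also listed as a feature. The following is a minimal sketch of such a check, not code from the notebook. The helper name `check_columns` and the sample row are hypothetical, and the sample header assumes the CSV contains exactly the 16 listed features plus `deposit`, which should be confirmed against the real file.

```python
import csv
import io
from typing import List


def check_columns(csv_text: str, features: List[str], target_column: str) -> List[str]:
    """Return a list of problems; an empty list means the config matches the header."""
    # Only the first CSV row (the header) is inspected.
    header = next(csv.reader(io.StringIO(csv_text)))
    problems = []
    missing = [name for name in features if name not in header]
    if missing:
        problems.append(f"features missing from CSV header: {missing}")
    if target_column not in header:
        problems.append(f"target column {target_column!r} missing from CSV header")
    if target_column in features:
        problems.append(f"target column {target_column!r} is also listed as a feature")
    return problems


# Hypothetical sample: the header is an assumption and the data row is made up.
SAMPLE = (
    "age,job,marital,education,default,balance,housing,loan,contact,"
    "day,month,duration,campaign,pdays,previous,poutcome,deposit\n"
    "59,admin.,married,secondary,no,2343,yes,no,unknown,5,may,1042,1,-1,0,unknown,yes\n"
)

# Feature names copied from the added lines of this commit.
FEATURES = [
    "age", "job", "marital", "education", "default", "balance", "housing", "loan",
    "contact", "day", "month", "duration", "campaign", "pdays", "previous", "poutcome",
]

print(check_columns(SAMPLE, FEATURES, "deposit"))  # prints []
```

With the old value `target_column = "target"`, the same call would report the target column as missing, which is the kind of mismatch this check is meant to surface before a pipeline run is submitted.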