Commit

cleanup
tcuongd committed Oct 17, 2023
1 parent ff673f9 commit b16d849
Showing 2 changed files with 53 additions and 75 deletions.
50 changes: 21 additions & 29 deletions docs/_docs/additional_topics.md
@@ -70,7 +70,7 @@ m <- prophet(df, growth='flat')
# Python
m = Prophet(growth='flat')
```
-Below is a comparison of counterfactual forecasting under two methods: using linear growth for a single time series, versus flat growth with an exogenous regressor.
+Below is a comparison of counterfactual forecasting with exogenous regressors using linear versus flat growth.


```python
@@ -87,7 +87,10 @@ target = "location_41"
cutoff = pd.to_datetime("2023-04-17 00:00:00")

df = (
-    pd.read_csv("../examples/example_pedestrians_multivariate.csv", parse_dates=["ds"])
+    pd.read_csv(
+        "https://raw.githubusercontent.com/facebook/prophet/main/examples/example_pedestrians_multivariate.csv",
+        parse_dates=["ds"]
+    )
    .rename(columns={target: "y"})
)
train = df.loc[df["ds"] < cutoff]
@@ -153,15 +156,15 @@ ax.legend();
![png](/prophet/static/additional_topics_files/additional_topics_18_0.png)


-In this example, trends in the target sensor location can be mostly explained by the exogenous regressor (a nearby sensor). The model with linear growth assumes a growing trend and this leads to larger and larger over-predictions in the test period, while the model with the flat trend is mostly driven by trends in the exogenous regressor, which results in a sizeable MAPE improvement.
+In this example, the target sensor values can be mostly explained by the exogenous regressor (a nearby sensor). The model with linear growth assumes an increasing trend and this leads to larger and larger over-predictions in the test period, while the flat growth model mostly follows movements in the exogenous regressor and this results in a sizeable MAPE improvement.



-Note that forecasting with exogenous regressors is only effective when we can be confident in the future values of the regressor -- the example above is most relevant to time series causal inference, where we want to forecast what a time series would have looked like in the past, and hence the exogenous regressor values are known.
+Note that forecasting with exogenous regressors is only effective when we can be confident in the future values of the regressor. The example above is relevant to causal inference using time series, where we want to understand what `y` would have looked like for a past time period, and hence the exogenous regressor values are known.



-If the flat trend is used on a time series that doesn't have a constant trend, without exogenous regressors, any trend will be fit with the noise term and so there will be high predictive uncertainty in the forecast.
+In other cases -- where we don't have exogenous regressors or have to predict their future values -- if flat growth is used on a time series that doesn't have a constant trend, any trend will be fit with the noise term and so there will be high predictive uncertainty in the forecast.


<a id="custom-trends"> </a>
@@ -212,7 +215,7 @@ def warm_start_params(m):
            res[pname] = np.mean(m.params[pname], axis=0)
    return res

-df = pd.read_csv('../examples/example_wp_log_peyton_manning.csv')
+df = pd.read_csv('https://raw.githubusercontent.com/facebook/prophet/main/examples/example_wp_log_peyton_manning.csv')
df1 = df.loc[df['ds'] < '2016-01-19', :] # All data except the last day
m1 = Prophet().fit(df1) # A model fit to all data except the last day

@@ -242,38 +245,27 @@ Before model fitting, Prophet scales `y` by dividing by the maximum value in the

```python
# Python
-large_y = pd.read_csv("https://raw.githubusercontent.com/facebook/prophet/main/python/prophet/tests/data3.csv", parse_dates=["ds"])
+large_y = pd.read_csv(
+    "https://raw.githubusercontent.com/facebook/prophet/main/python/prophet/tests/data3.csv",
+    parse_dates=["ds"]
+)
```
```python
# Python
m1 = Prophet(scaling="absmax")
-m1.fit(large_y)
+m1 = m1.fit(large_y)
```
-01:42:10 - cmdstanpy - INFO - Chain [1] start processing
-01:42:10 - cmdstanpy - INFO - Chain [1] done processing
-<prophet.forecaster.Prophet at 0x7f253353bee0>
+08:11:23 - cmdstanpy - INFO - Chain [1] start processing
+08:11:23 - cmdstanpy - INFO - Chain [1] done processing


```python
# Python
m2 = Prophet(scaling="minmax")
-m2.fit(large_y)
+m2 = m2.fit(large_y)
```
-01:42:19 - cmdstanpy - INFO - Chain [1] start processing
-01:42:19 - cmdstanpy - INFO - Chain [1] done processing
-<prophet.forecaster.Prophet at 0x7f253352d910>
+08:11:29 - cmdstanpy - INFO - Chain [1] start processing
+08:11:29 - cmdstanpy - INFO - Chain [1] done processing


```python
@@ -298,12 +290,12 @@ m2.plot(m2.predict(large_y));



-For debugging, it's useful to understand how the raw data has been transformed before being passed to the underlying generalized additive model. We can call the `.preprocess()` method to see the data that will be passed to the stan fit routine, and `.calculate_init_params()` to see how the parameters are initialized in the fit routine.
+For debugging, it's useful to understand how the raw data has been transformed before being passed to the stan fit routine. We can call the `.preprocess()` method to see all the inputs to stan, and `.calculate_init_params()` to see how the model parameters will be initialized.


```python
# Python
-df = pd.read_csv('../examples/example_wp_log_peyton_manning.csv')
+df = pd.read_csv('https://raw.githubusercontent.com/facebook/prophet/main/examples/example_wp_log_peyton_manning.csv')
m = Prophet()
transformed = m.preprocess(df)
```
Expand Down
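
Reviewer note on the scaling hunk: it fits the same `large_y` data with `scaling="absmax"` and `scaling="minmax"`. A minimal sketch of how the two transforms differ, written in plain Python. It assumes that absmax divides by the largest absolute value and that minmax maps the series onto [0, 1]; the helper `scale_y` and the sample values are hypothetical and are not Prophet's internal code.

```python
def scale_y(y, scaling="absmax"):
    """Illustrative scaling helper (hypothetical, not part of Prophet)."""
    if scaling == "absmax":
        # Assumed behavior: no shift, divide by the largest absolute value.
        floor = 0.0
        scale = max(abs(v) for v in y)
    elif scaling == "minmax":
        # Assumed behavior: shift by the minimum, divide by the range.
        floor = min(y)
        scale = max(y) - floor
    else:
        raise ValueError(f"unknown scaling: {scaling!r}")
    return [(v - floor) / scale for v in y]


# Made-up series with large values in a narrow band.
y_values = [1000.0, 1050.0, 1100.0]
absmax_scaled = scale_y(y_values, "absmax")
minmax_scaled = scale_y(y_values, "minmax")
```

In this sketch the absmax result stays bunched just below 1, while the minmax result spans the whole unit interval, which is the contrast the two fitted models in the hunk are meant to show.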
78 changes: 32 additions & 46 deletions notebooks/additional_topics.ipynb
@@ -187,12 +187,12 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"Below is a comparison of counterfactual forecasting under two methods: using linear growth for a single time series, versus flat growth with an exogenous regressor."
+"Below is a comparison of counterfactual forecasting with exogenous regressors using linear versus flat growth."
]
},
{
"cell_type": "code",
-"execution_count": 104,
+"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
@@ -208,7 +208,10 @@
"cutoff = pd.to_datetime(\"2023-04-17 00:00:00\")\n",
"\n",
"df = (\n",
-"    pd.read_csv(\"../examples/example_pedestrians_multivariate.csv\", parse_dates=[\"ds\"])\n",
+"    pd.read_csv(\n",
+"        \"https://raw.githubusercontent.com/facebook/prophet/main/examples/example_pedestrians_multivariate.csv\", \n",
+"        parse_dates=[\"ds\"]\n",
+"    )\n",
"    .rename(columns={target: \"y\"})\n",
")\n",
"train = df.loc[df[\"ds\"] < cutoff]\n",
@@ -340,11 +343,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"In this example, trends in the target sensor location can be mostly explained by the exogenous regressor (a nearby sensor). The model with linear growth assumes a growing trend and this leads to larger and larger over-predictions in the test period, while the model with the flat trend is mostly driven by trends in the exogenous regressor, which results in a sizeable MAPE improvement.\n",
+"In this example, the target sensor values can be mostly explained by the exogenous regressor (a nearby sensor). The model with linear growth assumes an increasing trend and this leads to larger and larger over-predictions in the test period, while the flat growth model mostly follows movements in the exogenous regressor and this results in a sizeable MAPE improvement.\n",
"\n",
-"Note that forecasting with exogenous regressors is only effective when we can be confident in the future values of the regressor -- the example above is most relevant to time series causal inference, where we want to forecast what a time series would have looked like in the past, and hence the exogenous regressor values are known.\n",
+"Note that forecasting with exogenous regressors is only effective when we can be confident in the future values of the regressor. The example above is relevant to causal inference using time series, where we want to understand what `y` would have looked like for a past time period, and hence the exogenous regressor values are known.\n",
"\n",
-"If the flat trend is used on a time series that doesn't have a constant trend, without exogenous regressors, any trend will be fit with the noise term and so there will be high predictive uncertainty in the forecast."
+"In other cases -- where we don't have exogenous regressors or have to predict their future values -- if flat growth is used on a time series that doesn't have a constant trend, any trend will be fit with the noise term and so there will be high predictive uncertainty in the forecast."
]
},
{
@@ -410,7 +413,7 @@
"            res[pname] = np.mean(m.params[pname], axis=0)\n",
"    return res\n",
"\n",
-"df = pd.read_csv('../examples/example_wp_log_peyton_manning.csv')\n",
+"df = pd.read_csv('https://raw.githubusercontent.com/facebook/prophet/main/examples/example_wp_log_peyton_manning.csv')\n",
"df1 = df.loc[df['ds'] < '2016-01-19', :] # All data except the last day\n",
"m1 = Prophet().fit(df1) # A model fit to all data except the last day\n",
"\n",
@@ -439,69 +442,52 @@
},
{
"cell_type": "code",
-"execution_count": 114,
+"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
-"large_y = pd.read_csv(\"https://raw.githubusercontent.com/facebook/prophet/main/python/prophet/tests/data3.csv\", parse_dates=[\"ds\"])"
+"large_y = pd.read_csv(\n",
+"    \"https://raw.githubusercontent.com/facebook/prophet/main/python/prophet/tests/data3.csv\", \n",
+"    parse_dates=[\"ds\"]\n",
+")"
]
},
{
"cell_type": "code",
-"execution_count": 116,
+"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
-"01:42:10 - cmdstanpy - INFO - Chain [1] start processing\n",
-"01:42:10 - cmdstanpy - INFO - Chain [1] done processing\n"
+"08:11:23 - cmdstanpy - INFO - Chain [1] start processing\n",
+"08:11:23 - cmdstanpy - INFO - Chain [1] done processing\n"
]
-},
-{
-"data": {
-"text/plain": [
-"<prophet.forecaster.Prophet at 0x7f253353bee0>"
-]
-},
-"execution_count": 116,
-"metadata": {},
-"output_type": "execute_result"
}
],
"source": [
"m1 = Prophet(scaling=\"absmax\")\n",
-"m1.fit(large_y)"
+"m1 = m1.fit(large_y)"
]
},
{
"cell_type": "code",
-"execution_count": 117,
+"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
-"01:42:19 - cmdstanpy - INFO - Chain [1] start processing\n",
-"01:42:19 - cmdstanpy - INFO - Chain [1] done processing\n"
+"08:11:29 - cmdstanpy - INFO - Chain [1] start processing\n",
+"08:11:29 - cmdstanpy - INFO - Chain [1] done processing\n"
]
-},
-{
-"data": {
-"text/plain": [
-"<prophet.forecaster.Prophet at 0x7f253352d910>"
-]
-},
-"execution_count": 117,
-"metadata": {},
-"output_type": "execute_result"
}
],
"source": [
"m2 = Prophet(scaling=\"minmax\")\n",
-"m2.fit(large_y)"
+"m2 = m2.fit(large_y)"
]
},
{
@@ -550,23 +536,23 @@
"source": [
"### Inspecting transformed data (new in 1.1.5)\n",
"\n",
-"For debugging, it's useful to understand how the raw data has been transformed before being passed to the underlying generalized additive model. We can call the `.preprocess()` method to see the data that will be passed to the stan fit routine, and `.calculate_init_params()` to see how the parameters are initialized in the fit routine."
+"For debugging, it's useful to understand how the raw data has been transformed before being passed to the stan fit routine. We can call the `.preprocess()` method to see all the inputs to stan, and `.calculate_init_params()` to see how the model parameters will be initialized."
]
},
{
"cell_type": "code",
-"execution_count": 125,
+"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
-"df = pd.read_csv('../examples/example_wp_log_peyton_manning.csv')\n",
+"df = pd.read_csv('https://raw.githubusercontent.com/facebook/prophet/main/examples/example_wp_log_peyton_manning.csv')\n",
"m = Prophet()\n",
"transformed = m.preprocess(df)"
]
},
{
"cell_type": "code",
-"execution_count": 128,
+"execution_count": 14,
"metadata": {},
"outputs": [
{
@@ -585,7 +571,7 @@
"Name: y_scaled, dtype: float64"
]
},
-"execution_count": 128,
+"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
@@ -596,7 +582,7 @@
},
{
"cell_type": "code",
-"execution_count": 131,
+"execution_count": 15,
"metadata": {},
"outputs": [
{
@@ -953,7 +939,7 @@
"[10 rows x 26 columns]"
]
},
-"execution_count": 131,
+"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
@@ -964,7 +950,7 @@
},
{
"cell_type": "code",
-"execution_count": 132,
+"execution_count": 16,
"metadata": {},
"outputs": [
{
@@ -975,7 +961,7 @@
 0., 0., 0., 0., 0., 0., 0., 0., 0.]), sigma_obs=1.0)"
]
},
-"execution_count": 132,
+"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
Expand Down
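
Reviewer note on the notebook hunks: the `execute_result` outputs (the `<prophet.forecaster.Prophet at 0x...>` reprs) disappear once `m1.fit(large_y)` becomes `m1 = m1.fit(large_y)`. A small sketch of why, assuming the notebook's default behavior of echoing only a trailing bare expression. `ToyModel` and `last_statement_is_expression` are hypothetical names used for illustration; `ToyModel` only mimics a `fit()` that returns the fitted object, as `Prophet().fit(df1)` does elsewhere in this diff.

```python
import ast


class ToyModel:
    """Hypothetical stand-in for an estimator whose fit() returns the fitted object."""

    def __init__(self):
        self.fitted = False

    def fit(self, data):
        self.fitted = True
        return self  # Returning the object is what makes a bare call echo a repr.


def last_statement_is_expression(cell_source: str) -> bool:
    """True when a cell ends in a bare expression, which a notebook would echo."""
    tree = ast.parse(cell_source)
    return isinstance(tree.body[-1], ast.Expr)


before = "m1 = ToyModel()\nm1.fit([1, 2, 3])"
after = "m1 = ToyModel()\nm1 = m1.fit([1, 2, 3])"

m1 = ToyModel()
same_object = m1.fit([1, 2, 3]) is m1  # fit() hands back the same instance

echo_before = last_statement_is_expression(before)  # bare call: echoed
echo_after = last_statement_is_expression(after)    # assignment: not echoed
```

Reassigning the result keeps the same fitted object bound to the name while leaving the cell with no trailing expression, so only the `stderr` stream output remains in the saved notebook. Ending the line with a semicolon, as the `m2.plot(...)` cell already does, is another way to suppress the echo.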
