Battery Management System (BMS) with Reinforcement Learning

Project Overview

This project implements a Battery Management System (BMS) that uses Reinforcement Learning (RL) to manage energy flow in a Renewable Energy Community (REC) comprising:

  • Photovoltaic (PV) Generator
  • Residential Load
  • Battery Storage System
  • Electricity Price Model

Objectives

  1. Increase Self-Consumption: Maximize the utilization of locally generated PV energy to meet residential load demands.
  2. Generate Profit: Sell surplus energy back to the grid during advantageous periods.
  3. Maintain Battery Health: Ensure the battery operates within safe State of Charge (SoC) limits.
  4. Adhere to Operational Constraints: Enforce physical and operational constraints in the energy management process.

Environment Components

1. Photovoltaic (PV) Generation $P^G_t$

  • Description: Energy generated by the PV system at time $t$.
  • Unit: Kilowatts (kW)
  • Characteristics: Continuous variable.
  • Data Source: Historical data reflecting realistic PV generation patterns.

2. Residential Load Demand $P^L_t$

  • Description: Energy consumption of the residential load at time $t$.
  • Unit: Kilowatts (kW)
  • Characteristics: Continuous variable.
  • Data Source: Historical data reflecting realistic load demand patterns.

3. Battery Storage System

  • State of Charge (SoC):
    • Description: Current energy level in the battery as a percentage of its capacity.
    • Constraints: $$ \text{SoC}_{\text{min}} \leq \text{SoC}_t \leq \text{SoC}_{\text{max}} $$
      • Typical values: $\text{SoC}_{\text{min}} = 10\%$, $\text{SoC}_{\text{max}} = 95\%$
    • Characteristics: Continuous variable.
  • Charging/Discharging Efficiency ($\eta$):
    • Represents the efficiency of the battery when charging or discharging.
    • Typical value: $\eta = 0.9$

4. Time Encoding

  • Description: Represents the current time in a cyclical manner to capture temporal patterns.
  • Encoding Method: Cyclical encoding using sine and cosine functions.
    • Hour of Day: $$ \text{Hour}_{\sin} = \sin\left(2\pi \times \frac{\text{Hour}}{24}\right) $$ $$ \text{Hour}_{\cos} = \cos\left(2\pi \times \frac{\text{Hour}}{24}\right) $$
    • Day of Week: $$ \text{Day}_{\sin} = \sin\left(2\pi \times \frac{\text{Day}}{7}\right) $$ $$ \text{Day}_{\cos} = \cos\left(2\pi \times \frac{\text{Day}}{7}\right) $$
  • Characteristics: Continuous variables.
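
A minimal sketch of this encoding (assuming the timestamp is a pandas Timestamp; the function name is illustrative):

    import numpy as np

    def encode_time(timestamp):
        """Cyclically encode hour-of-day and day-of-week so that adjacent
        times (e.g. 23:00 and 00:00) stay close in feature space."""
        hour = timestamp.hour        # 0..23
        day = timestamp.dayofweek    # 0 = Monday .. 6 = Sunday
        hour_sin = np.sin(2 * np.pi * hour / 24)
        hour_cos = np.cos(2 * np.pi * hour / 24)
        day_sin = np.sin(2 * np.pi * day / 7)
        day_cos = np.cos(2 * np.pi * day / 7)
        return hour_sin, hour_cos, day_sin, day_cos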

5. Electricity Price

  • Description: The cost of electricity, determined internally based on the current time.
  • Calculation: Price is calculated using the time information according to predefined time phases (as per Italian Law).

Price Phases

  1. Phase 1 (F1):
    • Time: 8 AM – 7 PM, Monday to Friday
    • Price: High ($c_{\text{max}}$)
  2. Phase 2 (F2):
    • Time:
      • 7 AM – 8 AM and 7 PM – 11 PM, Monday to Friday
      • 7 AM – 11 PM, Saturday
    • Price: Medium ($c_{\text{mid}}$)
  3. Phase 3 (F3):
    • Time:
      • 11 PM – 7 AM, Monday to Saturday
      • All day Sunday
    • Price: Low ($c_{\text{min}}$)
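
A sketch of how this phase lookup might be implemented (the price values and function name are placeholders, not part of the specification):

    def electricity_price(hour, day, c_max=0.30, c_mid=0.20, c_min=0.10):
        """Return the buy price for hour (0-23) and day (0 = Monday .. 6 = Sunday)
        following the F1/F2/F3 phases described above."""
        if day <= 4:                         # Monday to Friday
            if 8 <= hour < 19:
                return c_max                 # F1: 8 AM - 7 PM
            if hour == 7 or 19 <= hour < 23:
                return c_mid                 # F2: 7-8 AM and 7-11 PM
            return c_min                     # F3: 11 PM - 7 AM
        if day == 5:                         # Saturday
            return c_mid if 7 <= hour < 23 else c_min   # F2 during 7 AM - 11 PM
        return c_min                         # Sunday: F3 all day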

States and Actions

State Representation

The state at time $t$ is represented by the vector:

$$ s_t = \left[ \text{SoC}_t,\ P^G_t,\ P^L_t,\ \text{Hour}_{\sin},\ \text{Hour}_{\cos},\ \text{Day}_{\sin},\ \text{Day}_{\cos} \right] $$

  • SoC $\text{SoC}_t$: Continuous between $\text{SoC}_{\text{min}}$ and $\text{SoC}_{\text{max}}$.
  • PV Generation $P^G_t$: Continuous, based on historical data.
  • Load Demand $P^L_t$: Continuous, based on historical data.
  • Time Encoding: Continuous variables representing time cyclically.

Action Space

The agent's action at time $t$ is:

$$ a_t \in \left[ -a_{\text{max}},\ a_{\text{max}} \right] $$

  • Continuous Action Space: The action $a_t$ represents the power to charge or discharge the battery.
    • Charging: $a_t > 0$
    • Discharging: $a_t < 0$
    • Idle: $a_t = 0$
  • Constraints:
    • Charging Rate Limit: $0 \leq a_t \leq a_{\text{charge,max}}$
    • Discharging Rate Limit: $-a_{\text{discharge,max}} \leq a_t \leq 0$
    • Energy Availability:
      • Charging: Limited to surplus PV energy. $$ a_t \leq \max\left(0,\ P^G_t - P^L_t\right) $$
      • Discharging: Limited to net load demand. $$ -a_t \leq \max\left(0,\ P^L_t - P^G_t\right) $$
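
A minimal sketch of how these constraints could be enforced before an action is applied (one possible shape for the `_get_action_check` helper used later; names and limits are illustrative):

    def clip_action(a_proposed, pv, load, a_charge_max, a_discharge_max):
        """Clip a proposed battery power to the rate limits and to the energy
        actually available (the battery cannot be charged from the grid)."""
        # Rate limits: a_t in [-a_discharge_max, a_charge_max]
        a = max(-a_discharge_max, min(a_proposed, a_charge_max))
        if a > 0:
            # Charging is limited to the PV surplus
            a = min(a, max(0.0, pv - load))
        else:
            # Discharging is limited to the net load still to be covered
            a = max(a, -max(0.0, load - pv))
        return a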

Environment Dynamics

State Transition

  1. Action Adjustment

    • Adjust Action for Constraints:
      • Clip the agent's proposed action $a_t^\text{proposed}$ to satisfy physical and operational constraints.
      • Adjusted Action: $a_t^\text{adjusted}$
      • Action Adjustment Difference: $$ \Delta a_t = a_t^\text{proposed} - a_t^\text{adjusted} $$
  2. State of Charge Update

    • Proposed SoC Update: $$ \text{SoC}_{t+1}^{\text{proposed}} = \text{SoC}_t + \eta \times \frac{a_t^{\text{adjusted}} \times \Delta t}{E_{\text{cap}}} $$
    • Adjust SoC for Constraints:
      • If $\text{SoC}_{t+1}^{\text{proposed}}$ violates SoC constraints, adjust it: $$ \text{SoC}_{t+1}^{\text{adjusted}} = \text{clip}\left( \text{SoC}_{t+1}^{\text{proposed}},\ \text{SoC}_{\text{min}},\ \text{SoC}_{\text{max}} \right) $$
      • SoC Adjustment Difference: $$ \Delta \text{SoC} = \text{SoC}_{t+1}^{\text{proposed}} - \text{SoC}_{t+1}^{\text{adjusted}} $$
  3. Energy Balance Equations

    • Net Load After PV Generation: $$ \text{Net Load} = P^L_t - P^G_t $$

    • Battery Contribution:

      • Actual Action: $$ a_t^\text{actual} = a_t^\text{adjusted} $$
      • Adjust for Energy Availability:
        • Charging: $$ \text{If } a_t^\text{actual} > 0:\ a_t^\text{actual} = \min\left( a_t^\text{actual},\ \max\left(0,\ -\text{Net Load}\right) \right) $$
        • Discharging: $$ \text{If } a_t^\text{actual} < 0:\ a_t^\text{actual} = \max\left( a_t^\text{actual},\ -\max\left(0,\ \text{Net Load}\right) \right) $$
    • Grid Interaction:

      • Energy Purchased: $$ P^{\text{grid}}_t = \max\left(0,\ \text{Net Load} + a_t^\text{actual}\right) $$
      • Energy Sold: $$ P^{\text{surplus}}_t = \max\left(0,\ -\left( \text{Net Load} + a_t^\text{actual} \right) \right) $$
  4. Price Calculation

    • Price Determination: $\text{Price}_t$ is calculated internally based on the current time phase.
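
Putting the steps above together, a minimal sketch of one state transition (the capacity, efficiency, and SoC limits are example values; the action is assumed to be already clipped as in step 1):

    def transition(soc, action_adjusted, pv, load,
                   eta=0.9, dt=1.0, e_cap=10.0, soc_min=0.10, soc_max=0.95):
        """One update of SoC and the grid energy balance (e_cap in kWh)."""
        # Step 2: state-of-charge update, clipped to the allowed band
        soc_proposed = soc + eta * (action_adjusted * dt) / e_cap
        soc_next = min(max(soc_proposed, soc_min), soc_max)
        delta_soc = soc_proposed - soc_next           # drives the SoC penalty

        # Step 3: energy balance and grid interaction
        net_load = load - pv
        grid_purchase = max(0.0, net_load + action_adjusted)    # P^grid_t
        grid_sale = max(0.0, -(net_load + action_adjusted))     # P^surplus_t
        return soc_next, delta_soc, grid_purchase, grid_sale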

Penalties for Constraint Violations

  • Action Penalty: $$ P_{\text{action}} = -\mu \times \left| \Delta a_t \right| $$
  • SoC Penalty: $$ P_{\text{SoC,adjust}} = -\lambda_{\text{SoC}} \times \left| \Delta \text{SoC} \right| $$

Reward Function

The reward at time $t$ aligns with the objectives and penalizes the agent for violating constraints.

Components of the Reward Function

  1. Cost of Energy Purchased from the Grid $C_{\text{purchase}}$:

    $$ C_{\text{purchase}} = c_{\text{buy}} \times P^{\text{grid}}_t $$

  2. Revenue from Energy Sold to the Grid $R_{\text{sale}}$:

    $$ R_{\text{sale}} = c_{\text{sell}} \times P^{\text{surplus}}_t $$

  3. Total Penalty $P_{\text{total}}$:

    $$ P_{\text{total}} = P_{\text{action}} + P_{\text{SoC,adjust}} $$

Total Reward Function

$$ r_t = R_{\text{sale}} - C_{\text{purchase}} + P_{\text{total}} $$

  • Objective: Maximize $r_t$ over time.
  • Note: Penalties are added to the reward (since they are negative), effectively reducing the reward when constraints are violated.
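
A small worked example with hypothetical numbers:

    # Example hour: PV = 4 kW, load = 1 kW, battery charges 2 kW of the surplus
    c_buy, c_sell = 0.30, 0.10                           # illustrative prices (EUR/kWh)
    net_load = 1.0 - 4.0                                 # -3 kW (surplus)
    a_actual = 2.0                                       # charge 2 kW, within the surplus
    grid_purchase = max(0.0, net_load + a_actual)        # 0 kW bought
    grid_sale = max(0.0, -(net_load + a_actual))         # 1 kW sold
    reward = c_sell * grid_sale - c_buy * grid_purchase  # 0.10, no penalties incurred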

Algorithm and Implementation

Reinforcement Learning Approach

  • Algorithm: Use RL algorithms suitable for continuous action spaces, such as:
    • Deep Deterministic Policy Gradient (DDPG)
    • Soft Actor-Critic (SAC)
    • Proximal Policy Optimization (PPO) with continuous actions
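
As an example, a Soft Actor-Critic training loop could look like the sketch below. It assumes the environment is exposed as a Gymnasium-style class (here called BatteryEnv) and uses stable-baselines3; neither is prescribed by this project.

    from stable_baselines3 import SAC

    env = BatteryEnv(pv_series, load_series)      # hypothetical constructor
    model = SAC("MlpPolicy", env, learning_rate=3e-4, verbose=1)
    model.learn(total_timesteps=200_000)
    model.save("sac_bms")

    # Roll out the learned policy on the same environment
    obs, _ = env.reset()
    done = False
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated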

Environment Implementation Details

  • Observation Space: Continuous space represented by:

    $$ \text{Observation} = \begin{bmatrix} \text{SoC}_t \\ P^G_t \\ P^L_t \\ \text{Hour}_{\sin} \\ \text{Hour}_{\cos} \\ \text{Day}_{\sin} \\ \text{Day}_{\cos} \end{bmatrix} $$

  • Action Space: Continuous space within the charging and discharging rate limits.

  • Time Interval:

    • Duration: 1 hour per time step.
    • Episode Length: Spans multiple days, depending on data length.
  • Data Integration:

    • Historical Data: Use real historical data for PV generation and load demand to create a realistic environment.
    • Data Handling:
      • Load data into pandas DataFrames.
      • Align and preprocess data (e.g., handle missing values, resample if necessary).
      • At each time step, read the corresponding data point.
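
A sketch of the corresponding space definitions and data loading (Gymnasium API assumed; file and column names are placeholders):

    import numpy as np
    import pandas as pd
    from gymnasium import spaces

    # Load and align the historical data (placeholder file name)
    data = pd.read_csv("pv_and_load.csv", parse_dates=["timestamp"], index_col="timestamp")
    data = data.resample("1h").mean().interpolate()   # hourly steps, fill gaps

    # Observation: [SoC, P^G_t, P^L_t, hour_sin, hour_cos, day_sin, day_cos]
    observation_space = spaces.Box(
        low=np.array([0.10, 0.0, 0.0, -1.0, -1.0, -1.0, -1.0], dtype=np.float32),
        high=np.array([0.95, np.inf, np.inf, 1.0, 1.0, 1.0, 1.0], dtype=np.float32),
    )

    # Action: battery power in kW, bounded by the charge/discharge limits (example: 3 kW)
    action_space = spaces.Box(low=-3.0, high=3.0, shape=(1,), dtype=np.float32)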

Environment Dynamics in Code

  • Action Adjustment in Code:

    # Clip the proposed action to the rate and energy-availability limits
    action_corrected, _ = self._get_action_check(action, info)
    # Penalize the agent in proportion to how much its action had to be adjusted
    delta_action = action - action_corrected
    penalty_action = -mu * abs(delta_action)
  • SoC Update in Code:

    # Proposed SoC after applying the efficiency-scaled, corrected action
    SoC_proposed = self.SoC + self.eta * (action_corrected * self.time_interval) / self.battery_capacity
    # Keep SoC inside [SoC_min, SoC_max] and penalize any overshoot
    SoC_adjusted = np.clip(SoC_proposed, self.SoC_min, self.SoC_max)
    delta_SoC = SoC_proposed - SoC_adjusted
    penalty_SoC_adjust = -lambda_SoC * abs(delta_SoC)
    self.SoC = SoC_adjusted
  • Energy Balance in Code:

    # Positive net load must be covered by the battery or bought from the grid
    net_load = self.L - self.G
  • Reward Calculation in Code:

    # Revenue from energy sold minus cost of energy bought, plus (negative) penalties
    reward = R_sale - C_purchased + penalty_action + penalty_SoC_adjust

Assumptions and Considerations

  • No Forecasting: The agent only considers current time step data.

  • Battery Charging Constraints:

    • Battery cannot be charged from the grid.
    • Charging is limited to surplus PV energy.
  • Price Levels: Determined internally based on time.

  • Agent Penalization:

    • The agent is penalized for proposing invalid actions and causing SoC violations, even if the environment adjusts these values.
  • Historical Data Usage:

    • Realistic PV generation and load demand patterns improve the agent’s learning and policy effectiveness.

Equations Summary

  1. Action Adjustment Difference: $$ \Delta a_t = a_t^\text{proposed} - a_t^\text{adjusted} $$
  2. SoC Update:
  • Proposed SoC: $$ \text{SoC}_{t+1}^{\text{proposed}} = \text{SoC}_t + \eta \times \frac{a_t^{\text{adjusted}} \times \Delta t}{E_{\text{cap}}} $$
  • SoC Adjustment Difference: $$ \Delta \text{SoC} = \text{SoC}_{t+1}^{\text{proposed}} - \text{SoC}_{t+1}^{\text{adjusted}} $$
  3. Action Penalty: $$ P_{\text{action}} = -\mu \times \left| \Delta a_t \right| $$
  4. SoC Adjustment Penalty: $$ P_{\text{SoC,adjust}} = -\lambda_{\text{SoC}} \times \left| \Delta \text{SoC} \right| $$
  5. Grid Interaction:
  • Energy Purchased: $$ P^{\text{grid}}_t = \max\left(0,\ \text{Net Load} + a_t^\text{actual}\right) $$
  • Energy Sold: $$ P^{\text{surplus}}_t = \max\left(0,\ -\left( \text{Net Load} + a_t^\text{actual} \right) \right) $$
  6. Reward Function: $$ r_t = \left[ c_{\text{sell}} \times P^{\text{surplus}}_t \right] - \left[ c_{\text{buy}} \times P^{\text{grid}}_t \right] + P_{\text{action}} + P_{\text{SoC,adjust}} $$

Conclusion

This project aims to develop an RL-based BMS that optimizes energy flow within a REC by maximizing self-consumption and generating profit while maintaining battery health and adhering to operational constraints. By incorporating penalties for action and SoC adjustments, the agent is incentivized to operate within valid constraints, leading to more effective and realistic policy learning.

Additional Information

Data Sources

  • PV Generation and Load Demand: Use datasets that provide detailed energy consumption and PV generation data, such as:
    • Pecan Street Dataport
    • UCI Machine Learning Repository
    • REFIT Electrical Load Measurements

Agent Learning Considerations

  • Penalties and Adjustments:
    • The environment adjusts invalid actions and SoC values to maintain physical realism.
    • Penalties are applied to the agent to encourage learning valid actions.
  • Agent Observations:
    • The agent receives observations based on adjusted state variables.
    • Over time, the agent learns to propose actions within valid constraints to maximize rewards.

Testing and Validation

  • Environment Testing:
    • Before training the agent, thoroughly test the environment with known scenarios.
    • Ensure that energy flows, constraints, and rewards are calculated correctly.
  • Agent Training:
    • Start with a simple algorithm and gradually increase complexity.
    • Monitor the agent's performance and adjust hyperparameters as needed.
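
A minimal smoke test under these assumptions (re-using the hypothetical BatteryEnv from the training sketch; the info keys are illustrative):

    import numpy as np

    def test_idle_action():
        """With a zero action the battery stays idle, so the environment should
        never buy and sell energy in the same step."""
        env = BatteryEnv(pv_series, load_series)
        obs, _ = env.reset()
        obs, reward, terminated, truncated, info = env.step(np.array([0.0]))
        assert info["grid_purchase"] >= 0.0 and info["grid_sale"] >= 0.0   # assumed info keys
        assert min(info["grid_purchase"], info["grid_sale"]) == 0.0        # cannot buy and sell at once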

Future Enhancements

  • Forecasting:
    • Incorporate short-term forecasting of PV generation and load demand to improve decision-making.
  • Dynamic Pricing:
    • Implement dynamic electricity pricing models based on market conditions.
  • Scalability:
    • Extend the environment to manage multiple batteries or interact with a larger grid.
