The Front End (FE) consists of two stages: PC Generation (PC_GEN) and Instruction Fetch (IF). Both stages receive the same PC value at the beginning of the stage, so we treat them as a single stage. PC_GEN specifies the next virtual PC to be fetched. The IF stage receives the virtual PC from PC_GEN, fetches the next instruction, and enqueues the fetched PC/instruction pair, along with branch prediction metadata, into the frontend queue.
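The flow described above can be sketched as a small behavioral model. This is only an illustration, not the RTL: `run_front_end`, `imem`, `predict_target`, and the 4-byte instruction step are all assumptions introduced here.

```python
from collections import deque

INSTR_BYTES = 4  # assumption: 32-bit instructions, so sequential fetch steps by 4


def run_front_end(first_pc, imem, predict_target, cycles):
    """Behavioral sketch of PC_GEN + IF feeding the frontend queue.

    imem:           dict mapping PC -> instruction word (stand-in for I-TLB + I-Cache)
    predict_target: hypothetical predictor; returns a target PC or None
    """
    fe_queue = deque()
    pc = first_pc
    for _ in range(cycles):
        instr = imem[pc]                      # IF: fetch the instruction at this PC
        target = predict_target(pc)           # branch prediction metadata
        fe_queue.append((pc, instr, {"predicted_target": target}))
        # PC_GEN: follow the predicted target, otherwise fall through
        pc = target if target is not None else pc + INSTR_BYTES
    return fe_queue
```

Each queue entry carries the PC, the instruction, and the prediction metadata, which is what lets the consumer of the queue check a prediction later.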
The FE of the BlackParrot processor contains the following components:
- PC_GEN (including the PC_GEN logic and Branch Prediction Logic)
- Instruction TLB (I-TLB)
- Instruction Cache (I-Cache)
Note that in the current implementation, the I-TLB is a pass-through module.
pc_gen.v provides the interfaces for the PC_GEN logic and connects it to the other modules in the FE. PC_GEN provides the PC for the I-TLB and I-Cache. PC_GEN also provides the BTB, BHT and RAS indexes to the Back End (BE) through the frontend queue (the queue between the FE and the BE).
- vaddr_width_p - the width of the virtual addresses
- paddr_width_p - the width of the physical addresses
- eaddr_width_p - the width of the effective addresses
- btb_indx_width_p - the width of the BTB indexes
- bht_indx_width_p - the width of the BHT indexes
- ras_addr_width_p - the width of the RAS addresses
- instr_width_p - the instruction width
- asid_width_p - the width of ASID
- bp_first_pc_p - the first pc when the processor boots up
- pc_gen_icache_o (ready -> valid)
- icache_pc_gen_i (always ready)
- icache_miss_i (icache miss signal)
- pc_gen_itlb_o (valid -> ready)
- pc_gen_fe_o (ready -> valid)
- fe_pc_gen_i (valid -> ready)
The Branch Target Buffer (BTB) stores branch target addresses together with their corresponding branch sites; a branch transfers control from a branch site to its branch target. To reduce logic area, the BTB holds only a limited number of (branch site, branch target) pairs. The implementation uses the bsg_mem_1r1w RAM design.
- bp_fe_pc_gen_btb_idx_width_lp - the number of bits to index the BTB
- eaddr_width_p - the width of the effective address
- btb_idx_w_i - the index used for writing the BTB (for updating the BTB)
- btb_idx_r_i - the index used for reading the BTB (for prediction)
- btb_r_i - the read enable bit (for prediction)
- btb_w_i - the write enable bit (for updating the BTB)
- branch_target_i - the input for the BTB entry (for updating the BTB)
- branch_target_o - the output target (for the prediction)
- read_valid_o - the bit that indicates whether the BTB entry being read has previously been written
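As a rough behavioral sketch of the ports above, the BTB can be modeled as a small indexed table with a per-entry valid bit. The class `Btb` and the helper `btb_index` are hypothetical names, and deriving the index from the low PC bits above the 2-bit instruction alignment is an assumption, not something the source specifies.

```python
class Btb:
    """Behavioral model of a BTB with one read port and one write port."""

    def __init__(self, idx_width):
        self.els = 1 << idx_width            # limited number of entries
        self.targets = [0] * self.els
        self.valid = [False] * self.els      # backs read_valid_o

    def write(self, idx_w, branch_target):
        """Update path: btb_w_i asserted with btb_idx_w_i / branch_target_i."""
        self.targets[idx_w] = branch_target
        self.valid[idx_w] = True

    def read(self, idx_r):
        """Prediction path: returns (branch_target_o, read_valid_o)."""
        return self.targets[idx_r], self.valid[idx_r]


def btb_index(pc, idx_width):
    # Assumption: index with the PC bits just above the 2-bit alignment.
    return (pc >> 2) & ((1 << idx_width) - 1)
```

Because the table is small, two branch sites whose PCs share the same index bits map to the same entry, so a later update overwrites the earlier target. Under the assumed indexing with a 4-bit index, `0x100` and `0x140` alias to entry 0.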
The Branch History Table (BHT) records a history of branch prediction results and predicts whether the next branch should be taken. After each prediction, the back end (BE) informs the front end (FE) whether the previous prediction was correct. The BHT then updates the corresponding entry accordingly. The two bits in each BHT entry are encoded as shown in the table.
Bit 1 | Bit 0 |
---|---|
taken or not taken | strong or weak |
- If the BHT entry shows that the previous prediction was strong, and the BE informs the FE that the prediction was correct, the BHT does not update the entry.
- If the BHT entry shows that the previous prediction was weak, and the BE informs the FE that the prediction was correct, the BHT updates the entry to a strong prediction.
- If the BHT entry shows that the previous prediction was strong, but the BE informs the FE that the prediction was wrong, the BHT changes the entry from a strong prediction to a weak prediction.
- If the BHT entry shows that the previous prediction was weak, and the BE informs the FE that the prediction was wrong, the BHT flips the prediction in the entry, either from taken to not taken or from not taken to taken.
During branch prediction, the FE reads the taken/not-taken bit (Bit 1) of the corresponding entry. If Bit 1 is 1, the FE predicts the branch as taken; if Bit 1 is 0, the FE predicts it as not taken.
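The four update rules and the prediction rule can be summarized in a minimal Python sketch. Two details are assumptions made here because the text does not pin them down: Bit 0 equal to 1 is taken to mean "strong", and an entry stays weak after its direction is flipped. The function names are illustrative only.

```python
def bht_update(entry, correct):
    """Return the new 2-bit BHT entry after the BE reports the outcome.

    entry:   2-bit value, Bit 1 = taken (1) / not taken (0),
             Bit 0 = strong (1) / weak (0)  [assumed encoding]
    correct: True if the BE reports the previous prediction was correct
    """
    taken = (entry >> 1) & 1
    strong = entry & 1
    if correct:
        strong = 1          # strong stays strong, weak becomes strong
    elif strong:
        strong = 0          # strong but wrong: weaken, keep direction
    else:
        taken ^= 1          # weak and wrong: flip direction (assumed to stay weak)
    return (taken << 1) | strong


def bht_predict(entry):
    """Prediction uses only Bit 1: True means predict taken."""
    return bool((entry >> 1) & 1)
```

Starting from a strong-taken entry (`0b11`), two wrong predictions in a row are needed before the predicted direction changes, which is the usual motivation for keeping a strength bit.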
- bht_indx_width_p - the number of bits used to index the BHT
- els_lp - the number of entries in the BHT
- bht_idx_r_i - the index used for reading the BHT (for prediction)
- bht_idx_w_i - the index used for writing the BHT (for updating the BHT)
- bht_r_i - the read enable bit
- bht_w_i - the write enable bit
- correct_i - the bit used to indicate correct/wrong prediction (for updating the BHT entry)
- predict_o - the bit used to predict next branch
The I-Cache (I$) is implemented as a virtually-indexed, physically-tagged cache. The I-Cache module consists of two components: the cache logic and the Local Cache Engine (LCE). The cache logic is a two-stage pipelined cache, consisting of a Tag-Lookup (TL) stage and a Tag-Verify (TV) stage, and the LCE is the cache entity that participates in coherence.
The file bp_fe_icache.v defines the top-level I-Cache module. This module is instantiated once per BlackParrot multi-core processor. This module implements the cache logic and instantiates the LCE module.
- eaddr_width_p - effective address width
- data_width_p - data width
- instr_width_p - instruction width
- tag_width_p - tag width
- num_cce_p - number of CCEs in the system
- num_lce_p - number of LCEs in the system
- lce_id_p - ID of this LCE in the system
- lce_assoc_p - Associativity of this LCE
- lce_sets_p - Number of sets in this LCE
- lce_states_p - Number of coherency states for the LCE
- block_size_in_bytes_p - The cache line (block) size in bytes
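To illustrate how a virtually-indexed, physically-tagged lookup uses these parameters, the sketch below splits an address into block offset, set index, and tag. It assumes power-of-two values for `lce_sets_p` and `block_size_in_bytes_p`; the function names and the example values (64 sets, 64-byte blocks, a 20-bit tag) are illustrative assumptions, not BlackParrot defaults.

```python
def clog2(n):
    """Number of bits needed to index n items (n >= 1)."""
    return (n - 1).bit_length()


def split_vaddr(vaddr, lce_sets, block_size_in_bytes):
    """Index and offset come from the virtual address (virtually indexed)."""
    offset_width = clog2(block_size_in_bytes)
    offset = vaddr & (block_size_in_bytes - 1)
    index = (vaddr >> offset_width) & (lce_sets - 1)
    return index, offset


def physical_tag(paddr, lce_sets, block_size_in_bytes, tag_width):
    """The tag comes from the translated physical address (physically tagged)."""
    low_bits = clog2(block_size_in_bytes) + clog2(lce_sets)
    return (paddr >> low_bits) & ((1 << tag_width) - 1)


# Illustrative values only: 64 sets x 64-byte blocks = 12 index+offset bits.
index, offset = split_vaddr(0x12345, lce_sets=64, block_size_in_bytes=64)
tag = physical_tag(0x80012345, lce_sets=64, block_size_in_bytes=64, tag_width=20)
```

With these example values the index and offset together occupy 12 bits, which matches a 4 KiB page offset, so the set can be selected from the virtual address while the I-TLB produces the physical tag.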
The I-Cache receives the virtual PC from PC_GEN and the physical PC from the I-TLB; it then responds to PC_GEN with the corresponding instruction. On a cache miss, the LCE sends a request to the CCE and waits for a response. The LCE also handles other incoming messages from the CCE.
- pc_gen_icache_vaddr - valid->ready
- icache_pc_gen_data - valid
- itlb_icache_data_resp - valid->ready
- LCE to CCE - ready->valid
- CCE to LCE - ready->valid
- LCE to LCE (inbound) - ready->valid
- LCE to LCE (outbound) - ready->valid
The file bp_fe_lce.v defines the top level LCE module.
- eaddr_width_p - effective address width
- data_width_p - data width
- instr_width_p - instruction width
- tag_width_p - tag width
- num_cce_p - number of CCEs in the system
- num_lce_p - number of LCEs in the system
- lce_id_p - ID of this LCE in the system
- lce_assoc_p - Associativity of this LCE
- lce_sets_p - Number of sets in this LCE
- lce_states_p - Number of coherency states for the LCE
- block_size_in_bytes_p - The cache line (block) size in bytes
In addition to the LCE and CCE messages, the LCE interface consists of the data_mem, tag_mem, and meta_data_mem packets sent to the cache.
- data_mem_pkt - valid->yumi
- tag_mem_pkt - valid->yumi
- meta_data_mem_pkt - valid->yumi
- LCE to CCE - ready->valid
- CCE to LCE - ready->valid
- LCE to LCE (inbound) - ready->valid
- LCE to LCE (outbound) - ready->valid
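The valid->yumi packet interfaces listed above can be pictured with a cycle-by-cycle sketch. It assumes the usual reading of that handshake: the producer holds valid while it has a packet, and the consumer asserts yumi only in a cycle where valid is high and it actually consumes the packet. The function `run_valid_yumi` and its arguments are hypothetical.

```python
from collections import deque


def run_valid_yumi(packets, consumer_free):
    """Simulate a valid->yumi link.

    packets:       packets the producer wants to send, in order
    consumer_free: one bool per cycle, True if the consumer can accept
    Returns a list of (cycle, packet) pairs for each accepted packet.
    """
    pending = deque(packets)
    accepted = []
    for cycle, free in enumerate(consumer_free):
        valid = bool(pending)        # producer: valid while a packet is waiting
        yumi = valid and free        # consumer: yumi may depend on valid
        if yumi:
            accepted.append((cycle, pending.popleft()))
    return accepted
```

A packet stays at the head of the producer's queue until the cycle in which yumi is asserted, so nothing is dropped while the consumer is busy.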