Welcome to the comprehensive guide on developing custom firmware for full device emulation using FPGA-based DMA hardware. This guide aims to provide a detailed, step-by-step approach suitable for both beginners and experienced developers. Whether you're a firmware developer, hardware tester, security researcher, or FPGA enthusiast, this guide will equip you with the knowledge and tools necessary to create accurate and effective emulation firmware.
If you need assistance, have inquiries, or are looking to collaborate, feel free to reach out. I'm available to provide guidance, troubleshoot complex problems, or discuss ideas in detail.
- Introduction
- Key Definitions
- Device Compatibility
- Requirements
- Gathering Donor Device Information
- Initial Firmware Customization
- Vivado Project Setup and Customization
- Advanced Firmware Customization
- Emulating Device-Specific Capabilities
- Transaction Layer Packet (TLP) Emulation
- Building, Flashing, and Testing
- Advanced Debugging Techniques
- Troubleshooting
- Emulation Accuracy and Optimizations
- Best Practices for Firmware Development
- Additional Resources
- Appendix A: Shadow Configuration Space
- Appendix B: Writemask Implementation
The primary objective of this guide is to equip developers, security researchers, and hardware engineers with the knowledge and practical steps necessary to develop custom DMA firmware for accurate 1:1 hardware device emulation using FPGA-based systems like PCILeech-FPGA. This enables applications in hardware testing, system debugging, malware analysis, and other scenarios requiring undetectable or legitimate-looking device emulation.
- Firmware Developers: Engineers building custom firmware for hardware emulation, testing, or bypassing hardware restrictions.
- Hardware Testers: Professionals emulating faulty or outdated hardware devices to assess system resilience or compatibility.
- Security Researchers: Individuals utilizing custom firmware for vulnerability testing, malware analysis, or security assessments.
- FPGA Enthusiasts: Hobbyists exploring FPGA customization and low-level hardware emulation.
- Non-Programmers: Individuals with minimal programming experience who are interested in learning about firmware development and device emulation.
Understanding the terminology is crucial for effectively following this guide. Below are key definitions related to PCIe, DMA, and device emulation:
- DMA (Direct Memory Access): A capability allowing hardware devices to directly read from or write to system memory without CPU intervention, facilitating rapid data transfers.
- TLP (Transaction Layer Packet): The fundamental unit of communication in PCIe architecture, encapsulating control and data information.
- BAR (Base Address Register): Registers in PCIe devices that map device memory into system memory space, defining memory and I/O address regions.
- FPGA (Field Programmable Gate Array): A reconfigurable integrated circuit that can be programmed to perform specific hardware functions, enabling custom device emulation.
- MSI/MSI-X (Message Signaled Interrupts): Mechanisms used by PCIe devices to send interrupts to the CPU, handling asynchronous events.
- Device Serial Number (DSN): A unique identifier associated with a specific device, often used for advanced device identification and verification.
- PCIe Configuration Space: A memory area where PCIe devices provide information about themselves and configure operational parameters.
- Donor Card: A PCIe device used to extract configuration and identification details for the purpose of emulating its behavior on an FPGA.
- ACs (Anti-Cheats): Software mechanisms designed to prevent cheating in games and applications by detecting unauthorized hardware or software modifications.
This guide focuses on FPGA-based devices compatible with the PCILeech-FPGA framework. Below is a list of compatible devices:
-
Squirrel (35T)
- Description: Affordable and widely accessible FPGA-based DMA device.
- Use Case: Suitable for standard memory acquisition and device emulation tasks.
-
EnigmaX1 (75T)
- Description: Mid-tier FPGA offering enhanced resources and performance.
- Use Case: Ideal for more demanding memory operations requiring higher bandwidth.
-
ZDMA (100T)
- Description: High-performance FPGA, optimized for rapid memory interactions.
- Use Case: Best suited for scenarios requiring fast and extensive memory reads/writes.
-
Kintex-7
- Description: Advanced FPGA with robust capabilities for complex projects.
- Use Case: Suitable for large-scale or highly customized DMA solutions.
To ensure smooth emulation, several PCIe-specific features must be addressed:
-
IOMMU/VT-d Settings
- Recommendation: Disable IOMMU (Intel's VT-d) to allow unrestricted DMA access.
- Rationale: IOMMU can restrict DMA operations, potentially interfering with memory acquisition and emulation.
-
Kernel DMA Protection
- Recommendation: Disable Kernel DMA Protection features found in modern systems.
- Steps:
- Windows: This may involve disabling Secure Boot or Virtualization-Based Security (VBS).
- BIOS/UEFI: Access firmware settings to turn off related security features.
- Caution: Disabling these features can expose the system to risks; ensure you're operating within a secure and isolated environment.
-
PCIe Slot Requirements
- Recommendation: Use a compatible PCIe slot that matches the FPGA device's requirements (e.g., x1, x4, x16).
- Rationale: Ensures optimal performance and compatibility with the host system.
-
Host System
- Processor: Multi-core CPU (Intel i5/i7 or equivalent)
- Memory: Minimum 16GB RAM
- Storage: SSD with at least 100GB free space
- Operating System: Windows 10/11 (64-bit) or a compatible Linux distribution (e.g., Ubuntu, Debian) with necessary drivers
-
Peripheral Devices
- JTAG Adapter: For flashing firmware onto the FPGA
- PCIe Slot: Ensure the host system has available PCIe slots compatible with the DMA card
-
Donor PCIe Device
- Purpose: Source of device IDs and configuration data for spoofing.
- Examples: Network adapters, storage controllers, or any generic PCIe card not used on the main PC.
-
DMA FPGA Card
- Description: FPGA-based device capable of performing DMA operations.
- Examples: Squirrel (35T), EnigmaX1 (75T), ZDMA (100T), Kintex-7.
-
JTAG Programmer
- Purpose: For flashing firmware onto the FPGA.
- Examples: Xilinx Platform Cable USB, Digilent JTAG USB Cable.
-
Vivado
- Description: Xilinx's FPGA development software for synthesizing and building firmware projects.
- Download: Xilinx Vivado
-
Visual Studio
- Description: Integrated Development Environment (IDE) for editing Verilog or VHDL code.
- Download: Visual Studio Community
-
PCILeech-FPGA
- Description: The repository and base code for DMA firmware development.
- Repository: PCILeech-FPGA on GitHub
-
Arbor
- Description: PCIe device scanning tool for gathering device information.
- Download: Arbor by MindShare
- Note: Requires account creation; offers a 14-day trial.
-
Alternative Tools
- Telescan PE
- Description: PCIe traffic analysis tool that can be used as an alternative to Arbor.
- Download: Teledyne LeCroy Telescan PE
- Note: Free but requires manual registration approval.
-
Install Vivado
- Steps:
- Visit the Xilinx Vivado Download Page.
- Download the appropriate version compatible with your FPGA device.
- Follow the installation instructions provided by Xilinx.
- Launch Vivado and ensure it is properly configured.
-
Install Visual Studio
- Steps:
- Visit the Visual Studio Download Page.
- Download and install the Visual Studio Community Edition.
- During installation, the default workloads are sufficient. Visual Studio has no built-in support for hardware description languages (HDLs) such as Verilog or VHDL, so syntax highlighting for them requires a separate extension.
-
Clone the PCILeech-FPGA Repository
- Steps:
- Open a terminal or command prompt.
- Clone the repository using Git:
git clone https://github.com/ufrisk/pcileech-fpga.git
- Navigate to the cloned directory:
cd pcileech-fpga
-
Set Up a Clean Development Environment
- Recommendation: Work in an isolated environment to prevent unintended interactions, especially if using the firmware for sensitive tasks like malware analysis.
- Steps:
- Use a dedicated development machine or a virtual environment.
- Ensure no other applications interfere with PCIe operations or FPGA programming.
Accurate device emulation relies on extracting critical information from the donor device. This data allows your FPGA to mimic the target hardware in terms of PCIe configuration and behavior.
Arbor is a powerful tool for scanning PCIe devices and extracting necessary information. Follow these steps to gather donor device details:
-
Install Arbor
- Steps:
- Visit the Arbor Download Page.
- Create an account if required.
- Download and install Arbor on your system.
-
Scan PCIe Devices
- Steps:
- Launch Arbor.
- Navigate to the Local System tab.
- Under Scan Options, ensure default settings are appropriate.
- Click Scan/Rescan to detect all connected PCIe devices.
-
Identify the Donor Device
- Criteria:
- Should not be used on your main PC.
- Examples: PCIe WiFi cards, storage controllers, or generic PCIe devices.
- Steps:
- Locate your donor device in the list of scanned devices.
- Click on the device to view detailed configuration.
-
Capture Device Data
-
Information to Extract:
- Device ID
- Vendor ID
- Subsystem ID
- Revision ID
- Base Address Registers (BARs)
- Capabilities (e.g., MSI, power management, PCIe link width/speed)
- Device Serial Number (DSN) (if available)
-
Steps:
- Navigate to the PCI Config tab within Arbor.
- Scroll through the Decode section to locate and record the above details.
- Take screenshots or notes of each value for reference during firmware customization.
-
Note: The DSN may not be present on all devices. If unavailable, proceed with zeros in the DSN field during customization.
After scanning, ensure you have accurately recorded the following attributes from the donor device:
- Device ID: A unique identifier for the hardware device.
- Vendor ID: The identifier of the device manufacturer.
- Subsystem ID: Identifies the specific subsystem associated with the device.
- Revision ID: The revision number of the hardware version.
- Base Address Registers (BARs): Defines memory and I/O address regions of the device.
- Capabilities: Such as Power Management (PM), MSI/MSI-X, PCIe link speed, and width.
- Device Serial Number (DSN): If applicable, the unique serial number associated with the device.
Important Considerations:
- BAR Sizes: Ensure the memory-mapped I/O regions match the donor device’s configuration.
- Capabilities: Properly emulate all capabilities to ensure seamless integration with the host system.
- DSN: Enhances the fidelity of emulation; use if available.
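The attributes listed above can be kept in a small structured record so that each value is range-checked before it is typed into the firmware. The following is a minimal Python sketch: the class name `DonorInfo`, its field names, and the sample values are illustrative assumptions and are not part of PCILeech-FPGA or Arbor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DonorInfo:
    """Identity values read from a donor's configuration space (hypothetical container)."""
    vendor_id: int
    device_id: int
    subsystem_id: int
    revision_id: int

    def __post_init__(self):
        # Vendor, Device and Subsystem IDs are 16-bit fields; Revision ID is 8-bit.
        for name, bits in (("vendor_id", 16), ("device_id", 16),
                           ("subsystem_id", 16), ("revision_id", 8)):
            value = getattr(self, name)
            if not 0 <= value < (1 << bits):
                raise ValueError(f"{name}=0x{value:X} does not fit in {bits} bits")

    def verilog_literal(self, name):
        """Render one field as a sized Verilog hex literal, e.g. 16'h1234."""
        bits = 8 if name == "revision_id" else 16
        return f"{bits}'h{getattr(self, name):0{bits // 4}X}"


# Placeholder values only; substitute the values recorded from the scan.
donor = DonorInfo(vendor_id=0xABCD, device_id=0x1234,
                  subsystem_id=0x5678, revision_id=0x02)
print(donor.verilog_literal("device_id"))    # 16'h1234
print(donor.verilog_literal("revision_id"))  # 8'h02
```

Rejecting out-of-range values at construction time catches transcription mistakes, such as an extra hex digit, before they reach the source files.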
With the necessary donor information in hand, proceed to customize the PCIe configuration space and memory mapping within the firmware to spoof the donor device.
-
Navigate to the Configuration File
- Path: /PCIeSquirrel/src/pcileech_pcie_cfg_a7.sv
- Description: This SystemVerilog file contains the PCIe configuration logic for the device.
-
Open the File in Visual Studio
- Steps:
- Launch Visual Studio.
- Open the pcileech_pcie_cfg_a7.sv file located in the /PCIeSquirrel/src/ directory.
-
Modify Master Abort Flag
- Steps:
- Use Ctrl + F to search for rw[20].
- Change the accompanying 0 to 1 so that the master abort flag in the status register is cleared automatically (the upstream source labels this bit CFGSPACE_STATUS_REGISTER_AUTO_CLEAR).
rw[20] <= 1'b1; // Auto-clear the master abort flag
-
Modify Device ID and Vendor ID
- Steps:
- Search for cfg_deviceid.
- Update the Device ID with the donor's value:
cfg_deviceid <= 16'hXXXX; // Replace XXXX with donor Device ID
- Similarly, search for cfg_vendorid and update:
cfg_vendorid <= 16'hYYYY; // Replace YYYY with donor Vendor ID
-
Modify Subsystem ID
- Steps:
- Search for cfg_subsysid.
- Update the Subsystem ID:
cfg_subsysid <= 16'hZZZZ; // Replace ZZZZ with donor Subsystem ID
-
Adjust BARs Based on Donor Device
- Steps:
- Locate the BAR size configurations.
- Set the BAR sizes to match those of the donor device:
bar0_size <= 32'hXXXX_YYYY; // Replace with donor's BAR0 size
- Repeat for additional BARs (BAR1, BAR2, etc.) as necessary.
If your donor device has a Device Serial Number (DSN), incorporating it into the firmware enhances the emulation's fidelity.
-
Locate the DSN Field
- Steps:
- In pcileech_pcie_cfg_a7.sv, search for rw[127:64].
- This field represents the cfg_dsn (Configuration Space Device Serial Number).
-
Insert the DSN
-
Steps:
- Replace the placeholder with your donor device's DSN:
rw[127:64] <= 64'hXXXXXXXX_YYYYYYYY; // Replace Xs and Ys with donor DSN
- Example:
- Donor DSN: Upper DW: 01 00 00 00, Lower DW: 68 4C E0 00
- Combined DSN: 64'h01000000684CE000
rw[127:64] <= 64'h01000000684CE000; // Donor DSN
- No DSN Available:
rw[127:64] <= 64'h0000000000000000; // No DSN
-
Save Changes
- Steps:
- After modifying the DSN, save the file to retain changes.
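Combining the two dwords by hand is easy to get wrong by one digit, so the arithmetic can be scripted. This is a minimal Python sketch; the helper names `dw_from_bytes` and `format_dsn` are hypothetical, and it assumes each dword is written most significant byte first, as in the example above.

```python
def dw_from_bytes(text):
    """Parse a dword written as four hex bytes, most significant first ("01 00 00 00")."""
    parts = text.split()
    if len(parts) != 4:
        raise ValueError("expected exactly four bytes")
    return int("".join(parts), 16)


def format_dsn(upper_dw, lower_dw):
    """Combine two 32-bit dwords into a 64-bit Verilog hex literal."""
    for dw in (upper_dw, lower_dw):
        if not 0 <= dw <= 0xFFFFFFFF:
            raise ValueError("each dword must fit in 32 bits")
    dsn = (upper_dw << 32) | lower_dw
    return f"64'h{dsn:016X}"


upper = dw_from_bytes("01 00 00 00")
lower = dw_from_bytes("68 4C E0 00")
print(format_dsn(upper, lower))  # 64'h01000000684CE000
print(format_dsn(0, 0))          # 64'h0000000000000000
```

The second call reproduces the all-zero value used when the donor exposes no DSN.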
After customizing the configuration space, integrate these changes into the Vivado project to prepare the firmware for synthesis and implementation.
-
Open Vivado
- Steps:
- Launch Vivado on your development machine.
- Ensure Vivado is properly installed and configured for your FPGA device.
-
Access the Tcl Console
- Steps:
- In Vivado, locate the Tcl Console at the bottom of the application window.
- If not visible, navigate to Window > Tcl Console to display it.
-
Navigate to the PCIeSquirrel Directory
-
Steps:
- In the Tcl Console, determine your current directory:
pwd
- Change the directory to the PCIeSquirrel folder within the cloned pcileech-fpga repository:
cd C:/Users/YourUsername/Desktop/pcileech-fpga/PCIeSquirrel
Replace YourUsername and the path as per your setup.
- Note: If you encounter errors with backslashes (\), use forward slashes (/):
cd C:/Users/YourUsername/Desktop/pcileech-fpga/PCIeSquirrel
-
Generate the Vivado Project
- Steps:
- In the Tcl Console, execute the project generation script:
source vivado_generate_project.tcl -notrace
- Wait for the script to complete. This process sets up the Vivado project with the necessary configurations.
-
Open the Generated Project
- Steps:
- Upon successful generation, Vivado should automatically open the .xpr (Vivado Project) file.
- Keep the project open for further customization.
-
Access the PCIe IP Core
- Steps:
- In the Sources pane, navigate to:
pcileech_squirrel_top > i_pcileech_pcie_a7 : pcileech_pcie_a7
- Double-click on the PCIe IP core (i_pcie_7x_0 : pcie_7x_0) to open the Re-customize IP window.
-
Customize Device IDs and BARs
- Steps:
- In the Re-customize IP dialog, navigate to the IDs tab.
- Enter the Device ID, Vendor ID, and Subsystem ID gathered from the donor device.
- Verify the Class Code:
- Go back to Arbor or your scanning tool to determine the class code of your donor device.
- In the Re-customize IP window, set the class code accordingly to match the donor device.
- Example:
- Device ID: 0x1234
- Vendor ID: 0xABCD
- Subsystem ID: 0x5678
- Class Code: 0x020000 (e.g., Network Controller)
-
Configure BAR Sizes
- Steps:
- Navigate to the BARs tab within the Re-customize IP dialog.
- Set the BAR0 Size to match the donor device's BAR0 size.
- Example: If the donor's BAR0 is 16KB:
BAR0 Size: 16KB
- Repeat for additional BARs (BAR1, BAR2, etc.) if the donor device utilizes them.
-
Finalize IP Customization
- Steps:
- After setting all necessary parameters, click OK to apply the changes.
- Vivado may prompt to regenerate the IP core; confirm and allow the process to complete.
-
Lock the IP Core
- Purpose: Prevent Vivado from overwriting manual configurations during synthesis.
- Steps:
- Open the Tcl Console within Vivado.
- Execute the following command to lock the IP core:
set_property is_managed false [get_files pcie_7x_0.xci]
- To Unlock (if needed in the future):
set_property is_managed true [get_files pcie_7x_0.xci]
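The 24-bit class code entered on the IDs tab is made of three one-byte fields: base class, subclass, and programming interface. The minimal Python sketch below splits a class code into those fields so it can be checked against the scanning tool's decode. The function name `split_class_code` is hypothetical, and the name table covers only a few base classes for illustration.

```python
# A small subset of base-class names from the PCI class code assignments.
BASE_CLASS_NAMES = {
    0x01: "Mass storage controller",
    0x02: "Network controller",
    0x03: "Display controller",
}


def split_class_code(class_code):
    """Split a 24-bit class code into (base class, subclass, programming interface)."""
    if not 0 <= class_code <= 0xFFFFFF:
        raise ValueError("class code must fit in 24 bits")
    base = (class_code >> 16) & 0xFF
    sub = (class_code >> 8) & 0xFF
    prog_if = class_code & 0xFF
    return base, sub, prog_if


base, sub, prog_if = split_class_code(0x020000)
print(BASE_CLASS_NAMES.get(base, "unknown"), hex(sub), hex(prog_if))
# Network controller 0x0 0x0
```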
To achieve precise 1:1 emulation, further customize PCIe parameters, BARs, memory mapping, power management, and interrupt handling.
-
Match PCIe Link Speed and Width
- Importance: Ensures the emulated device communicates at the same speed and width as the donor device.
- Steps:
- In pcileech_pcie_cfg_a7.sv, locate the PCIe link speed and width configurations.
- Update these parameters to match the donor device's specifications.
pcie_link_speed <= 4'bXXXX; // Replace XXXX with donor's PCIe link speed
pcie_link_width <= 8'b00000YYY; // Replace YYY with donor's PCIe link width
- Example:
- Donor PCIe Link Speed: Gen2 (5 GT/s)
- Donor PCIe Link Width: x4
pcie_link_speed <= 4'b0010; // Gen2
pcie_link_width <= 8'b00000100; // x4
-
Set Capability Pointers
- Purpose: Ensure the PCIe capabilities are correctly linked and recognized by the host system.
- Steps:
- Locate the capability pointer configurations in pcileech_pcie_cfg_a7.sv.
- Set the capability pointers to match the donor device's configuration.
capability_pointer <= 8'h40; // Example value; replace with donor's capability pointer
-
Modify Extended Capabilities
- Steps:
- Add or adjust extended capabilities such as Advanced Error Reporting (AER), Device Serial Number (DSN), and others as supported by the donor device.
- Ensure that the capability IDs and pointers are correctly set.
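When recording the donor's link parameters, the raw Link Status register of the PCI Express capability can be decoded directly: bits 3:0 hold the current link speed code and bits 9:4 hold the negotiated link width. The following minimal Python sketch shows that decoding; the function name `decode_link_status` and the sample register values are assumptions chosen for illustration.

```python
# Link speed codes as defined for the Link Status / Link Capabilities registers.
LINK_SPEEDS = {
    1: "2.5 GT/s",   # Gen1
    2: "5.0 GT/s",   # Gen2
    3: "8.0 GT/s",   # Gen3
    4: "16.0 GT/s",  # Gen4
    5: "32.0 GT/s",  # Gen5
}


def decode_link_status(value):
    """Return (speed string, lane count) from a 16-bit Link Status register value."""
    speed_code = value & 0xF          # bits 3:0
    width = (value >> 4) & 0x3F       # bits 9:4
    speed = LINK_SPEEDS.get(speed_code, f"reserved ({speed_code})")
    return speed, width


print(decode_link_status(0x0042))  # ('5.0 GT/s', 4)
print(decode_link_status(0x0011))  # ('2.5 GT/s', 1)
```

The first sample corresponds to the Gen2 x4 example above: speed code 2 and width 4.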
Accurate memory mapping is critical for emulating hardware devices. Base Address Registers (BARs) define where the device's memory and registers appear in system memory space.
-
Set BAR Sizes
- Steps:
- In pcileech_pcie_cfg_a7.sv, locate the BAR size assignments.
- Set the BAR sizes to match those of the donor device.
bar0_size <= 32'h00004000; // 16KB for BAR0
bar1_size <= 32'h00008000; // 32KB for BAR1 (if applicable)
-
Define BAR Address Spaces
- Steps:
- Ensure the BAR address spaces do not overlap and match the donor device's memory layout. The host assigns the actual base addresses during enumeration, so the values below are placeholders.
- Use the recorded BAR sizes to set the address ranges appropriately.
bar0_addr <= 32'hF0000000; // Example address; replace with donor's BAR0 address
bar1_addr <= 32'hF0004000; // Example address; replace as needed
-
Handle Multiple BARs
- Steps:
- If the donor device uses multiple BARs, repeat the configuration for each BAR.
- Ensure each BAR's size and address align with the donor device's specifications.
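A BAR's size is not stored as a number in configuration space. Software discovers it by writing all ones to the BAR and reading back which address bits stay writable. The minimal Python sketch below applies that standard sizing rule to a readback value for a 32-bit memory BAR; the function name `bar_size_from_readback` is hypothetical, and I/O and 64-bit BARs are deliberately left out.

```python
def bar_size_from_readback(readback):
    """Size in bytes of a 32-bit memory BAR, given the value read back
    after 0xFFFFFFFF was written to it."""
    if readback & 0x1:
        raise ValueError("I/O BARs are not handled in this sketch")
    mask = readback & 0xFFFFFFF0      # drop the type and prefetchable bits
    if mask == 0:
        return 0                      # BAR is not implemented
    return ((~mask) & 0xFFFFFFFF) + 1  # invert the writable-bit mask and add one


for value in (0xFFFFC000, 0xFFFF8000):
    size = bar_size_from_readback(value)
    print(f"readback 0x{value:08X} -> 32'h{size:08X} ({size // 1024}KB)")
# readback 0xFFFFC000 -> 32'h00004000 (16KB)
# readback 0xFFFF8000 -> 32'h00008000 (32KB)
```

The two results match the 16KB and 32KB example values used for bar0_size and bar1_size above.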
Properly emulating power management and interrupt handling ensures the host system interacts seamlessly with the emulated device.
-
Power Management (PM) Configuration
- Steps:
- In pcileech_pcie_cfg_a7.sv, locate the Power Management capability settings.
- Set the PM capabilities to match the donor device.
PM_CAP_VERSION <= 4'b0011; // Example version; replace with donor's PM version
PM_CAP_D1SUPPORT <= 1'b1; // Enable D1 support if the donor does
PM_CAP_AUXCURRENT <= 3'b100; // Example value; adjust as per donor
PM_CSR_NOSOFTRST <= 1'b0; // Example value; adjust as needed
-
MSI/MSI-X (Interrupts) Configuration
- Steps:
- Locate MSI/MSI-X configuration in pcileech_pcie_cfg_a7.sv.
- Enable and configure MSI/MSI-X to handle interrupts correctly.
MSI_CAP_64_BIT_ADDR_CAPABLE <= 1'b1; // Enable 64-bit MSI if supported
cfg_interrupt <= 1'b1; // Enable MSI interrupts
-
Implementing Interrupt Handling Logic
- Steps:
- In pcileech_pcie_cfg_a7.sv, ensure the interrupt signals are correctly routed.
assign cfg_interrupt_di = cfg_int_di;
assign cfg_interrupt_assert = cfg_int_assert;
- Test interrupt functionality to ensure the host system correctly receives and handles interrupts from the emulated device.
To achieve a true 1:1 emulation, it's essential to replicate the unique capabilities of the donor device beyond basic PCIe interactions.
Most PCIe devices support advanced features like Advanced Error Reporting (AER), Link Speed Negotiation, and Extended Capabilities. Emulating these ensures the host system perceives the emulated device as identical to the donor.
-
Advanced Error Reporting (AER)
- Steps:
- In pcileech_pcie_cfg_a7.sv, locate AER configurations.
- Enable AER if supported by the donor device.
AER_CAP_VERSION <= 4'b0001; // Example version; replace with donor's AER version
AER_CAP_NEXTPTR <= 8'h00; // Set next pointer appropriately
- Implement error handling logic to manage AER-related events.
-
Link Speed Negotiation
- Steps:
- Ensure the PCIe link speed and width negotiation matches the donor device.
- Adjust link speed settings as previously outlined under Match PCIe Link Speed and Width.
-
Extended Capabilities
- Steps:
- Identify any extended capabilities used by the donor device (e.g., Vendor-Specific Extended Capabilities, Latency Tolerance Reporting).
- Implement these capabilities within pcileech_pcie_cfg_a7.sv by defining the appropriate registers and logic.
// Example for Vendor-Specific Extended Capability
VSEC_CAP_ID <= 16'h1234; // Replace with vendor-specific ID
VSEC_CAP_VERSION <= 4'h1; // Replace with version
VSEC_CAP_NEXTPTR <= 12'h000; // Next capability pointer
Some devices incorporate proprietary or vendor-specific features that must be accurately emulated to ensure seamless integration.
-
Identify Vendor-Specific Features
- Steps:
- Use PCIe traffic analysis tools (e.g., Wireshark with PCIe extensions, Teledyne LeCroy) to monitor vendor-specific TLPs.
- Document unique registers, commands, or behaviors exhibited by the donor device.
-
Implementing Vendor-Specific Logic
- Steps:
- In pcileech_pcie_cfg_a7.sv, add logic to handle vendor-specific features.
// Example: Vendor-Specific Register
reg [31:0] vendor_specific_reg;
always @(posedge clk) begin
    if (vendor_specific_write_enable) begin
        vendor_specific_reg <= vendor_specific_data_in;
    end
end
- Ensure that any proprietary commands or responses are accurately replicated.
-
Testing Vendor-Specific Features
- Steps:
- Use vendor-specific drivers or applications to interact with the emulated device.
- Verify that all proprietary features function as expected.
Accurate emulation of Transaction Layer Packets (TLPs) is vital for ensuring the FPGA-based device communicates seamlessly with the host system, mimicking the behavior of the donor device.
TLPs are the fundamental units of PCIe communication, handling memory reads/writes, configuration accesses, and interrupt signaling.
-
Capture TLPs from the Donor Device
- Steps:
- Use PCIe analysis tools like Teledyne LeCroy’s Telescan PE or Wireshark with PCIe support to monitor TLPs generated by the donor device.
- Record the structure, types, and patterns of TLPs used by the donor device during typical operations.
-
Analyze TLP Structure
-
Components of a TLP:
- Header Fields: Define the type, format, address, and other control information.
- Data Payload: The actual data being transferred.
- Digest (optional): An end-to-end CRC (ECRC) appended after the payload. Sequence numbers and the link CRC are added by the Data Link Layer, not the Transaction Layer.
-
Example TLP Structure:
tlp_header <= {fmt, type, traffic_class, td, ep, attr, length};
tlp_address <= address;
tlp_data <= data_payload;
-
Emulating Legitimate Traffic
- Steps:
- Ensure that TLPs generated by the FPGA match those captured from the donor device in terms of type, address, length, and data.
- Implement logic to handle different types of TLPs, such as memory writes, memory reads, and configuration accesses.
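When comparing captured TLPs against the donor's, it helps to decode the first header dword into its fields. The following minimal Python sketch does that for the fields named above. The function name `decode_tlp_dw0` and the name table are illustrative assumptions; the sketch uses the 3-bit Fmt field of current PCIe revisions, whose low two bits correspond to the 2-bit Fmt used in this guide's examples.

```python
# (fmt, type) pairs for a few common TLPs; not an exhaustive table.
TLP_NAMES = {
    (0b000, 0b00000): "MRd (3DW)",
    (0b001, 0b00000): "MRd (4DW)",
    (0b010, 0b00000): "MWr (3DW)",
    (0b011, 0b00000): "MWr (4DW)",
    (0b000, 0b00100): "CfgRd0",
    (0b010, 0b00100): "CfgWr0",
    (0b000, 0b01010): "Cpl",
    (0b010, 0b01010): "CplD",
}


def decode_tlp_dw0(dw0):
    """Decode the first dword of a TLP header into its main fields."""
    fmt = (dw0 >> 29) & 0x7
    tlp_type = (dw0 >> 24) & 0x1F
    return {
        "fmt": fmt,
        "type": tlp_type,
        "name": TLP_NAMES.get((fmt, tlp_type), "other"),
        "tc": (dw0 >> 20) & 0x7,             # traffic class
        "td": (dw0 >> 15) & 0x1,             # TLP digest (ECRC) present
        "ep": (dw0 >> 14) & 0x1,             # poisoned data
        "attr": (dw0 >> 12) & 0x3,           # relaxed ordering / no snoop
        "length": (dw0 & 0x3FF) or 1024,     # a length field of 0 encodes 1024 dwords
    }


print(decode_tlp_dw0(0x40000001)["name"])    # MWr (3DW)
print(decode_tlp_dw0(0x00000001)["name"])    # MRd (3DW)
print(decode_tlp_dw0(0x4A000001)["name"])    # CplD
```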
To accurately mimic the donor device, you must craft custom TLPs that replicate its behavior during various operations.
-
Memory Write TLP Example
- Description: Represents a write operation to system memory.
- Verilog Example:
// TLP Header for Memory Write (3DW header with data)
tlp_header <= {2'b10, 5'b00000, 3'b000, 1'b0, 1'b0, 2'b00, 10'b0000000001}; // Fmt, Type, TC, TD, EP, Attr, Length
// Address and Data: targets below 4 GB use the 3DW format with a 32-bit address
tlp_address <= 32'h1234_5678; // Target address
tlp_data <= 32'hDEADBEEF; // Data to write
-
Memory Read TLP Example
- Description: Represents a read operation from system memory.
- Verilog Example:
// TLP Header for Memory Read (3DW header, no data)
tlp_header <= {2'b00, 5'b00000, 3'b000, 1'b0, 1'b0, 2'b00, 10'b0000000001}; // Fmt, Type, TC, TD, EP, Attr, Length
// Address: targets below 4 GB use the 3DW format with a 32-bit address
tlp_address <= 32'h1234_5678; // Target address
-
Configuration Access TLP Example
- Description: Represents a configuration space access.
- Verilog Example:
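// TLP Header for Configuration Write
// Type 5'b00101 is a Type 1 configuration write; a Type 0 write uses 5'b00100.
// Configuration requests are issued by the host, and an endpoint only completes them.
tlp_header <= {2'b10, 5'b00101, 3'b000, 1'b0, 1'b0, 2'b00, 10'b0000000001}; // Fmt, Type, TC, TD, EP, Attr, Length
// Address and Data
tlp_address <= 32'h00000010; // Configuration register address
tlp_data <= 32'h0000_0001; // Data to write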
-
Interrupt Signaling TLP Example
- Description: Represents an interrupt signaling to the CPU.
- Verilog Example:
// MSI Interrupt TLP
// An MSI is an ordinary Memory Write: Fmt 2'b10 (3DW with data), Type 5'b00000
tlp_header <= {2'b10, 5'b00000, 3'b000, 1'b0, 1'b0, 2'b00, 10'b0000000001};
// The MSI address and data are whatever the host programmed into the MSI capability
tlp_address <= msi_message_address; // Message Address register value (placeholder name)
tlp_data <= {16'h0000, msi_message_data}; // Message Data register value (placeholder name)
-
Implement TLP Handlers
- Steps:
- In your firmware, implement handlers for different TLP types to ensure correct processing and response.
- Use state machines or logic blocks to manage TLP generation, processing, and response handling.
-
Testing TLPs
- Steps:
- Use simulation tools or test benches to verify the correctness of the TLPs.
- Capture and analyze TLPs during operation to ensure they match expected patterns.
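For test benches it is useful to have a reference model that produces the expected header dwords for a given request. The minimal Python sketch below builds a Memory Write header and selects the 3DW or 4DW format from the address, since addresses below 4 GB must use the 32-bit form. The function name `build_mwr_header` and its parameters are hypothetical, and the caller remains responsible for choosing byte enables that suit the length.

```python
def build_mwr_header(address, length_dw, requester_id, tag,
                     first_be=0xF, last_be=0x0):
    """Return the header dwords of a Memory Write TLP as a list of integers."""
    if address & 0x3:
        raise ValueError("address must be dword aligned")
    if not 1 <= length_dw <= 1024:
        raise ValueError("length must be 1..1024 dwords")
    if not 0 <= requester_id <= 0xFFFF or not 0 <= tag <= 0xFF:
        raise ValueError("requester ID is 16 bits and tag is 8 bits")

    four_dw = address > 0xFFFFFFFF            # 64-bit addressing only above 4 GB
    fmt = 0b011 if four_dw else 0b010         # 4DW or 3DW header, both with data
    dw0 = (fmt << 29) | (0b00000 << 24) | (length_dw & 0x3FF)  # 1024 encodes as 0
    dw1 = (requester_id << 16) | (tag << 8) | (last_be << 4) | first_be
    if four_dw:
        return [dw0, dw1, address >> 32, address & 0xFFFFFFFC]
    return [dw0, dw1, address & 0xFFFFFFFC]


header = build_mwr_header(0x12345678, 1, requester_id=0x0100, tag=0)
print([f"{dw:08X}" for dw in header])
# ['40000001', '0100000F', '12345678']
```

A single-dword write keeps the last byte enable at zero, which is why it is the default here.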
After customizing the firmware and ensuring all configurations align with the donor device, proceed to build, flash, and test the firmware on your FPGA device.
-
Run Synthesis
- Steps:
- In Vivado, click on Run Synthesis.
- Monitor the synthesis process for any warnings or errors.
- Address any critical issues before proceeding.
-
Run Implementation
- Steps:
- After successful synthesis, initiate Run Implementation.
- Ensure that the implementation phase completes without critical warnings.
- Review the implementation report for any potential issues.
-
Generate Bitstream
-
Steps:
- Once implementation is complete, click on Generate Bitstream.
- Confirm any prompts to generate the bitstream.
- Wait for the bitstream generation to finish successfully.
-
Alternative via Tcl Console:
source vivado_build.tcl -notrace
-
Note: The generated bitstream file (.bit or .bin) is typically located in the impl_1 directory within your project folder.
-
Connect FPGA via JTAG
- Steps:
- Ensure your FPGA device is connected to the host system via the JTAG interface.
- Power on the FPGA device.
-
Open Vivado Hardware Manager
- Steps:
- In Vivado, navigate to Open Hardware Manager.
- Click Open Target > Auto Connect to detect the connected FPGA device.
-
Program the FPGA
- Steps:
- In the Hardware Manager, right-click on the detected device and select Program Device.
- Browse to the generated bitstream file (pcileech_squirrel_top.bit or similar).
- Click Program to flash the firmware onto the FPGA.
- Confirm successful programming via the Hardware Manager console.
-
Alternative Flashing Methods
- For Specific Devices: Follow the manufacturer's instructions or use provided utilities for devices like Squirrel or EnigmaX1.
- Command-Line Tools: Use tools like xc3sprog or OpenOCD for programming via command line.
-
Verify Device Detection
-
Steps:
- Use Device Manager (on Windows) or lspci (on Linux) to verify that the FPGA is detected as the donor device.
- Confirm that the Device ID, Vendor ID, Subsystem ID, and BARs match the donor device's specifications.
-
Example (Linux):
lspci -vvv -s <PCI address>
-
Memory Mapping Test
- Steps:
- Access the device's BARs to ensure correct memory mapping.
- Use memory access tools or simple read/write operations to test responsiveness.
-
Interrupts Test
- Steps:
- Trigger interrupts through the emulated device.
- Verify that the host system correctly receives and handles these interrupts.
- Use system logs or diagnostic tools to confirm interrupt handling.
-
Performance Testing
-
Steps:
- Run DMA speed test tools to measure data transfer rates.
- Compare performance metrics against expected values to ensure firmware stability and efficiency.
-
Example Tools:
- PCILeech DMA Speed Test: Available within the PCILeech toolset.
- Custom Benchmark Scripts: Scripts that perform read/write operations to measure performance.
-
Configuration Space Validation
-
Steps:
- Use diagnostic tools to inspect the PCIe configuration space.
- Ensure all fields (Device ID, Vendor ID, BARs, Capabilities) are correctly set and match the donor device.
-
Example (Windows):
- Use Arbor or Telescan PE to read the configuration space and compare it to the donor device.
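On Linux, the ID comparison described in the verification steps can be scripted against the numeric output of `lspci -n`. The minimal Python sketch below parses one such line and reports which fields differ from the recorded donor values. The function names `parse_lspci_n` and `mismatches`, the sample line, and the expected values are assumptions for illustration; lspci omits the `(rev ..)` suffix when the revision is zero, which the sketch treats as revision 0.

```python
import re

# Matches e.g. "03:00.0 0200: abcd:1234 (rev 02)", with an optional PCI domain prefix.
LSPCI_N = re.compile(
    r"^(?P<slot>(?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+"
    r"(?P<cls>[0-9a-f]{4}):\s+"
    r"(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})"
    r"(?:\s+\(rev (?P<rev>[0-9a-f]{2})\))?",
    re.IGNORECASE,
)


def parse_lspci_n(line):
    """Parse one line of `lspci -n` output into a dict of integers."""
    m = LSPCI_N.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised lspci line: {line!r}")
    return {
        "slot": m["slot"],
        "class": int(m["cls"], 16),
        "vendor": int(m["vendor"], 16),
        "device": int(m["device"], 16),
        "revision": int(m["rev"], 16) if m["rev"] else 0,
    }


def mismatches(expected, actual):
    """Return the sorted names of fields whose values differ."""
    return sorted(k for k in expected if actual.get(k) != expected[k])


sample = "03:00.0 0200: abcd:1234 (rev 02)"   # hypothetical output line
donor = {"vendor": 0xABCD, "device": 0x1234, "class": 0x0200, "revision": 0x02}
print(mismatches(donor, parse_lspci_n(sample)))  # []
```

An empty list means every recorded field matches; any names in the list point at the configuration values to recheck.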
When developing custom firmware, encountering issues is common. Advanced debugging techniques can help identify and resolve these problems effectively.
Vivado's Integrated Logic Analyzer (ILA) allows real-time monitoring of internal FPGA signals, aiding in debugging and verification.
-
Set Up ILA Probes
-
Steps:
- In Vivado, open the synthesized design and navigate to Tools > Set Up Debug.
- Select signals of interest, such as TLP data paths or state machine outputs.
- Configure the ILA core settings, including trigger conditions and data depth.
-
Example:
ila_0 : ila
    generic map (
        C_PROBE_WIDTH => 128 -- Width of the data probe
    )
    port map (
        clk => clk, -- Clock signal
        probe0 => tlp_data -- Signal to monitor
    );
-
Configure Triggers
- Steps:
- Open the ILA configuration dialog.
- Set trigger conditions based on specific events, such as TLP generation or memory access.
- Adjust trigger levels and qualifiers to capture relevant data.
-
Analyze Signal Waveforms
-
Steps:
- Run the FPGA with the ILA probes enabled.
- Use Vivado’s Waveform Viewer to examine captured signal waveforms.
- Identify timing issues, incorrect logic states, or unexpected behaviors.
-
Benefits:
- Real-time visibility into internal signals.
- Ability to capture and analyze transient issues during TLP processing.
Beyond Vivado's ILA, external PCIe traffic analysis tools provide in-depth insights into PCIe communications between the FPGA and the host system.
-
Wireshark with PCIe Extensions
- Description: Wireshark can capture and analyze PCIe traffic with the appropriate extensions or plugins.
- Steps:
- Install Wireshark with PCIe support.
- Configure Wireshark to capture PCIe traffic.
- Analyze captured TLPs to ensure they align with expected donor device behavior.
-
Teledyne LeCroy Telescan PE
- Description: A professional-grade PCIe traffic analysis tool offering comprehensive PCIe traffic monitoring and analysis capabilities.
- Steps:
- Install Teledyne LeCroy’s Telescan PE.
- Connect it to your system to monitor PCIe traffic.
- Use it to capture and dissect TLPs exchanged between the FPGA and host system.
-
Total Phase Beagle
- Description: A PCIe traffic analyzer that allows for real-time capture and analysis of PCIe communications.
- Steps:
- Set up the Total Phase Beagle PCIe analyzer with your system.
- Configure it to monitor and capture PCIe traffic.
- Use its analysis features to verify TLP integrity and behavior.
Benefits of Using PCIe Traffic Analysis Tools:
- Comprehensive TLP Analysis: Detailed inspection of TLPs to ensure accurate emulation.
- Error Detection: Identify malformed TLPs or unexpected transaction patterns.
- Performance Metrics: Measure data transfer rates and identify bottlenecks.
Encountering issues during firmware development is common. This section provides solutions to common problems you may face during the emulation process.
Problem: The host system fails to detect the FPGA as the donor device.
Solutions:
- Verify Device IDs
  - Steps:
    - Double-check that the Device ID, Vendor ID, and Subsystem ID in the firmware match those of the donor device.
    - Ensure there are no typos or incorrect values in the configuration space.
- Check PCIe Link Training
  - Steps:
    - Use PCIe diagnostic tools to verify that the PCIe link is properly trained.
    - Ensure that the link speed and width configurations match the donor device.
- Ensure Correct BAR Configuration
  - Steps:
    - Confirm that the BAR sizes and address ranges are accurately set.
    - Ensure no overlapping or conflicting BAR configurations.
- Power and Connection Check
  - Steps:
    - Ensure the FPGA device is properly connected and powered.
    - Re-seat the PCIe card to ensure a secure connection.
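The ID check can be automated by comparing raw configuration space dumps of the donor and the emulated device. The sketch below assumes both dumps are available as byte strings of at least 64 bytes; the helper names are hypothetical, while the field offsets are those of the standard PCI type 0 header.

```python
import struct

# Standard PCI type 0 header fields: name -> (byte offset, struct format).
ID_FIELDS = {
    "vendor_id": (0x00, "<H"),
    "device_id": (0x02, "<H"),
    "revision_id": (0x08, "<B"),
    "subsystem_vendor_id": (0x2C, "<H"),
    "subsystem_id": (0x2E, "<H"),
}


def read_ids(cfg: bytes) -> dict:
    """Extract identity fields from a raw configuration space dump."""
    if len(cfg) < 0x40:
        raise ValueError("need at least the 64-byte standard header")
    ids = {
        name: struct.unpack_from(fmt, cfg, offset)[0]
        for name, (offset, fmt) in ID_FIELDS.items()
    }
    # Class code occupies three bytes at 0x09..0x0B.
    ids["class_code"] = int.from_bytes(cfg[0x09:0x0C], "little")
    return ids


def diff_ids(donor: bytes, emulated: bytes) -> dict:
    """Return {field: (donor_value, emulated_value)} for each mismatch."""
    d, e = read_ids(donor), read_ids(emulated)
    return {name: (d[name], e[name]) for name in d if d[name] != e[name]}
```

On Linux, a dump can be read from `/sys/bus/pci/devices/<BDF>/config`; an empty result from `diff_ids` means the identity fields match.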
Problem: Incorrect memory mapping leads to failed or inaccurate memory access.
Solutions:
- Double-Check BAR Sizes and Addresses
  - Steps:
    - Verify that each BAR size in the firmware matches the donor device's configuration.
    - Ensure that BAR address spaces are correctly set and do not overlap.
- Use Diagnostic Tools
  - Steps:
    - Utilize tools like lspci or Arbor to inspect the PCIe configuration space.
    - Confirm that the BARs are correctly mapped and accessible.
- Adjust Memory Regions
  - Steps:
    - If memory regions are not accessible, adjust the BAR configurations to better match the system's memory map.
    - Ensure that the firmware logic correctly handles memory read/write operations.
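BAR sizes and overlaps can be checked numerically. The sketch below assumes 32-bit memory BARs and uses the standard sizing rule: the value read back after writing all ones has its flag bits masked off, is inverted, and is incremented. Both helper names are hypothetical.

```python
def bar_size_from_readback(readback: int) -> int:
    """Size in bytes of a 32-bit memory BAR, given the value read back
    after writing 0xFFFFFFFF to it. Returns 0 for an unimplemented BAR."""
    readback &= 0xFFFFFFFF
    if readback == 0:
        return 0
    if readback & 0x1:
        raise ValueError("I/O BARs are not handled by this sketch")
    mask = readback & 0xFFFFFFF0  # low 4 bits of a memory BAR are flags
    return ((~mask) & 0xFFFFFFFF) + 1


def find_overlaps(regions):
    """regions: list of (base, size). Return index pairs that overlap."""
    hits = []
    for i, (base_i, size_i) in enumerate(regions):
        for j in range(i + 1, len(regions)):
            base_j, size_j = regions[j]
            if (size_i and size_j
                    and base_i < base_j + size_j
                    and base_j < base_i + size_i):
                hits.append((i, j))
    return hits
```

For example, a readback of `0xFFFFF000` corresponds to a 4 KB BAR, which should match the size reported for the donor device.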
Problem: Slow DMA performance or errors related to Transaction Layer Packets (TLPs).
Solutions:
- Optimize TLP Generation
  - Steps:
    - Ensure that TLPs are correctly formatted and free of errors.
    - Use Vivado’s ILA and PCIe traffic analysis tools to identify and rectify malformed TLPs.
- Adjust Payload Sizes
  - Steps:
    - Set the maximum read request size and maximum payload size to the largest value the donor device supports, up to 4 KB (encoded as 3'b101):

          max_read_request_size <= 3'b101; // 4KB
          max_payload_size <= 3'b101; // 4KB

    - Avoid setting payload sizes beyond what the donor device supports to prevent system instability.
- Check PCIe Link Settings
  - Steps:
    - Verify that the PCIe link speed and width are correctly configured.
    - Ensure that the FPGA is negotiating the link parameters accurately with the host system.
- Firmware Integrity
  - Steps:
    - Review and validate all recent changes to the firmware to ensure no unintended modifications were introduced.
    - Revert to a known stable firmware version if performance issues persist.
Ensuring the emulation's accuracy is critical for seamless integration and undetectable behavior. This section outlines techniques to enhance emulation precision and optimize performance.
Matching the donor device's timing characteristics ensures that the host system interacts with the emulated device as if it were the original hardware.
- Use Matching Clock Domains
  - Steps:
    - Ensure that the FPGA’s clock matches the PCIe link’s clock rate.
    - Synchronize internal clocks within the FPGA to align with PCIe timing requirements.
- Control Response Latency
  - Steps:
    - Implement registers or counters to manage response times for TLP acknowledgments and interrupt handling.
    - Ensure that the latency in responses matches the donor device’s typical response times.
- Implement Pipeline Stages
  - Steps:
    - Use pipelining in the FPGA design to align with the donor device’s data processing stages.
    - Pipelining raises throughput and helps meet timing, but each stage adds a clock cycle of latency; include those cycles when matching the donor device’s response times.
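To size a delay counter for response latency, a measured donor latency has to be converted into clock cycles of the FPGA's user clock. The following is a minimal sketch; the 62.5 MHz clock in the usage note and the `fixed_cycles` parameter (cycles the datapath already spends before the counter starts) are assumptions for illustration, not values taken from the firmware.

```python
import math
from fractions import Fraction


def latency_cycles(latency_ns, clk_mhz) -> int:
    """Clock cycles needed to cover latency_ns at clk_mhz, rounded up."""
    cycles = Fraction(latency_ns) * Fraction(clk_mhz) / 1000
    return max(1, math.ceil(cycles))


def delay_counter_load(latency_ns, clk_mhz, fixed_cycles: int = 0) -> int:
    """Extra cycles to wait after subtracting cycles already spent
    in the datapath (hypothetical parameter)."""
    return max(0, latency_cycles(latency_ns, clk_mhz) - fixed_cycles)
```

With an assumed 62.5 MHz user clock (16 ns period), a donor latency of 400 ns corresponds to 25 cycles; if the datapath already takes 3 cycles, the counter should add 22.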
Emulating dynamic device behavior based on system interactions ensures the FPGA device responds appropriately under various conditions.
- Implement State Machines
  - Steps:
    - Design state machines within the FPGA to manage different operational states of the emulated device.
    - Ensure transitions between states mimic the donor device’s behavior based on system calls and interactions.
- Track and Respond to System Requests
  - Steps:
    - Monitor incoming system requests and adjust the device’s responses dynamically.
    - Ensure that the FPGA firmware can handle varying workloads and respond accurately to different types of TLPs.
- Handle Asynchronous Events
  - Steps:
    - Implement logic to manage asynchronous events such as interrupts or error conditions.
    - Ensure that the firmware can generate and respond to these events in a manner consistent with the donor device.
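Before writing the state machine in HDL, it can be useful to model the intended behavior in software and replay event sequences observed on the donor device. The sketch below is a behavioral model only; the states, event names, and transitions are hypothetical placeholders to be replaced with the donor device's actual behavior.

```python
from enum import Enum, auto


class State(Enum):
    RESET = auto()
    IDLE = auto()
    BUSY = auto()
    ERROR = auto()


# (current state, event) -> next state. All names are illustrative.
TRANSITIONS = {
    (State.RESET, "link_up"): State.IDLE,
    (State.IDLE, "mem_read"): State.BUSY,
    (State.IDLE, "mem_write"): State.BUSY,
    (State.BUSY, "completion_sent"): State.IDLE,
    (State.BUSY, "malformed_tlp"): State.ERROR,
    (State.ERROR, "reset"): State.RESET,
}


def step(state: State, event: str) -> State:
    """Advance one event; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, state: State = State.RESET):
    """Replay a sequence of events and return the visited states."""
    trace = [state]
    for event in events:
        state = step(state, event)
        trace.append(state)
    return trace
```

A table-driven model like this maps directly onto a `case` statement in the HDL, and the same event traces can later be reused as test bench stimulus.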
Adhering to best practices ensures the development process is efficient, maintainable, and secure.
- Test Frequently
  - Steps:
    - Conduct regular tests after each modification to ensure the firmware behaves as expected.
    - Use automated scripts or test benches to validate firmware functionality continuously.
- Document Changes
  - Steps:
    - Maintain detailed documentation for each change made to the firmware.
    - Include explanations for why changes were made and their impact on the overall design.
- Use Version Control
  - Steps:
    - Implement a version control system (e.g., Git) to manage different iterations of the firmware.
    - Commit changes regularly with descriptive messages to track the evolution of the project.
- Branching Strategy
  - Steps:
    - Use branches to manage feature development, bug fixes, and experimental changes.
    - Merge stable branches into the main branch only after thorough testing.
- Prevent Unintended Access
  - Steps:
    - Ensure that the firmware does not expose system memory or hardware to unauthorized access.
    - Implement access controls and validation checks within the firmware.
- Protect Firmware Integrity
  - Steps:
    - Avoid introducing vulnerabilities or backdoors during firmware development.
    - Conduct regular security reviews and code audits to maintain firmware integrity.
- Handle Sensitive Data Securely
  - Steps:
    - If the firmware interacts with sensitive data, implement encryption and secure data handling practices.
    - Ensure that sensitive information is not exposed through firmware interfaces or logs.
To further enhance your understanding and capabilities in developing custom firmware for device emulation, the following resources are invaluable:
- PCILeech-FPGA Repository
- Vivado FPGA Documentation
- PCI-SIG Specifications
- PCIe TLP Primer Tutorial
- Teledyne LeCroy Telescan PE Documentation
- Wireshark PCIe Extensions
- Field Programmable Gate Array (FPGA) Basics
- Arbor Software User Guide
- PCIe Specifications and Guides
The shadow configuration space allows for customization of the firmware without the constraints of Xilinx Vivado's graphical interface. This method involves editing the configuration space directly through a .coe file.
- Convert Donor Device's Configuration Space
  - Method:
    - Use Telescan PE to save a copy of your donor device's configuration space to a .tlscan file.
    - Use a conversion script to transform the .tlscan file into a .coe file suitable for the firmware.
    - Script: telescan_to_coe.py
  - Instructions:
    - Follow the instructions provided in the script's repository to perform the conversion.
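To illustrate what such a conversion produces, the sketch below turns a raw configuration space dump into Xilinx .coe text. It is not the telescan_to_coe.py script and does not parse .tlscan files; it assumes the dump is already available as raw bytes. The byte order of each 32-bit word expected by the firmware is also an assumption, so compare the output against the existing pcileech_cfgspace.coe and set `swap_bytes` accordingly.

```python
def cfgspace_to_coe(cfg: bytes, words_per_line: int = 4,
                    swap_bytes: bool = False) -> str:
    """Format raw configuration space bytes as Xilinx .coe text
    (hypothetical helper; radix-16 vector of 32-bit words)."""
    if len(cfg) == 0 or len(cfg) % 4:
        raise ValueError("length must be a non-zero multiple of 4 bytes")
    words = []
    for offset in range(0, len(cfg), 4):
        chunk = cfg[offset:offset + 4]
        if swap_bytes:
            chunk = chunk[::-1]  # reverse byte order within each word
        words.append(chunk.hex())
    lines = [
        ",".join(words[i:i + words_per_line])
        for i in range(0, len(words), words_per_line)
    ]
    return (
        "memory_initialization_radix=16;\n"
        "memory_initialization_vector=\n"
        + ",\n".join(lines)
        + ";\n"
    )
```

On Linux, the input bytes could come from `/sys/bus/pci/devices/<BDF>/config` read with sufficient privileges to obtain the full 4 KB.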
- Modify Firmware Files
  - File: src/pcileech_fifo.sv
  - Change: rw[203] <= 1'b1; // CFGTLP ZERO DATA
  - Change to: rw[203] <= 1'b0;
  - Note: If you have previously changed rw[20] and potentially rw[21] in pcileech_pcie_cfg_a7.sv, revert those changes back to 0.
- Update pcileech_cfgspace.coe
  - Steps:
    - Replace the contents of pcileech_cfgspace.coe with the converted configuration data from your donor device.
    - Adjust BAR configurations:
      - Convert BAR addresses to sizing in the .coe file.
      - Use Vivado to verify what your BAR should look like with the specific size.
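One plausible reading of "convert BAR addresses to sizing" is to replace each captured BAR base address with the value a host would read back after writing all ones to that BAR. The sketch below computes that value for a memory BAR under this assumption; the helper name is hypothetical, and the result should still be checked against what Vivado shows for a BAR of the same size.

```python
def bar_sizing_value(size_bytes: int, prefetchable: bool = False,
                     is_64bit: bool = False) -> int:
    """Lower 32 bits of a memory BAR's sizing value for a given size."""
    if size_bytes < 16 or size_bytes & (size_bytes - 1):
        raise ValueError("size must be a power of two of at least 16 bytes")
    if size_bytes > 1 << 31:
        raise ValueError("sizes above 2 GB need the upper BAR DWORD as well")
    value = (~(size_bytes - 1)) & 0xFFFFFFF0
    if is_64bit:
        value |= 0b0100  # memory type field = 64-bit
    if prefetchable:
        value |= 0b1000  # prefetchable bit
    return value
```

For a 4 KB non-prefetchable 32-bit BAR this gives `0xFFFFF000`.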
- Set Synthesis Options
  - Modify pcie_7x_0_core_top
    - Steps:
      - Navigate to pcie_7x_0_core_top in Vivado.
      - Change both EXT_CFG_CAP_PTR and EXT_CFG_XP_CAP_PTR to 8'h01 or another appropriate value.
      - Note: Setting these variables changes where the shadow configuration space takes over.
    - Calculation for EXT_CFG_CAP_PTR:
      - Determine the block where you want the shadow configuration space to start.
      - Convert the hex value of that block to decimal, divide by 4, then convert back to hex.
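The pointer calculation amounts to dividing the starting byte offset by 4, since the pointer counts 32-bit words. The sketch below performs that division and formats the result as a Verilog-style literal; the helper names and the default 8-bit literal width (chosen to match the 8'h01 notation above) are assumptions.

```python
def ext_cfg_cap_ptr(byte_offset: int) -> int:
    """DWORD pointer for a configuration space byte offset."""
    if not 0 <= byte_offset < 0x1000:
        raise ValueError("offset must lie within the 4 KB configuration space")
    if byte_offset % 4:
        raise ValueError("offset must be DWORD-aligned")
    return byte_offset // 4


def as_verilog_literal(value: int, width: int = 8) -> str:
    """Format a value as a sized hexadecimal literal, e.g. 8'h10."""
    if value >= 1 << width:
        raise ValueError("value does not fit in the given width")
    digits = (width + 3) // 4
    return f"{width}'h{value:0{digits}X}"
```

For example, a shadow configuration space that takes over at byte offset 0x40 gives `as_verilog_literal(ext_cfg_cap_ptr(0x40))`, which is `8'h10`; the value `8'h01` corresponds to byte offset 0x04.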
- Generate Bitstream
  - Steps:
    - Proceed to generate the bitstream using Vivado.
    - Ensure all changes are saved before starting the synthesis and implementation.
Using the shadow configuration space alone can result in registers being read-only, which may not reflect typical device behavior. Implementing a writemask ensures that registers can be written to as expected.
- Enable PCIe Write
  - File: pcileech_fifo.sv
  - Change: rw[206] <= 1'b0; // CFGTLP PCIE WRITE ENABLE
  - Change to: rw[206] <= 1'b1;
- Generate Writemask File
  - Method:
    - Use a script to generate a writemask file based on your .coe file.
    - Script: writemask.it
  - Steps:
    - Follow the instructions in the script's repository to create the writemask file.
    - Replace your existing ip/pcileech_cfgspace_writemask.coe with the newly generated file.
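Conceptually, a writemask marks which bits of each configuration DWORD the host may change: bits set in the mask take the written value, and all other bits keep their current value. The sketch below models that rule in software; it is an illustration of the idea rather than the firmware's actual logic, and the mask in the usage note is a made-up example.

```python
def apply_masked_write(current: int, written: int, writemask: int) -> int:
    """Result of a 32-bit register write filtered through a writemask."""
    kept = current & ~writemask      # read-only bits keep their value
    updated = written & writemask    # writable bits take the new value
    return (kept | updated) & 0xFFFFFFFF
```

With a hypothetical mask of `0x00000407`, writing `0xFFFFFFFF` over `0x00100000` yields `0x00100407`: only the three low bits and bit 10 change.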
- Adjust Payload Sizes (if necessary)
  - Note: Issues like "tiny PCIe algorithm" can be resolved by adjusting the .cfg_force_mps parameter in pcie_7x_0_core_top to match DEV_CAP_MAX_PAYLOAD_SUPPORTED.
- Generate Bitstream
  - Steps:
    - Proceed to generate the bitstream using Vivado.
    - Ensure all changes are saved before starting the synthesis and implementation.
By following this comprehensive guide, you should now have the knowledge and tools necessary to develop custom firmware for full device emulation using FPGA-based DMA hardware. Remember to adhere to best practices, test thoroughly, and consult additional resources as needed.
If you have any questions, need further assistance, or wish to contribute to the development of custom firmware, feel free to join our community on Discord: