fix: resolve duplicate port & improve first endpoint detection #273

psyray · 2025-02-16T20:48:29Z

Summary by Sourcery

This pull request addresses several issues related to port scanning, endpoint detection, and data aggregation for the dashboard. It also includes performance improvements, bug fixes, and enhancements to the overall scanning process.

Bug Fixes:

Fixes a bug where duplicate ports were being added to the database.
Fixes an issue where the first endpoint was not being correctly detected during Nmap scans.

Enhancements:

Improves the dashboard's performance by optimizing database queries for retrieving most used ports, IPs, technologies, CVEs, CWEs, and vulnerability tags.
Updates the Nmap integration to process service information more reliably and efficiently.
Refactors the port scanning task to avoid adding duplicate ports and improve endpoint detection.
Improves the HTTP crawl task by handling URLs more robustly and preventing errors when no URLs are provided.
Enhances the subdomain import process to create the first endpoint from Nmap data, improving initial scan results.
Updates the engine task loading to use the engine ID instead of the engine name.
Improves the performance of the dashboard charts by reversing the data before rendering.

Build:

Adds a database migration to migrate the M2M relationship between IPAddress and Port to a ForeignKey relationship.

Tests:

Adds signal handlers to clean up orphaned IP addresses after subdomain deletion.

Chores:

Updates dependencies and configurations for improved performance and security.
Updates the default auto field for the startScan app to BigAutoField.

This change improves the port scanning functionality by preventing duplicate port entries and adds cascading deletes for Subdomains, IP addresses, and Ports. It also includes minor updates to logging, datatable display, and app initialization.

This change improves the performance of deleting orphaned ports and adds a database index to the ports ManyToMany field on the IpAddress model. The orphaned port deletion process is now more efficient by using bulk deletion and fetching related objects in bulk. The database index improves query performance for lookups involving IP addresses and their associated ports. A database migration is included to apply the index change.

Improved port management by changing the relationship between IPs and Ports from a ManyToMany field to a ForeignKey on the Port model, directly linking a port to a specific IP. This simplifies database interactions and eliminates redundant data. The logic for handling port creation, updates, and service information association has been centralized and streamlined. Orphaned port and IP cleanup logic, previously handled by signals, has been removed as it's now implicitly managed by the database through cascading deletes.

This change improves port scanning by including all ports for HTTP crawling and removes redundant logic that excluded ports 80 and 443. It also refines the handling of subdomain deletions to ensure orphaned IP addresses are correctly cleaned up after a subdomain is removed, preventing dangling database entries. Additionally, minor logging improvements and code style adjustments are included.

This change improves the handling of imported subdomains by adding automatic Nmap scanning and endpoint creation for each imported subdomain. It also refactors the subdomain deduplication logic to use set comprehension for improved efficiency and readability.

This change refactors the engine task loading mechanism in the subscan modal. It now retrieves tasks based on the selected engine ID instead of the engine name, improving accuracy and efficiency. Additionally, it enhances error handling and user experience by displaying informative messages when no tasks are available or an error occurs during loading. The change also simplifies the task display logic and removes unnecessary code related to port scanning visibility. Finally, it updates the engine ordering to be case-insensitive.

sourcery-ai · 2025-02-16T20:48:33Z

Reviewer's Guide by Sourcery

This pull request addresses duplicate port handling, improves first endpoint detection, enhances Nmap result processing, and refactors the UI for better engine task loading. It also includes bug fixes for HTTP crawl and adds signals for cleaning up orphaned IP addresses.

Sequence diagram for Port Scanning with get_or_create_port

sequenceDiagram
    participant PS as Port Scan Task
    participant IP as IpAddress
    participant P as Port

    PS->>IP: Get or create IP Address
    IP-->>PS: Returns IP Address
    PS->>P: get_or_create_port(ip, port_number)
    alt Port exists
        P-->>PS: Returns existing Port
    else Port does not exist
        P->>P: Creates new Port with default values
        P-->>PS: Returns new Port
    end
    PS->>P: Updates Port service information (if available)
    P-->>PS: Port updated

Sequence diagram for HTTP Crawl task

sequenceDiagram
    participant HT as HTTP Crawl Task
    participant S as Subdomain
    participant E as Endpoint

    HT->>HT: Checks if URLs are provided directly
    alt URLs are provided
        HT->>HT: Filters URLs based on URL filter
        HT->>HT: Writes URLs to input file
    else URLs are not provided
        HT->>S: Get subdomain endpoints
        S-->>HT: Returns HTTP URLs
        HT->>HT: Extends URLs with subdomain endpoints
    end
    HT->>HT: Executes httpx command
    loop For each httpx result
        HT->>E: Saves/Updates Endpoint
        E-->>HT: Returns Endpoint
    end
    HT->>HT: Removes input file

Sequence diagram for create_first_endpoint_from_nmap_data

sequenceDiagram
    participant SF as save_imported_subdomains
    participant N as Nmap
    participant CF as create_first_endpoint_from_nmap_data
    participant E as Endpoint

    SF->>N: get_nmap_http_datas(subdomain_name, ctx)
    N-->>SF: hosts_data
    SF->>CF: create_first_endpoint_from_nmap_data(hosts_data, domain, subdomain, ctx)
    CF->>E: save_endpoint(http_url, ctx, is_default, subdomain)
    E-->>CF: endpoint
    CF-->>SF: endpoint

Updated class diagram for Port and IpAddress models

classDiagram
    class IpAddress {
        +id: int
        +address: str
        +is_cdn: bool
        +geo_iso: ForeignKey
        +version: int
    }
    class Port {
        +id: int
        +number: int
        +is_uncommon: bool
        +service_name: str
        +description: str
        +ip_address: ForeignKey
    }

    Port --|> IpAddress : ForeignKey
    note for Port "Unique together constraint on (ip_address, number)"

File-Level Changes

Change	Details	Files
Refactor dashboard charts to use template tags for improved readability and maintainability.	Replaced hardcoded date and data lists with template loops using `{% with %}` and `{% for %}` tags. Reversed the order of data in the lists to match the expected chart format. Used `	slice:":7"`to limit the data to the last 7 entries.</li><li>Used`
Fix duplicate port handling and improve port management during scans.	Modified the `port_scan` task to accept `None` as a default value for mutable arguments. Used `get_or_create_port` to either retrieve an existing port or create a new one, ensuring no duplicates. Removed the conditional check for ports 80 and 443 before adding endpoints. Updated Nmap to run on the host instead of common ports. Added a new migration to add a foreign key from the Port model to the IpAddress model, and to add a unique constraint on the ip_address and number fields.	`web/reNgine/tasks.py` `web/startScan/models.py` `web/startScan/migrations/0059_auto_20250216_1450.py`
Improve HTTP crawl task to handle null URLs and prevent errors.	Initialized `urls` as an empty list if `None` is passed to the `http_crawl` task. Added a check to filter out null URLs from the list of URLs to crawl. Added a try-except block to handle potential exceptions when deleting the input file. Added a check to ensure the input file exists before attempting to delete it.	`web/reNgine/tasks.py`
Enhance Nmap result processing and port service information updates.	Modified `parse_nmap_results` to handle multiple IP addresses and extract address types. Updated `process_nmap_service_results` to correctly process Nmap service detection results. Implemented `get_or_create_port` to centralize port handling and service info management. Implemented `update_port_service_info` to update port service information consistently.	`web/reNgine/tasks.py` `web/reNgine/common_func.py`
Improve UI for engine task loading and subdomain management.	Updated the engine task loading in the UI to use the engine ID instead of the engine name. Added signals to clean up orphaned IP addresses when subdomains are deleted or IPs are removed from subdomains.	`web/startScan/static/startScan/js/detail_scan.js` `web/startScan/signals.py`
Enhance API serializers and views for engine and port management.	Updated the `EngineSerializer` to include a `get_tasks` method that retrieves tasks from the engine's YAML configuration. Modified the `ListEngines` API view to filter engines by ID if provided. Updated the port query in the API view to filter based on the new IP address relationship.	`web/api/serializers.py` `web/api/views.py`
Miscellaneous changes.	Updated the default auto field in `startScan/apps.py` to `django.db.models.BigAutoField`. Added ENGINE_NAMES and ENGINE_DISPLAY_NAMES to `reNgine/definitions.py`. Removed unused code and comments. Fixed minor UI issues and typos.	`web/reNgine/definitions.py` `web/startScan/apps.py` `web/startScan/templates/startScan/detail_scan.html` `web/static/plugins/datatable/global.css`

Possibly linked issues

#987: The PR directly addresses the port recognition and duplication bugs described in the issue by improving port scanning logic and handling.
fix(scan): first endpoint detection for IP #251: The PR includes a fix for the first endpoint detection bug mentioned in the issue.

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!
Generate a plan of action for an issue: Comment @sourcery-ai plan on
an issue to generate a plan of action for it.

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

web/reNgine/tasks.py

web/startScan/signals.py

Fixes a bug where None values could be written to the HTTP URLs file.

web/reNgine/tasks.py

web/startScan/signals.py

Data migration to correctly update existing port data to use the new ForeignKey field. Some service_name datas will be lost as old system do not track service_name by ip and by port, but only by port

This commit refactors the dashboard charts to enhance data handling and consistency. Key improvements include: More accurate and efficient database queries for chart data. Standardized data retrieval and processing for charts. Improved handling of optional chart data. Corrected a minor typo in a chart title. Added a check for file existence before deletion in http_crawl task. Corrected a redirect issue in admin_interface_update.

sourcery-ai

Hey @psyray - I've reviewed your changes - here's some feedback:

Overall Comments:

The conversion of the javascript charts to use django's templating language is a good change.
The migration to move the port relationship from a many-to-many to a foreign key looks correct.

Here's what I looked at during the review

🟡 General issues: 5 issues found
🟢 Security: all looks good
🟢 Testing: all looks good
🟡 Complexity: 1 issue found
🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

web/reNgine/tasks.py

web/api/serializers.py

web/startScan/migrations/0059_auto_20250216_1450.py

web/startScan/signals.py

web/dashboard/views.py

web/reNgine/tasks.py

AnonymousWP · 2025-02-18T22:16:43Z

I'll review this once suggestions have been implemented and conversations have been resolved.

0b3ud · 2025-02-19T02:28:20Z

@psyray I have tested this PR everything works no more port duplication

This commit introduces significant improvements to the dashboard and backend, focusing on efficient data retrieval and presentation. Key changes include optimized database queries for scan statistics, vulnerability data, and asset information, as well as enhanced timeline visualizations and activity feeds. Additionally, cleanup processes for orphaned data and temporary files have been refined. These enhancements improve performance and provide a more comprehensive overview of project security posture.

Error handling for YAML parsing and subdomain deletion has been enhanced to provide more informative logging and prevent unexpected behavior.

web/startScan/models.py

0b3ud

Re tested everything works code is clean

psyray added 7 commits February 15, 2025 20:06

fix(tests): update tests for API port listing

d9c17b2

github-advanced-security bot found potential problems Feb 16, 2025

View reviewed changes

web/reNgine/tasks.py Fixed Show fixed Hide fixed

web/reNgine/tasks.py Fixed Show fixed Hide fixed

web/startScan/signals.py Fixed Show fixed Hide fixed

psyray added 2 commits February 17, 2025 13:02

fix: resolve sourcery recommendations

28c925c

fix: prevent None in urls

5e458ea

Fixes a bug where None values could be written to the HTTP URLs file.

github-advanced-security bot found potential problems Feb 17, 2025

View reviewed changes

web/reNgine/tasks.py Fixed Show fixed Hide fixed

web/startScan/signals.py Fixed Show fixed Hide fixed

psyray added 3 commits February 18, 2025 12:09

migration: migrate IP to new Port FK field

dc57957

Data migration to correctly update existing port data to use the new ForeignKey field. Some service_name datas will be lost as old system do not track service_name by ip and by port, but only by port

fix: apply CodeQL recommendations

a14c4f1

psyray marked this pull request as ready for review February 18, 2025 19:05

psyray requested a review from 0b3ud February 18, 2025 19:05

psyray self-assigned this Feb 18, 2025

psyray added the bug Something isn't working label Feb 18, 2025

psyray linked an issue Feb 18, 2025 that may be closed by this pull request

bug(portscan): ports are duplicated by Naabu when launching a port scan #268

Closed

3 tasks

sourcery-ai bot reviewed Feb 18, 2025

View reviewed changes

sourcery-ai bot mentioned this pull request Feb 18, 2025

Refactor port_scan Function to Improve Code Quality #274

Open

psyray added 2 commits February 19, 2025 19:49

refactor: Improve file deletion and error handling

51a3bb7

Error handling for YAML parsing and subdomain deletion has been enhanced to provide more informative logging and prevent unexpected behavior.

github-advanced-security bot found potential problems Feb 19, 2025

View reviewed changes

web/startScan/models.py Fixed Show resolved Hide resolved

fix: remove unused import

03dce60

0b3ud approved these changes Feb 19, 2025

View reviewed changes

0b3ud merged commit bbb5a25 into release/2.1.1 Feb 19, 2025
5 checks passed

0b3ud deleted the fix/268-portscan branch February 19, 2025 20:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: resolve duplicate port & improve first endpoint detection #273

fix: resolve duplicate port & improve first endpoint detection #273

psyray commented Feb 16, 2025 •

edited

Loading

sourcery-ai bot commented Feb 16, 2025 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

sourcery-ai bot left a comment

AnonymousWP commented Feb 18, 2025

0b3ud commented Feb 19, 2025

0b3ud left a comment

fix: resolve duplicate port & improve first endpoint detection #273

fix: resolve duplicate port & improve first endpoint detection #273

Conversation

psyray commented Feb 16, 2025 • edited Loading

Summary by Sourcery

sourcery-ai bot commented Feb 16, 2025 • edited Loading

Reviewer's Guide by Sourcery

Sequence diagram for Port Scanning with get_or_create_port

Sequence diagram for HTTP Crawl task

Sequence diagram for create_first_endpoint_from_nmap_data

Updated class diagram for Port and IpAddress models

File-Level Changes

Possibly linked issues

Interacting with Sourcery

Customizing Your Experience

Getting Help

sourcery-ai bot left a comment

Choose a reason for hiding this comment

AnonymousWP commented Feb 18, 2025

0b3ud commented Feb 19, 2025

0b3ud left a comment

Choose a reason for hiding this comment

psyray commented Feb 16, 2025 •

edited

Loading

sourcery-ai bot commented Feb 16, 2025 •

edited

Loading