Refactor the reading and writing of abacus/stru format #793

Merged
wanghan-iapcm merged 3 commits into deepmodeling:devel from pxlxingliang:stru
Feb 14, 2025

Conversation

pxlxingliang
Contributor

@pxlxingliang pxlxingliang commented Feb 11, 2025

Refactor the code that reads and writes ABACUS/STRU, and move the functions into a single file, abacus/stru.py.

Reading an ABACUS STRU file with dpdata.System now also returns the following information in the data dict:

{
    "masses": list of atomic masses,
    "pp_files": list of pseudopotential files,
    "orb_files": list of orbital files,
    "dpks_descriptor": the DeePKS descriptor file,
}

This information can also be written to a new STRU file automatically.
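
A minimal usage sketch of the behavior described above. It assumes dpdata is installed and that the STRU file is read through the `abacus/stru` format key; the helper names `extra_stru_fields` and `read_stru` are hypothetical and not part of this PR.

```python
# Keys listed in the PR description; a key is present only if the STRU file provides it.
EXTRA_KEYS = ("masses", "pp_files", "orb_files", "dpks_descriptor")


def extra_stru_fields(data: dict) -> dict:
    """Return the subset of `data` that holds the newly exposed STRU fields."""
    return {key: data[key] for key in EXTRA_KEYS if key in data}


def read_stru(path: str) -> dict:
    """Read an ABACUS STRU file and return its extra fields (requires dpdata)."""
    import dpdata  # imported lazily so extra_stru_fields works without dpdata

    system = dpdata.System(path, fmt="abacus/stru")
    return extra_stru_fields(system.data)
```

Calling `read_stru("STRU")` on a file without orbital files would, under these assumptions, simply return a dict without the `orb_files` key.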

Later, I will build on this commit to fix the bug reported in dpgen: deepmodeling/dpgen#1711

Summary by CodeRabbit

  • New Features

    • Introduced a dedicated module for structure file handling, enhancing the parsing and conversion of lattice, species, and coordinate data.
  • Refactor

    • Streamlined data extraction processes for simulation and relaxation workflows, reducing redundant operations and improving error clarity.
    • Updated plugin methods to leverage the enhanced structure processing functions for improved efficiency.
  • Tests

    • Improved test setups and cleanups, ensuring consistent handling of structure files and robust validation of the new parsing logic.

codspeed-hq bot commented Feb 11, 2025

CodSpeed Performance Report

Merging #793 will not alter performance

Comparing pxlxingliang:stru (1b093d1) with devel (0af5e66)

Summary

✅ 2 untouched benchmarks

coderabbitai bot commented Feb 11, 2025

📝 Walkthrough

Walkthrough

This pull request refactors the data extraction process for ABACUS calculations by removing several legacy helper functions (such as get_cell and get_coords) in favor of a unified get_frame_from_stru approach. The changes span multiple modules in the dpdata/abacus package (for MD, relax, and SCF calculations), introduce a new module for STRU file parsing with several helper functions, and update plugin methods accordingly. Test cases have also been updated to streamline file setup and cleanup operations.

Changes

| File(s) | Change Summary |
|---|---|
| dpdata/abacus/md.py, dpdata/abacus/relax.py | Updated get_frame functions to remove get_cell and get_coords in favor of get_frame_from_stru; streamlined handling of atomic numbers, spins, and the move list. |
| dpdata/abacus/scf.py | Refactored get_frame: removed multiple helper functions (e.g., get_block, get_cell, parse_stru_pos, etc.), eliminated the EnergyConversion import, and integrated the new get_frame_from_stru along with direct force and magnetic moment extraction. |
| dpdata/abacus/stru.py | Introduced a new module for STRU file parsing; added functions for block splitting, parsing atomic species, lattice constants/vectors, position lines, Cartesian conversion, and generating unlabeled STRU files. |
| dpdata/plugins/abacus.py | Refactored the AbacusSTRUFormat class to replace calls to the old functions with the new functions (get_frame_from_stru and make_unlabeled_stru) from the updated STRU module. |
| tests/test_abacus_relax.py | Modified test setup and teardown: now copying STRU.h2o to STRU in setUp and removing the STRU file in tearDown, eliminating the backup file step. |
| tests/test_abacus_stru_dump.py | Replaced uses of parse_stru_pos with parse_pos_oneline; added a new test (test_read_stru), renamed an existing test method, and updated error handling tests accordingly. |

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant GetFrame
    participant FrameFromStru
    participant STRUParser

    Caller->>GetFrame: Call get_frame(fname)
    GetFrame->>FrameFromStru: Read and parse STRU file
    FrameFromStru->>STRUParser: Split file blocks and parse sections
    STRUParser-->>FrameFromStru: Return structured data dictionary
    FrameFromStru-->>GetFrame: Return parsed data (atomic numbers, coordinates, etc.)
    GetFrame-->>Caller: Return final frame data with updated moves and cleaned attributes

Suggested reviewers

  • wanghan-iapcm

codecov bot commented Feb 11, 2025

Codecov Report

Attention: Patch coverage is 94.30693% with 23 lines in your changes missing coverage. Please review.

Project coverage is 85.33%. Comparing base (5423efe) to head (1b093d1).
Report is 20 commits behind head on devel.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| dpdata/abacus/stru.py | 93.80% | 23 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##            devel     #793      +/-   ##
==========================================
+ Coverage   85.15%   85.33%   +0.17%     
==========================================
  Files          81       82       +1     
  Lines        7526     7515      -11     
==========================================
+ Hits         6409     6413       +4     
+ Misses       1117     1102      -15     


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🔭 Outside diff range comments (1)
dpdata/abacus/stru.py (1)

487-787: 🛠️ Refactor suggestion

Function make_unlabeled_stru: flexible STRU writer.

  1. Large parameter list. Consider splitting into subroutines or using a builder pattern if expansions continue.
  2. The link_file logic is convenient but watch out for conflicting symlinks.
  3. The final loop generating atom lines is well-done but quite extensive; smaller helper methods could reduce complexity.
🧰 Tools
🪛 Ruff (0.8.2)

623-627: Use a single if statement instead of nested if statements

(SIM102)


641-641: No explicit stacklevel keyword argument found

(B028)

🪛 GitHub Check: codecov/patch

[warning] 549-550: dpdata/abacus/stru.py#L549-L550
Added lines #L549 - L550 were not covered by tests


[warning] 553-553: dpdata/abacus/stru.py#L553
Added line #L553 was not covered by tests


[warning] 556-557: dpdata/abacus/stru.py#L556-L557
Added lines #L556 - L557 were not covered by tests


[warning] 559-559: dpdata/abacus/stru.py#L559
Added line #L559 was not covered by tests


[warning] 584-584: dpdata/abacus/stru.py#L584
Added line #L584 was not covered by tests


[warning] 587-587: dpdata/abacus/stru.py#L587
Added line #L587 was not covered by tests


[warning] 628-629: dpdata/abacus/stru.py#L628-L629
Added lines #L628 - L629 were not covered by tests


[warning] 632-632: dpdata/abacus/stru.py#L632
Added line #L632 was not covered by tests


[warning] 641-641: dpdata/abacus/stru.py#L641
Added line #L641 was not covered by tests


[warning] 644-644: dpdata/abacus/stru.py#L644
Added line #L644 was not covered by tests


[warning] 648-648: dpdata/abacus/stru.py#L648
Added line #L648 was not covered by tests


[warning] 717-717: dpdata/abacus/stru.py#L717
Added line #L717 was not covered by tests


[warning] 736-736: dpdata/abacus/stru.py#L736
Added line #L736 was not covered by tests


[warning] 743-743: dpdata/abacus/stru.py#L743
Added line #L743 was not covered by tests


[warning] 781-781: dpdata/abacus/stru.py#L781
Added line #L781 was not covered by tests

🧹 Nitpick comments (12)
dpdata/abacus/scf.py (4)

205-206: No-op lines callout.
These blank or comment lines have no functional impact. Recommend consolidating if they are not serving any clarity purpose.


213-217: Early return upon non-convergence.
When converge is false, the function returns partially filled data. This is acceptable if the caller handles incomplete data gracefully. Otherwise, consider raising a dedicated exception or logging a warning to quickly identify non-converged runs.
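
One way the warning option could look, as a sketch only: the helper name `finalize_frame` and the message text are hypothetical, and the actual get_frame code is not reproduced here.

```python
import warnings


def finalize_frame(data: dict, converged: bool) -> dict:
    """Return `data`, warning the caller when the SCF run did not converge."""
    if not converged:
        warnings.warn(
            "SCF did not converge; returning structure data without labels",
            RuntimeWarning,
            stacklevel=2,  # attribute the warning to the caller, not this helper
        )
    return data
```

A warning keeps the current return value intact for existing callers, whereas a dedicated exception would force them to handle non-converged runs explicitly.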


226-226: Clarify the comment.
The comment line provides a heading but does not add more detail. If needed, expand the comment to summarize how these properties (magmom, magforce, forces, stress) are collected.


248-249: Retaining 'move' data.
Storing the popped move back into data is consistent. Ensure external references or previously retrieved copies of data are updated if they assumed move was absent.

dpdata/abacus/md.py (1)

219-221: Expanding 'move' data for multiple frames.
Repeating the single array move[0] across all frames may cause confusion if atomic mobility changes dynamically. Confirm that this static approach is correct for your MD scenario.
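
The static approach in question can be illustrated with plain lists (the real code presumably works on arrays; `repeat_move` is a hypothetical name used only for this sketch).

```python
def repeat_move(move_first_frame, nframes):
    """Copy one frame's per-atom move flags to every frame.

    This assumes the constraints read from STRU stay fixed for the whole run.
    """
    return [[list(flags) for flags in move_first_frame] for _ in range(nframes)]
```

Each frame receives its own copy, so editing the flags of one frame does not silently change the others.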

dpdata/abacus/stru.py (7)

14-55: Function split_stru_block: structured parsing approach.
The function effectively splits lines by recognized keywords. However, be mindful if ABACUS adds more blocks in future. Consider a more dynamic approach (e.g., scanning for blocks until encountering a known pattern or end-of-file).
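
A sketch of what a more dynamic splitter might look like. It is not the function from this PR: it assumes that every block header is an all-uppercase word containing at least one underscore (for example ATOMIC_SPECIES), which keeps bare element symbols such as H or O from being mistaken for headers, and it assumes `#` starts a comment.

```python
import re

# Heuristic header pattern: uppercase words joined by underscores.
HEADER = re.compile(r"^[A-Z]+(?:_[A-Z]+)+$")


def split_blocks(lines):
    """Group lines under the most recent header line; skip blanks and comments."""
    blocks = {}
    current = None
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop a trailing comment, if any
        if not line:
            continue
        if HEADER.match(line):
            current = line
            blocks[current] = []
        elif current is not None:
            blocks[current].append(line)
    return blocks
```

With this heuristic, a block added to the format later would be picked up without editing a keyword list, at the cost of accepting misspelled headers as new blocks.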


84-95: Function parse_numerical_orbital_block: minimal checks.
If a line might contain unexpected whitespace or partial data, robust error handling could help, though for a well-formed STRU file, it’s likely safe.


110-124: Function parse_lattice_vectors_block: parse loop.
The parsing logic is good. If the line count doesn't match 3 (for 3 vectors), consider raising an exception early to ensure correct format.
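
The early check suggested here could be sketched as follows; `parse_lattice_vectors` is a hypothetical stand-in, not the PR's parse_lattice_vectors_block.

```python
def parse_lattice_vectors(lines):
    """Parse exactly three rows of three floats; raise early on malformed input."""
    rows = [line.split()[:3] for line in lines if line.strip()]
    if len(rows) != 3:
        raise ValueError(f"expected 3 lattice vectors, got {len(rows)}")
    vectors = []
    for index, row in enumerate(rows, start=1):
        if len(row) != 3:
            raise ValueError(f"lattice vector {index} has {len(row)} components")
        vectors.append([float(value) for value in row])
    return vectors
```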


126-269: Function parse_pos_oneline: thorough parsing of position metadata.

  1. It gracefully handles optional tokens (move, velocity, etc.), but be aware that user errors (e.g., incomplete tokens) raise RuntimeError only after some partial parsing.
  2. Consider a consistent fallback strategy, such as ignoring invalid lines or logging a warning, depending on your use case.
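
To make the first point concrete, here is a deliberately simplified sketch that validates each optional token group before storing anything from it. It is not parse_pos_oneline: it only handles three coordinates, optional move flags (bare, or after `m`), and an optional velocity (after `v`, `vel`, or `velocity`); the function name and this reduced token set are assumptions for illustration.

```python
def parse_position_line(line):
    """Return (position, move, velocity); move and velocity are None if absent."""
    tokens = line.split()
    if len(tokens) < 3:
        raise RuntimeError(f"need three coordinates: {line!r}")
    position = [float(token) for token in tokens[:3]]
    move, velocity = None, None
    i = 3
    # Bare move flags directly after the coordinates, e.g. "0 0 0 1 1 0".
    if len(tokens) >= 6 and all(token in ("0", "1") for token in tokens[3:6]):
        move = [token == "1" for token in tokens[3:6]]
        i = 6
    while i < len(tokens):
        key = tokens[i].lower()
        if key not in ("m", "v", "vel", "velocity"):
            raise RuntimeError(f"unknown token {tokens[i]!r}: {line!r}")
        values = tokens[i + 1 : i + 4]
        if len(values) != 3:  # reject incomplete groups before using them
            raise RuntimeError(f"incomplete values after {tokens[i]!r}: {line!r}")
        if key == "m":
            move = [bool(int(value)) for value in values]
        else:
            velocity = [float(value) for value in values]
        i += 4
    return position, move, velocity
```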

272-315: Function get_atom_mag_cartesian: handle angle-based spin direction.

  • The transformation logic is clear.
  • Per the static analysis hint, lines 291-292 can use a single isinstance(atommag, (list, float)) check for conciseness.
  • Similarly, lines 306-309 can use a ternary expression:
-        if isinstance(atommag, list):
-            mag_norm = np.linalg.norm(atommag)
-        else:
-            mag_norm = atommag
+        mag_norm = np.linalg.norm(atommag) if isinstance(atommag, list) else atommag
🧰 Tools
🪛 Ruff (0.8.2)

291-291: Multiple isinstance calls for atommag, merge into a single call

Merge isinstance calls for atommag

(SIM101)


306-309: Use ternary operator mag_norm = np.linalg.norm(atommag) if isinstance(atommag, list) else atommag instead of if-else-block

Replace if-else-block with mag_norm = np.linalg.norm(atommag) if isinstance(atommag, list) else atommag

(SIM108)

🪛 GitHub Check: codecov/patch

[warning] 292-292: dpdata/abacus/stru.py#L292
Added line #L292 was not covered by tests


338-415: Function parse_pos: optional expansions.

  1. The loop merges multiple lines for each atom type. Ensure that lines always appear in the expected order (atom name → type mag → number → positions).
  2. Lines 396-399 and 401-404 can be condensed with a ternary operator to shrink code repetition.
🧰 Tools
🪛 Ruff (0.8.2)

375-375: Loop control variable iline not used within loop body

Rename unused iline to _iline

(B007)


396-399: Use ternary operator move = [] if all([i is None for i in move]) else np.array(move, dtype=bool) instead of if-else-block

Replace if-else-block with move = [] if all([i is None for i in move]) else np.array(move, dtype=bool)

(SIM108)


401-404: Use ternary operator velocity = [] if all([i is None for i in velocity]) else np.array(velocity) instead of if-else-block

Replace if-else-block with velocity = [] if all([i is None for i in velocity]) else np.array(velocity)

(SIM108)

🪛 GitHub Check: codecov/patch

[warning] 368-368: dpdata/abacus/stru.py#L368
Added line #L368 was not covered by tests


417-485: Function get_frame_from_stru: central STRU reading logic.

  1. The function sets up a single-frame representation (e.g., cells, coords) with shape 1 in the first dimension.
  2. If multiple frames in a single STRU file ever occur, consider how to handle them.
  3. The function does not store velocity or other attributes unless found. This is by design, but confirm it’s documented for future maintainers.
🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 447-447: dpdata/abacus/stru.py#L447
Added line #L447 was not covered by tests

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0af5e66 and 1b093d1.

📒 Files selected for processing (7)
  • dpdata/abacus/md.py (3 hunks)
  • dpdata/abacus/relax.py (4 hunks)
  • dpdata/abacus/scf.py (2 hunks)
  • dpdata/abacus/stru.py (1 hunks)
  • dpdata/plugins/abacus.py (3 hunks)
  • tests/test_abacus_relax.py (5 hunks)
  • tests/test_abacus_stru_dump.py (5 hunks)

🔇 Additional comments (15)
dpdata/abacus/scf.py (3)

11-12: Imports look consistent with the refactoring goals.
These lines correctly remove the unused EnergyConversion import and pull in the new get_frame_from_stru function for STRU parsing. No further issues observed.


218-224: Removal of 'spins' key might affect downstream usage.
The logic removes "spins" from the data unconditionally. Ensure that other modules or user code does not unexpectedly rely on spins from this SCF extraction.


238-242: Stress calculation depends on absolute cell determinant.
The code multiplies stress by |det(cell)|. Confirm that negative determinants—resulting from non-right-handed coordinate systems—do not need special handling. If using a negative sign is meaningful for orientation, confirm the design choice.
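
The effect of taking the absolute value can be seen in a small pure-Python sketch; `det3` and `scale_stress_by_volume` are hypothetical helpers, and unit conversion factors are left out.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def scale_stress_by_volume(stress, cell):
    """Multiply every stress component by the cell volume |det(cell)|."""
    volume = abs(det3(cell))
    return [[component * volume for component in row] for row in stress]
```

Swapping two lattice vectors flips the sign of the determinant but leaves the volume, and therefore the scaled stress, unchanged.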

dpdata/abacus/md.py (2)

16-16: STRU parsing import confirmed.
Pulling get_frame_from_stru directly here aligns with the new design. The import is straightforward.


166-171: Potential mismatch of spins usage.
The code removes "spins" after reading from get_frame_from_stru. If magnetism data is relevant in MD runs, confirm that it’s intentionally dropped. Otherwise, consider preserving or logging it for debugging.

dpdata/abacus/stru.py (3)

57-82: Function parse_atomic_species_block: verifying array lengths.
The logic is straightforward. Consider validating each parsed line has at least two fields: atom_name and mass, given possible user error or truncated lines.
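
A sketch of the suggested field-count validation. The line layout assumed here (element name, mass, optional pseudopotential file) and the function name are assumptions for illustration, not the PR's parse_atomic_species_block.

```python
def parse_atomic_species(lines):
    """Return (names, masses, pp_files); pp_files entries are None when absent."""
    names, masses, pp_files = [], [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if len(fields) < 2:
            raise ValueError(f"expected at least a name and a mass: {line!r}")
        names.append(fields[0])
        masses.append(float(fields[1]))
        pp_files.append(fields[2] if len(fields) > 2 else None)
    return names, masses, pp_files
```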

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 79-79: dpdata/abacus/stru.py#L79
Added line #L79 was not covered by tests


97-108: Function parse_lattice_constant_block: single-line assumption.
Ensure that the user does not accidentally list multiple lines for the lattice constant or add trailing text. If that occurs, the code could break. A brief length check can improve robustness.


317-336: Function get_carteisan_coords: unit transforms are straightforward.
If coord_type is neither "cartesian" nor "direct", a RuntimeError is raised. This is good. Just confirm there’s no additional ABACUS coordinate system type in other versions.
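
The conversion being reviewed can be sketched in plain Python. This is an illustration under stated assumptions rather than the PR's code: `cell` holds the lattice vectors as rows in the target length unit, Cartesian input is multiplied by a caller-supplied `scale`, and the function name is hypothetical.

```python
def to_cartesian(coords, coord_type, cell, scale=1.0):
    """Convert coordinates to Cartesian; raise on an unknown coordinate type."""
    kind = coord_type.lower()
    if kind == "direct":
        # r = f1*a1 + f2*a2 + f3*a3, with the lattice vectors a_k as rows of cell
        return [
            [sum(frac[k] * cell[k][j] for k in range(3)) for j in range(3)]
            for frac in coords
        ]
    if kind == "cartesian":
        return [[value * scale for value in point] for point in coords]
    raise RuntimeError(f"unsupported coordinate type: {coord_type}")
```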

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 335-335: dpdata/abacus/stru.py#L335
Added line #L335 was not covered by tests

dpdata/plugins/abacus.py (1)

11-11: LGTM! Clean refactoring of imports and method calls.

The changes improve code readability by:

  1. Importing functions directly from the new stru module
  2. Simplifying function calls by removing namespace prefixes

Also applies to: 24-24, 42-47

dpdata/abacus/relax.py (3)

49-49: Improved error message clarity.

The error message now clearly distinguishes between the atom count in the log file versus the STRU file.


183-187: LGTM! Clean refactoring of data retrieval.

The changes improve code maintainability by:

  1. Using the new get_frame_from_stru function
  2. Properly handling the spins attribute

211-213: LGTM! Added proper handling of the move attribute.

The changes ensure that the move attribute is properly propagated across frames.

tests/test_abacus_relax.py (1)

21-24: LGTM! Improved test file handling.

The changes enhance test reliability by:

  1. Ensuring consistent STRU file setup across test cases
  2. Implementing proper cleanup in tearDown methods
  3. Removing unnecessary backup steps

Also applies to: 30-31, 100-103, 119-120, 141-142, 187-188

tests/test_abacus_stru_dump.py (2)

32-42: LGTM! Added comprehensive test for reading STRU file.

The new test ensures proper handling of:

  1. Pseudopotential files
  2. Orbital files
  3. Descriptor files
  4. File content validation

211-286: LGTM! Clean function renaming and error handling tests.

The changes maintain test coverage while:

  1. Using the new parse_pos_oneline function
  2. Preserving all error condition tests

@wanghan-iapcm wanghan-iapcm merged commit bd42e6a into deepmodeling:devel Feb 14, 2025
12 checks passed