Refactor the reading and writing of abacus/stru format #793
CodSpeed Performance Report
Merging #793 will not alter performance.
📝 Walkthrough
This pull request refactors the data extraction process for ABACUS calculations by removing several legacy helper functions (such as …)
Changes
Sequence Diagram(s)
sequenceDiagram
participant Caller
participant GetFrame
participant FrameFromStru
participant STRUParser
Caller->>GetFrame: Call get_frame(fname)
GetFrame->>FrameFromStru: Read and parse STRU file
FrameFromStru->>STRUParser: Split file blocks and parse sections
STRUParser-->>FrameFromStru: Return structured data dictionary
FrameFromStru-->>GetFrame: Return parsed data (atomic numbers, coordinates, etc.)
GetFrame-->>Caller: Return final frame data with updated moves and cleaned attributes
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files
@@            Coverage Diff             @@
##            devel     #793      +/-   ##
==========================================
+ Coverage   85.15%   85.33%   +0.17%
==========================================
  Files          81       82       +1
  Lines        7526     7515      -11
==========================================
+ Hits         6409     6413       +4
+ Misses       1117     1102      -15
Actionable comments posted: 0
🔭 Outside diff range comments (1)
dpdata/abacus/stru.py (1)
487-787: 🛠️ Refactor suggestion
Function make_unlabeled_stru: flexible STRU writer.
- Large parameter list. Consider splitting it into subroutines or using a builder pattern if the list keeps growing.
- The link_file logic is convenient, but watch out for conflicting symlinks.
- The final loop that generates the atom lines is well done but quite long; smaller helper functions could reduce its complexity.
🧰 Tools
🪛 Ruff (0.8.2)
623-627: Use a single if statement instead of nested if statements (SIM102)
641-641: No explicit stacklevel keyword argument found (B028)
🪛 GitHub Check: codecov/patch
[warning] 549-550: dpdata/abacus/stru.py#L549-L550: Added lines #L549 - L550 were not covered by tests
[warning] 553-553: dpdata/abacus/stru.py#L553: Added line #L553 was not covered by tests
[warning] 556-557: dpdata/abacus/stru.py#L556-L557: Added lines #L556 - L557 were not covered by tests
[warning] 559-559: dpdata/abacus/stru.py#L559: Added line #L559 was not covered by tests
[warning] 584-584: dpdata/abacus/stru.py#L584: Added line #L584 was not covered by tests
[warning] 587-587: dpdata/abacus/stru.py#L587: Added line #L587 was not covered by tests
[warning] 628-629: dpdata/abacus/stru.py#L628-L629: Added lines #L628 - L629 were not covered by tests
[warning] 632-632: dpdata/abacus/stru.py#L632: Added line #L632 was not covered by tests
[warning] 641-641: dpdata/abacus/stru.py#L641: Added line #L641 was not covered by tests
[warning] 644-644: dpdata/abacus/stru.py#L644: Added line #L644 was not covered by tests
[warning] 648-648: dpdata/abacus/stru.py#L648: Added line #L648 was not covered by tests
[warning] 717-717: dpdata/abacus/stru.py#L717: Added line #L717 was not covered by tests
[warning] 736-736: dpdata/abacus/stru.py#L736: Added line #L736 was not covered by tests
[warning] 743-743: dpdata/abacus/stru.py#L743: Added line #L743 was not covered by tests
[warning] 781-781: dpdata/abacus/stru.py#L781: Added line #L781 was not covered by tests
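One way to act on the decomposition suggestion and the two Ruff findings together is to pull the per-atom line formatting out of the writer into its own helper. The sketch below is hypothetical: `format_atom_line` and its parameters are not part of dpdata, and the `m`/`v` token syntax for move flags and velocities is assumed from the STRU position-line format that this PR parses. It merges a nested `if` into one condition (SIM102) and passes `stacklevel=2` to `warnings.warn` (B028) so the warning points at the caller.

```python
import warnings
from typing import Optional, Sequence


def format_atom_line(
    coord: Sequence[float],
    move: Optional[Sequence[bool]] = None,
    velocity: Optional[Sequence[float]] = None,
) -> str:
    """Format one ATOMIC_POSITIONS line (hypothetical helper, not dpdata API)."""
    parts = [" ".join(f"{x:.10f}" for x in coord)]
    if move is not None:
        # Assumption: move flags are written as "m 1 1 0".
        parts.append("m " + " ".join("1" if flag else "0" for flag in move))
    # SIM102: one combined condition instead of two nested if statements.
    if velocity is not None and len(velocity) != 3:
        # B028: stacklevel=2 attributes the warning to the caller's line.
        warnings.warn(
            f"velocity should have 3 components, got {len(velocity)}; ignored",
            stacklevel=2,
        )
        velocity = None
    if velocity is not None:
        # Assumption: velocities are written as "v vx vy vz".
        parts.append("v " + " ".join(f"{v:.10f}" for v in velocity))
    return " ".join(parts)
```

For example, `format_atom_line([0.0, 0.5, 1.0], move=[True, True, False])` yields `0.0000000000 0.5000000000 1.0000000000 m 1 1 0`. Keeping the formatting in a small pure function also makes it easy to cover with tests, which would address several of the uncovered lines listed above.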
🧹 Nitpick comments (12)
dpdata/abacus/scf.py (4)
205-206: No-op lines callout.
These blank or comment lines have no functional impact. Consider consolidating them if they do not aid clarity.
213-217: Early return upon non-convergence.
When converge is false, the function returns partially filled data. This is acceptable if the caller handles incomplete data gracefully. Otherwise, consider raising a dedicated exception or logging a warning so that non-converged runs are identified quickly.
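A minimal sketch of making the early return explicit, assuming a hypothetical `NonConvergedError` exception and a `strict` flag (neither exists in dpdata): in strict mode a non-converged run raises, otherwise a warning is logged and the partially filled dict is returned as before, so existing callers keep working.

```python
import logging

logger = logging.getLogger(__name__)


class NonConvergedError(RuntimeError):
    """Hypothetical exception for an SCF run that did not converge."""


def finish_scf_data(data: dict, converge: bool, logfile: str, strict: bool = False) -> dict:
    """Return ``data`` unchanged, but make non-convergence visible (sketch)."""
    if not converge:
        msg = f"SCF in {logfile} did not converge; returning partial data"
        if strict:
            # Caller explicitly asked for a hard failure.
            raise NonConvergedError(msg)
        # Default: keep the current behaviour but leave a trace in the logs.
        logger.warning(msg)
    return data
```

Subclassing `RuntimeError` keeps any existing `except RuntimeError` handlers effective while still letting callers catch the non-convergence case specifically.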
226-226: Clarify the comment.
The comment line provides a heading but adds no detail. If needed, expand it to summarize how these properties (magmom, magforce, forces, stress) are collected.
248-249: Retaining 'move' data.
Storing the popped move back into data is consistent. Ensure that external references or previously retrieved copies of data are updated if they assumed move was absent.
dpdata/abacus/md.py (1)
219-221: Expanding 'move' data for multiple frames.
Repeating the single array move[0] across all frames may cause confusion if atomic mobility changes during the run. Confirm that this static approach is correct for your MD scenario.
dpdata/abacus/stru.py (7)
14-55: Function split_stru_block: structured parsing approach.
The function splits lines by recognized keywords. However, ABACUS may add more blocks in the future. Consider a more dynamic approach (e.g., scanning for blocks until encountering a known pattern or end-of-file).
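A more dynamic splitter could treat any header-shaped line as the start of a block, so a new ABACUS block is collected without code changes. This is a sketch, not the PR's `split_stru_block`: `split_blocks` is a hypothetical name, and both the header pattern (upper-case words joined by underscores, as in ATOMIC_SPECIES or LATTICE_VECTORS) and the `#`/`//` comment handling are assumptions about the STRU format that should be checked against the ABACUS documentation.

```python
import re
from typing import Dict, List, Optional

# Assumption: block headers are upper-case words joined by underscores.
# Requiring an underscore keeps one-letter element labels such as "H" from matching.
HEADER_RE = re.compile(r"^[A-Z]+(?:_[A-Z]+)+$")


def split_blocks(text: str) -> Dict[str, List[str]]:
    """Group non-empty, comment-stripped lines under the preceding header."""
    blocks: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        # Assumption: "#" and "//" start comments in a STRU file.
        line = re.split(r"#|//", raw, maxsplit=1)[0].strip()
        if not line:
            continue
        if HEADER_RE.match(line):
            current = line
            blocks.setdefault(current, [])
        elif current is None:
            raise ValueError(f"content before first block header: {line!r}")
        else:
            blocks[current].append(line)
    return blocks
```

The trade-off is that an unknown header is accepted silently; a caller that only understands some blocks can still look up the ones it knows and warn about the rest.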
84-95: Function parse_numerical_orbital_block: minimal checks.
If a line might contain unexpected whitespace or partial data, more robust error handling could help, though for a well-formed STRU file it is likely safe.
110-124: Function parse_lattice_vectors_block: parse loop.
The parsing logic is good. If the line count does not match 3 (one line per lattice vector), consider raising an exception early to enforce the correct format.
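The early check suggested above might look like the following sketch. `parse_lattice_vectors` is a hypothetical stand-in for the PR's block parser; it rejects a wrong line count and short rows before any conversion, so a malformed block fails with a precise message.

```python
from typing import List

import numpy as np


def parse_lattice_vectors(lines: List[str]) -> np.ndarray:
    """Parse LATTICE_VECTORS lines into a (3, 3) array, failing early on bad input."""
    if len(lines) != 3:
        raise ValueError(f"LATTICE_VECTORS needs 3 lines, got {len(lines)}")
    rows = []
    for i, line in enumerate(lines):
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"lattice vector {i + 1} has fewer than 3 components: {line!r}")
        # Extra trailing fields are ignored here; tighten to == 3 if they should be errors.
        rows.append([float(x) for x in fields[:3]])
    return np.array(rows)
```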
126-269: Function parse_pos_oneline: thorough parsing of position metadata.
- It gracefully handles optional tokens (move, velocity, etc.), but be aware that user errors (e.g., incomplete tokens) raise RuntimeError only after some partial parsing.
- Consider a consistent fallback strategy, such as ignoring invalid lines or logging a warning, depending on your use case.
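One consistent strategy is to check each keyword's argument count and type before storing it, so every kind of malformed line fails with the same `RuntimeError`. The sketch below is deliberately simplified and hypothetical: `parse_position_line` only knows `m` (three move flags), `v` (three velocity components) and `mag` (one value), a subset of what `parse_pos_oneline` handles; bare move flags without `m` are not supported, and the return layout is invented for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: number of values that follow each keyword (a subset of STRU syntax).
NVALUES = {"m": 3, "v": 3, "mag": 1}


def parse_position_line(line: str) -> Tuple[List[float], Dict[str, List[float]]]:
    """Split "x y z [key values...]" into coordinates and keyword values."""
    tokens = line.split()
    if len(tokens) < 3:
        raise RuntimeError(f"need at least 3 coordinates: {line!r}")
    coord = [float(t) for t in tokens[:3]]
    extras: Dict[str, List[float]] = {}
    i = 3
    while i < len(tokens):
        key = tokens[i]
        if key not in NVALUES:
            raise RuntimeError(f"unknown token {key!r} in {line!r}")
        n = NVALUES[key]
        values = tokens[i + 1 : i + 1 + n]
        if len(values) != n:
            raise RuntimeError(f"{key!r} expects {n} value(s) in {line!r}")
        try:
            extras[key] = [float(v) for v in values]
        except ValueError as err:
            # e.g. "m 1 1 v ...": the next keyword was swallowed as a value.
            raise RuntimeError(f"non-numeric value after {key!r} in {line!r}") from err
        i += 1 + n
    return coord, extras
```

Because nothing is returned until the whole line has been consumed, a caller never sees a half-parsed result, whichever fallback (raise, skip, or warn) it chooses on top of this.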
272-315: Function get_atom_mag_cartesian: handles angle-based spin direction.
- The transformation logic is clear.
- Per the static analysis hint, lines 291-292 can use a single isinstance(atommag, (list, float)) check for conciseness.
- Similarly, lines 306-309 can use a ternary expression:
    -if isinstance(atommag, list):
    -    mag_norm = np.linalg.norm(atommag)
    -else:
    -    mag_norm = atommag
    +mag_norm = np.linalg.norm(atommag) if isinstance(atommag, list) else atommag
🧰 Tools
🪛 Ruff (0.8.2)
291-291: Multiple isinstance calls for atommag, merge into a single call (SIM101)
306-309: Use ternary operator mag_norm = np.linalg.norm(atommag) if isinstance(atommag, list) else atommag instead of if-else-block (SIM108)
🪛 GitHub Check: codecov/patch
[warning] 292-292: dpdata/abacus/stru.py#L292: Added line #L292 was not covered by tests
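For reference, the angle-based form is the usual spherical-to-Cartesian conversion, and the two Ruff suggestions fit into it directly. This is a sketch, not the PR's function: `mag_to_cartesian` is a hypothetical name, and it assumes `angle1` is the polar angle from the z axis and `angle2` the azimuthal angle from the x axis, both in degrees, and that a scalar moment without angles points along z. These conventions should be checked against the ABACUS documentation. It also accepts `int` in the merged `isinstance` check, which goes slightly beyond the `(list, float)` tuple suggested above.

```python
from typing import List, Optional, Union

import numpy as np


def mag_to_cartesian(
    atommag: Union[float, List[float]],
    angle1: Optional[float] = None,
    angle2: Optional[float] = None,
) -> np.ndarray:
    """Return a 3-component magnetic moment (sketch under the stated assumptions)."""
    # SIM101: one isinstance call with a tuple of accepted types.
    if not isinstance(atommag, (list, float, int)):
        raise TypeError(f"unsupported magnetic moment: {atommag!r}")
    if angle1 is None and angle2 is None:
        if isinstance(atommag, list):
            return np.array(atommag, dtype=float)
        # Assumption: a scalar moment without angles is collinear along z.
        return np.array([0.0, 0.0, float(atommag)])
    # SIM108: conditional expression instead of an if/else block.
    mag_norm = np.linalg.norm(atommag) if isinstance(atommag, list) else atommag
    theta = np.radians(angle1 if angle1 is not None else 0.0)  # polar angle
    phi = np.radians(angle2 if angle2 is not None else 0.0)  # azimuthal angle
    return mag_norm * np.array(
        [np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta)]
    )
```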
338-415: Function parse_pos: optional expansions.
- The loop merges multiple lines for each atom type. Ensure that the lines always appear in the expected order (atom name → type mag → number → positions).
- Lines 396-399 and 401-404 can be condensed with a ternary operator to reduce repetition.
🧰 Tools
🪛 Ruff (0.8.2)
375-375: Loop control variable iline not used within loop body. Rename unused iline to _iline (B007)
396-399: Use ternary operator move = [] if all([i is None for i in move]) else np.array(move, dtype=bool) instead of if-else-block (SIM108)
401-404: Use ternary operator velocity = [] if all([i is None for i in velocity]) else np.array(velocity) instead of if-else-block (SIM108)
🪛 GitHub Check: codecov/patch
[warning] 368-368: dpdata/abacus/stru.py#L368: Added line #L368 was not covered by tests
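The two SIM108 suggestions share one pattern: per-atom optional values are `None` when absent, and the whole list collapses to `[]` if no atom set them. A hypothetical helper (not in the PR) can express that once for both `move` and `velocity`. It passes a generator to `all()` instead of the list comprehension in the Ruff hint, which avoids building a temporary list. Like the original branches, it does not handle a mix of `None` and real values.

```python
from typing import Any, List, Optional, Union

import numpy as np


def collapse_optional(values: List[Optional[Any]], dtype=None) -> Union[list, np.ndarray]:
    """Return [] if every entry is None, else an array of the entries."""
    return [] if all(v is None for v in values) else np.array(values, dtype=dtype)
```

With it, the two blocks reduce to `move = collapse_optional(move, dtype=bool)` and `velocity = collapse_optional(velocity)`.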
417-485: Function get_frame_from_stru: central STRU reading logic.
- The function sets up a single-frame representation (e.g., cells, coords) with length 1 in the first dimension.
- If a single STRU file ever contains multiple frames, consider how to handle them.
- The function does not store velocity or other attributes unless they are found. This is by design, but confirm that it is documented for future maintainers.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 447-447: dpdata/abacus/stru.py#L447: Added line #L447 was not covered by tests
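The single-frame convention means every per-frame array carries a leading axis of length 1, which is also what lets the MD and relax readers repeat `move` across frames. A minimal sketch with hypothetical names (`make_single_frame`, `repeat_frame_axis`): the keys follow dpdata's usual system-data layout (`atom_names`, `atom_numbs`, `atom_types`, `cells`, `coords`, `orig`), while per-atom metadata such as `atom_types` has no frame axis. As the md.py nitpick notes, repeating one `move` array treats atomic mobility as fixed for the whole trajectory.

```python
import numpy as np


def make_single_frame(cell, coord, atom_names, atom_numbs, move=None) -> dict:
    """Build a one-frame dpdata-style dict (illustrative sketch, not the PR's code)."""
    cell = np.asarray(cell, dtype=float)
    coord = np.asarray(coord, dtype=float)
    data = {
        "atom_names": list(atom_names),
        "atom_numbs": list(atom_numbs),
        # Type index of each atom, following the order of atom_names/atom_numbs.
        "atom_types": np.repeat(np.arange(len(atom_numbs)), atom_numbs),
        "cells": cell[np.newaxis],  # (1, 3, 3)
        "coords": coord[np.newaxis],  # (1, natoms, 3)
        "orig": np.zeros(3),
    }
    if move is not None:  # optional fields are stored only when present
        data["move"] = np.asarray(move, dtype=bool)[np.newaxis]  # (1, natoms, 3)
    return data


def repeat_frame_axis(array: np.ndarray, nframes: int) -> np.ndarray:
    """Repeat a single-frame array along axis 0, e.g. to share `move` across frames."""
    if array.shape[0] != 1:
        raise ValueError(f"expected a leading axis of length 1, got {array.shape[0]}")
    return np.repeat(array, nframes, axis=0)
```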
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
dpdata/abacus/md.py (3 hunks)
dpdata/abacus/relax.py (4 hunks)
dpdata/abacus/scf.py (2 hunks)
dpdata/abacus/stru.py (1 hunks)
dpdata/plugins/abacus.py (3 hunks)
tests/test_abacus_relax.py (5 hunks)
tests/test_abacus_stru_dump.py (5 hunks)
🔇 Additional comments (15)
dpdata/abacus/scf.py (3)
11-12: Imports look consistent with the refactoring goals.
These lines correctly remove the unused EnergyConversion import and pull in the new get_frame_from_stru function for STRU parsing. No further issues observed.
218-224: Removal of the 'spins' key might affect downstream usage.
The logic removes "spins" from the data unconditionally. Ensure that other modules or user code do not rely on spins from this SCF extraction.
238-242: Stress calculation depends on the absolute cell determinant.
The code multiplies stress by |det(cell)|. Confirm that negative determinants, which arise from left-handed lattice vectors, need no special handling. If the sign is meaningful for orientation, confirm the design choice.
dpdata/abacus/md.py (2)
16-16: STRU parsing import confirmed.
Pulling get_frame_from_stru directly here aligns with the new design. The import is straightforward.
166-171: Potential mismatch of spins usage.
The code removes "spins" after reading from get_frame_from_stru. If magnetism data is relevant in MD runs, confirm that dropping it is intentional; otherwise, consider preserving or logging it for debugging.
dpdata/abacus/stru.py (3)
57-82: Function parse_atomic_species_block: verifying array lengths.
The logic is straightforward. Consider validating that each parsed line has at least two fields, atom_name and mass, to guard against user error or truncated lines.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 79-79: dpdata/abacus/stru.py#L79: Added line #L79 was not covered by tests
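The field check suggested above could be as small as the following sketch. `parse_species_line` is a hypothetical helper, and treating the optional third field as the pseudopotential file is an assumption based on the usual `name mass pp_file` layout of ATOMIC_SPECIES lines.

```python
from typing import Optional, Tuple


def parse_species_line(line: str) -> Tuple[str, float, Optional[str]]:
    """Split one ATOMIC_SPECIES line into (atom_name, mass, pp_file)."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError(f"ATOMIC_SPECIES line needs at least atom name and mass: {line!r}")
    try:
        mass = float(fields[1])
    except ValueError as err:
        raise ValueError(f"mass is not a number in {line!r}") from err
    # Assumption: the third field, when present, is the pseudopotential file.
    pp_file = fields[2] if len(fields) > 2 else None
    return fields[0], mass, pp_file
```

A unit test that feeds it a one-field line would also exercise the error branch, similar to the uncovered line 79 reported above.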
97-108: Function parse_lattice_constant_block: single-line assumption.
Ensure that the user does not accidentally list multiple lines for the lattice constant or add trailing text; if that happens, the code could break. A brief length check would improve robustness.
317-336: Function get_carteisan_coords: unit transforms are straightforward.
If coord_type is neither "cartesian" nor "direct", a RuntimeError is raised. This is good. Just confirm that other ABACUS versions do not define additional coordinate types.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 335-335: dpdata/abacus/stru.py#L335: Added line #L335 was not covered by tests
dpdata/plugins/abacus.py (1)
11-11: LGTM! Clean refactoring of imports and method calls.
The changes improve code readability by:
- Importing functions directly from the new stru module
- Simplifying function calls by removing namespace prefixes
Also applies to: 24-24, 42-47
dpdata/abacus/relax.py (3)
49-49: Improved error message clarity.
The error message now clearly distinguishes between the atom count in the log file and the atom count in the STRU file.
183-187: LGTM! Clean refactoring of data retrieval.
The changes improve code maintainability by:
- Using the new get_frame_from_stru function
- Properly handling the spins attribute
211-213: LGTM! Added proper handling of the move attribute.
The changes ensure that the move attribute is properly propagated across frames.
tests/test_abacus_relax.py (1)
21-24: LGTM! Improved test file handling.
The changes enhance test reliability by:
- Ensuring consistent STRU file setup across test cases
- Implementing proper cleanup in tearDown methods
- Removing unnecessary backup steps
Also applies to: 30-31, 100-103, 119-120, 141-142, 187-188
tests/test_abacus_stru_dump.py (2)
32-42: LGTM! Added a comprehensive test for reading a STRU file.
The new test ensures proper handling of:
- Pseudopotential files
- Orbital files
- Descriptor files
- File content validation
211-286: LGTM! Clean function renaming and error handling tests.
The changes maintain test coverage while:
- Using the new parse_pos_oneline function
- Preserving all error condition tests
Refactor the code that reads and writes ABACUS/STRU, and move these functions into a single file, abacus/stru.py.
Now, reading an ABACUS STRU file with dpdata.System will also return the following information in the data dict:
This information can also be written to a new STRU file automatically.
Later, I will build on this commit to fix the bug in dpgen (deepmodeling/dpgen#1711).
Summary by CodeRabbit
New Features
Refactor
Tests