feat(fmt): fmt comments with local insertions (FuelLabs#4192)

## Description Closes FuelLabs#3938 ### Problem I've submitted a few PRs (FuelLabs#4129, FuelLabs#4093, FuelLabs#4005) implementing a new way of formatting comments in tackling FuelLabs#3938, but after submitting FuelLabs#4129 and following that working on formatting storage comments in a local branch realized that the method of formatting previously by writing `write_comments(..)` between every span is getting very unwieldy. That method results in a lot of repeated calls to `write_comments()`, and in some cases where we have to check if some value exists, it would lead to even more vertical space taken up because we have to do something like this: ```rust ... if let Some(item) = my_item { write_comments( formatted_code, start..end, formatter, ) // do other things } ... ``` My sense is that this will probably lead to the formatter code being hard to maintain in future because of the length of each `Format` implementations, and having to work your way through a lot of conditional code and repeated calls to `write_comments(..)` isn't a very nice experience. In FuelLabs#4005, the comment formatting for `ItemAbi` alone led to **56 additional line insertions** in that PR (previously it was 54 LoC, which means the LoC doubled!) ### Solution We adapt the original approach of inserting comments, but this time on a local scale. That means we parse each node on its own, compare its unformatted and formatted versions, and find out where to insert comments found between spans. Unlike the original approach, this approach will parse and format while the AST node is being visited and formatted - that means the entire `handle_comments(..)` that reparses the entire `Module` can be deprecated once this PR is in. This is done mainly through the function `rewrite_with_comments(..)` that is called at the end of a `impl Format`. Essentially what this function does is it parses the unformatted and formatted versions of the currently visited node with `parse_snippet::<P: sway_parse::Parse + Format>(..)`, finding the comments between the unformatted spans, and inserting them into the formatted spans. Based on the `CommentKind` of the found comments, we do some formatting to decide how they should be written. Of course, this also means that unlike the previously proposed method of using `write_comments()` everywhere, where we have the indentation context, this new approach means that we yet again do not have the local context. To figure out indentation, we assume that each comment will always be 'pinned' to some node below it - if this node does not exist we try to 'pin' to a node above it instead. ## Checklist - [x] I have linked to any relevant issues. - [x] I have commented my code, particularly in hard-to-understand areas. - [ ] I have updated the documentation where relevant (API docs, the reference, and the Sway book). - [x] I have added tests that prove my fix is effective or that my feature works. - [ ] I have added (or requested a maintainer to add) the necessary `Breaking*` or `New Feature` labels where relevant. - [ ] I have done my best to ensure that my PR adheres to [the Fuel Labs Code Review Standards](https://github.com/FuelLabs/rfcs/blob/master/text/code-standards/external-contributors.md). - [x] I have requested a review from the relevant team or maintainers.
Kristal1523 · Mar 10, 2023 · 318fcde · 318fcde
1 parent d735e99
commit 318fcde
Show file tree

Hide file tree

Showing 22 changed files with 454 additions and 421 deletions.
diff --git a/examples/config_time_constants/src/main.sw b/examples/config_time_constants/src/main.sw
@@ -29,9 +29,9 @@ abi TestContract {
 }
 
 impl TestContract for Contract {
-// ANCHOR: using_configurables
+    // ANCHOR: using_configurables
     fn return_configurables() -> (u8, bool, [u32; 3], str[4], StructWithGeneric<u8>) {
         (U8, BOOL, ARRAY, STR_4, STRUCT)
     }
-// ANCHOR_END: using_configurables
+    // ANCHOR_END: using_configurables
 }
diff --git a/examples/option/src/main.sw b/examples/option/src/main.sw
@@ -14,7 +14,7 @@ fn main() {
     match result {
         // The division was valid
         Option::Some(x) => std::logging::log(x),
-         // The division was invalid
+        // The division was invalid
         Option::None => std::logging::log("Cannot divide by 0"),
     }
 }
diff --git a/examples/storage_variables/src/main.sw b/examples/storage_variables/src/main.sw
@@ -38,7 +38,7 @@ impl StorageExample for Contract {
         storage.var2.z = true;
     }
     // ANCHOR_END: storage_write
-    
+
     // ANCHOR: storage_read
     #[storage(read)]
     fn get_something() -> (u64, u64, b256, bool) {

diff --git a/sway-ast/src/token.rs b/sway-ast/src/token.rs
@@ -129,6 +129,25 @@ pub enum CommentKind {
     /// var foo = 1; // Trailing comment
     /// ```
     Trailing,
+
+    /// An inlined comment is a block comment nested between 2 tokens without a newline after it.
+    ///
+    /// # Examples
+    ///
+    /// ```sway
+    /// fn some_function(baz: /* inlined comment */ u64) {}
+    /// ```
+    Inlined,
+
+    /// A multiline comment is a block comment that may be nested between 2 tokens with 1 or more newlines within it.
+    ///
+    /// # Examples
+    ///
+    /// ```sway
+    /// fn some_function(baz: /* multiline
+    ///                          comment */ u64) {}
+    /// ```
+    Multilined,
 }
 
 #[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Hash)]

diff --git a/sway-parse/src/token.rs b/sway-parse/src/token.rs
@@ -158,7 +158,6 @@ pub fn lex_commented(
                         .filter(|&c| c == '\n')
                         .count()
                         > 0;
-
                     // We found a comment at the start of file, which should be accounted for as a Newlined comment.
                     let start_of_file_found = search_end == 0 && index == 0;
 
@@ -187,7 +186,7 @@ pub fn lex_commented(
                     continue;
                 }
                 Some((_, '*')) => {
-                    if let Some(token) = lex_multiline_comment(&mut l, index) {
+                    if let Some(token) = lex_block_comment(&mut l, index) {
                         token_trees.push(token);
                     }
                     continue;
@@ -393,7 +392,7 @@ fn lex_line_comment(
     }
 }
 
-fn lex_multiline_comment(l: &mut Lexer<'_>, index: usize) -> Option<CommentedTokenTree> {
+fn lex_block_comment(l: &mut Lexer<'_>, index: usize) -> Option<CommentedTokenTree> {
     // Lexing a multi-line comment.
     let _ = l.stream.next();
     let mut unclosed_indices = vec![index];
@@ -405,6 +404,9 @@ fn lex_multiline_comment(l: &mut Lexer<'_>, index: usize) -> Option<CommentedTok
         None
     };
 
+    // We first start by assuming that block comments are inlined.
+    let mut comment_kind = CommentKind::Inlined;
+
     loop {
         match l.stream.next() {
             None => return unclosed_multiline_comment(l, unclosed_indices),
@@ -419,13 +421,7 @@ fn lex_multiline_comment(l: &mut Lexer<'_>, index: usize) -> Option<CommentedTok
                         // We could represent them as several ones, but that's unnecessary.
                         let end = slash_ix + '/'.len_utf8();
                         let span = span(l, start, end);
-                        return Some(
-                            Comment {
-                                span,
-                                comment_kind: CommentKind::Newlined,
-                            }
-                            .into(),
-                        );
+                        return Some(Comment { span, comment_kind }.into());
                     }
                 }
                 Some(_) => {}
@@ -436,6 +432,13 @@ fn lex_multiline_comment(l: &mut Lexer<'_>, index: usize) -> Option<CommentedTok
                 Some((_, '*')) => unclosed_indices.push(next_index),
                 Some(_) => {}
             },
+            Some((_, '\n')) => {
+                // If we find a newline character while lexing, this means that the block comment is multiline.
+                // Example:
+                // /* this is a
+                //    multilined block comment */
+                comment_kind = CommentKind::Multilined;
+            }
             Some(_) => {}
         }
     }

diff --git a/swayfmt/src/comments.rs b/swayfmt/src/comments.rs
@@ -1,9 +1,16 @@
+use ropey::Rope;
 use std::{fmt::Write, ops::Range};
 use sway_ast::token::{Comment, CommentKind};
-use sway_types::Spanned;
+use sway_types::{Span, Spanned};
 
 use crate::{
-    formatter::FormattedCode, utils::map::comments::CommentMap, Formatter, FormatterError,
+    formatter::FormattedCode,
+    parse::parse_snippet,
+    utils::map::{
+        byte_span::{ByteSpan, LeafSpans},
+        comments::CommentMap,
+    },
+    Format, Formatter, FormatterError,
 };
 
 pub type UnformattedCode = String;
@@ -60,6 +67,9 @@ fn write_trailing_comment(
 /// Given a range, writes comments contained within the range. This function
 /// removes comments that are written here from the CommentMap for later use.
 ///
+/// Most comment formatting should be done using `rewrite_with_comments` in
+/// the context of the AST, but in some cases (eg. at the end of module) we require this function.
+///
 /// Returns:
 /// `Ok(true)` on successful execution with comments written,
 /// `Ok(false)` on successful execution and if there are no comments within the given range,
@@ -102,6 +112,13 @@ pub fn write_comments(
                 CommentKind::Trailing => {
                     write_trailing_comment(formatted_code, comment)?;
                 }
+                CommentKind::Inlined => {
+                    // We do a trim and truncate here to ensure that only a single whitespace separates
+                    // the inlined comment from the previous token.
+                    formatted_code.truncate(formatted_code.trim_end().len());
+                    write!(formatted_code, " {} ", comment.span().as_str(),)?;
+                }
+                CommentKind::Multilined => {}
             }
         }
     }
@@ -117,6 +134,233 @@ pub fn write_comments(
     Ok(true)
 }
 
+/// The main function that rewrites a piece of formatted code with comments found in its unformatted version.
+///
+/// This takes a given AST node's unformatted span, its leaf spans and its formatted code (a string) and
+/// parses the equivalent formatted version to get its leaf spans. We traverse the spaces between both
+/// formatted and unformatted leaf spans to find possible comments and inserts them between.
+pub fn rewrite_with_comments<T: sway_parse::Parse + Format + LeafSpans>(
+    formatter: &mut Formatter,
+    unformatted_span: Span,
+    unformatted_leaf_spans: Vec<ByteSpan>,
+    formatted_code: &mut FormattedCode,
+    last_formatted: usize,
+) -> Result<(), FormatterError> {
+    // Since we are adding comments into formatted code, in the next iteration the spans we find for the formatted code needs to be offsetted
+    // as the total length of comments we added in previous iterations.
+    let mut offset = 0;
+    let mut to_rewrite = formatted_code[last_formatted..].to_string();
+
+    let formatted_leaf_spans = parse_snippet::<T>(&formatted_code[last_formatted..])
+        .unwrap()
+        .leaf_spans();
+
+    let mut previous_unformatted_leaf_span = unformatted_leaf_spans
+        .first()
+        .ok_or(FormatterError::CommentError)?;
+    let mut previous_formatted_leaf_span = formatted_leaf_spans
+        .first()
+        .ok_or(FormatterError::CommentError)?;
+    for (unformatted_leaf_span, formatted_leaf_span) in unformatted_leaf_spans
+        .iter()
+        .zip(formatted_leaf_spans.iter())
+    {
+        // Search for comments between the previous leaf span's end and the next leaf span's start
+        let range = std::ops::Range {
+            start: previous_unformatted_leaf_span.end,
+            end: unformatted_leaf_span.start,
+        };
+        let iter = formatter.comments_context.map.comments_between(&range);
+
+        let mut comments_found = vec![];
+        for i in iter {
+            comments_found.push(i.clone());
+        }
+
+        if !comments_found.is_empty() {
+            let extra_newlines = collect_extra_newlines(unformatted_span.clone(), &comments_found);
+
+            offset += insert_after_span(
+                previous_formatted_leaf_span,
+                comments_found.clone(),
+                offset,
+                &mut to_rewrite,
+                extra_newlines,
+            )?;
+
+            formatter
+                .comments_context
+                .map
+                .retain(|bs, _| !bs.contained_within(&range))
+        }
+
+        previous_unformatted_leaf_span = unformatted_leaf_span;
+        previous_formatted_leaf_span = formatted_leaf_span;
+    }
+
+    formatted_code.truncate(last_formatted);
+    write!(formatted_code, "{to_rewrite}")?;
+    Ok(())
+}
+
+/// Collect extra newline before comment(s). The main purpose of this function is to maintain
+/// newlines between comments when inserting multiple comments at once.
+fn collect_extra_newlines(unformatted_span: Span, comments_found: &Vec<Comment>) -> Vec<usize> {
+    // The first comment is always assumed to have no extra newlines before itself.
+    let mut extra_newlines = vec![0];
+
+    if comments_found.len() == 1 {
+        return extra_newlines;
+    }
+
+    let mut prev_comment: Option<&Comment> = None;
+    for comment in comments_found {
+        if let Some(prev_comment) = prev_comment {
+            // Get whitespace between the end of the previous comment and the start of the next.
+            let whitespace_between = unformatted_span.as_str()[prev_comment.span().end()
+                - unformatted_span.start()
+                ..comment.span().start() - unformatted_span.start()]
+                .to_string();
+
+            // Count the number of newline characters we found above.
+            // By default, we want 0 extra newlines, but if there are more than 1 extra newline, we want to squash it to 1.
+            let mut extra_newlines_count = 0;
+            if whitespace_between.chars().filter(|&c| c == '\n').count() > 1 {
+                extra_newlines_count = 1;
+            };
+
+            extra_newlines.push(extra_newlines_count);
+        }
+
+        prev_comment = Some(comment);
+    }
+
+    extra_newlines
+}
+
+/// Check if a block is empty. When formatted without comments, empty code blocks are formatted into "{}", which is what this check is for.
+fn is_empty_block(formatted_code: &FormattedCode, end: usize) -> bool {
+    formatted_code.chars().nth(end - 1) == Some('{') && formatted_code.chars().nth(end) == Some('}')
+}
+
+/// Main driver of writing comments. This function is a no-op if the block of code is empty.
+///
+/// This iterates through comments inserts each of them after a given span and returns the offset.
+/// While inserting comments this also inserts whitespaces/newlines so that alignment is intact.
+/// To do the above, there are some whitespace heuristics we stick to:
+///
+/// 1) Assume comments are anchored to the line below, and follow its alignment.
+/// 2) In some cases the line below is the end of the function eg. it contains only a closing brace '}'.
+///    in such cases we then try to anchor the comment to the line above.
+/// 3) In the cases of entirely empty blocks we actually should prefer using `write_comments` over
+///    `rewrite_with_comments` since `write_comments` would have the formatter's indentation context.
+fn insert_after_span(
+    from: &ByteSpan,
+    comments_to_insert: Vec<Comment>,
+    offset: usize,
+    formatted_code: &mut FormattedCode,
+    extra_newlines: Vec<usize>,
+) -> Result<usize, FormatterError> {
+    let mut comment_str = String::new();
+
+    // We want to anchor the comment to the next line, and here,
+    // we make the assumption here that comments will never be right before the final leaf span.
+    let mut indent = formatted_code[from.end + offset..]
+        .chars()
+        .take_while(|c| c.is_whitespace())
+        .collect::<String>();
+
+    // In the case of empty blocks, we do not know the indentation of comments at that time.
+    // Writing comments in empty blocks has to be deferred to `write_comments` instead, which will
+    // contain the Formatter's indentation context.
+    if !is_empty_block(formatted_code, from.end) {
+        // There can be cases where comments are at the end.
+        // If so, we try to 'pin' our comment's indentation to the previous line instead.
+        if formatted_code.chars().nth(from.end + offset + indent.len()) == Some('}') {
+            // It could be possible that the first comment found here is a Trailing,
+            // then a Newlined.
+            // We want all subsequent newlined comments to follow the indentation of the
+            // previous line that is NOT a comment.
+            if comments_to_insert
+                .iter()
+                .any(|c| c.comment_kind == CommentKind::Newlined)
+            {
+                // Find and assign the indentation of the previous line to `indent`.
+                let prev_line = formatted_code[..from.end + offset]
+                    .trim_end()
+                    .chars()
+                    .rev()
+                    .take_while(|&c| c != '\n')
+                    .collect::<String>();
+                indent = prev_line
+                    .chars()
+                    .rev()
+                    .take_while(|c| c.is_whitespace())
+                    .collect();
+                if let Some(comment) = comments_to_insert.first() {
+                    if comment.comment_kind != CommentKind::Trailing {
+                        comment_str.push('\n');
+                    }
+                }
+            }
+        }
+
+        for (comment, extra_newlines) in comments_to_insert.iter().zip(extra_newlines) {
+            // Check for newlines to preserve.
+            for _ in 0..extra_newlines {
+                comment_str.push('\n');
+            }
+
+            match comment.comment_kind {
+                CommentKind::Trailing => {
+                    if comments_to_insert.len() > 1 && indent.starts_with('\n') {
+                        write!(comment_str, " {}", comment.span().as_str())?;
+                    } else {
+                        writeln!(comment_str, " {}", comment.span().as_str())?;
+                    }
+                }
+                CommentKind::Newlined => {
+                    if comments_to_insert.len() > 1 && indent.starts_with('\n') {
+                        write!(comment_str, "{}{}", indent, comment.span().as_str())?;
+                    } else {
+                        writeln!(comment_str, "{}{}", indent, comment.span().as_str())?;
+                    }
+                }
+                CommentKind::Inlined => {
+                    if !formatted_code[..from.end].ends_with(' ') {
+                        write!(comment_str, " ")?;
+                    }
+                    write!(comment_str, "{}", comment.span().as_str())?;
+                    if !formatted_code[from.end + offset..].starts_with([' ', '\n']) {
+                        write!(comment_str, " ")?;
+                    }
+                }
+                CommentKind::Multilined => {
+                    write!(comment_str, "{}{}", indent, comment.span().as_str())?;
+                }
+            };
+        }
+
+        let mut src_rope = Rope::from_str(formatted_code);
+
+        // We do a sanity check here to ensure that we don't insert an extra newline
+        // if the place at which we're going to insert comments already ends with '\n'.
+        if let Some(char) = src_rope.get_char(from.end + offset) {
+            if char == '\n' && comment_str.ends_with('\n') {
+                comment_str.pop();
+            }
+        };
+
+        // Insert the actual comment(s).
+        src_rope.insert(from.end + offset, &comment_str);
+
+        formatted_code.clear();
+        formatted_code.push_str(&src_rope.to_string());
+    }
+
+    Ok(comment_str.len())
+}
+
 #[cfg(test)]
 mod tests {
     use crate::utils::map::byte_span::ByteSpan;