FAQ
Technically, no. It is merely a tool, which makes a best effort to follow the style guide. That said, there are no known bugs where formatting code will introduce a style violation. If you see one, please report it!
And of course, many style rules concern issues the formatter has nothing to do with, such as naming.
google-java-format exposes extremely few configuration options that govern formatting behavior. Most options are used only to switch entire behaviors on or off. (The primary exception is `--aosp`, which allows the tool to be used with AOSP code; inside Google, this option is never surfaced in any of the various integrations.)
The explicit goals of this tool are to bring consistency to the codebase, and to free developers from arguments over style choices. It is an explicit non-goal to support developers' individual preferences, and in fact this would work directly against our primary goals.
Also, the lack of configuration enables us to deliver a much more thoroughly-tested, high-quality tool. (As configurable parameters rise, the complexity and difficulty of testing rise exponentially.)
Finally, perhaps your preferred formatting is not merely a matter of taste, but one that just may be objectively superior to google-java-format's! In that case, please see Reporting issues. Maybe we can all benefit!
There are at least two kinds of source code formatter one could build. One kind finds violations and fixes them (an auto-correcting linter). The other kind ignores the existing formatting choices in the file, seeing only a stream of tokens, and chooses a formatting for those tokens following a consistent set of rules.
google-java-format (like clang-format before it) is of the second kind, with various exceptions. Its goal is not just to correct mistakes; it is to free developers from having to make formatting decisions in the first place, bringing greater consistency to your codebase. Every time the formatter decides to preserve an existing formatting choice, it works directly against that goal.
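To make the distinction concrete, here is a toy sketch of the second kind of formatter. It is not google-java-format's code: the class name and its single spacing rule are hypothetical. The tokenizer discards all whitespace, and the output depends only on the token stream, so two differently formatted inputs come out identical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy "second kind" formatter: it sees only tokens, never the original whitespace. */
class TokenStreamFormatter {
  // Identifiers/keywords, numeric literals, or any other single non-space character.
  private static final Pattern TOKEN =
      Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|[0-9]+(\\.[0-9]+)?f?|\\S");

  static List<String> tokenize(String source) {
    List<String> tokens = new ArrayList<>();
    Matcher m = TOKEN.matcher(source);
    while (m.find()) {
      tokens.add(m.group());
    }
    return tokens;
  }

  /**
   * Re-emits the tokens with one fixed rule: a single space between tokens, except after "(",
   * before ")", ",", ";", and before a "(" that directly follows an identifier.
   */
  static String format(String source) {
    StringBuilder out = new StringBuilder();
    String prev = null;
    for (String tok : tokenize(source)) {
      boolean noSpace =
          prev == null
              || prev.equals("(")
              || tok.equals(")")
              || tok.equals(",")
              || tok.equals(";")
              || (tok.equals("(")
                  && Character.isJavaIdentifierPart(prev.charAt(prev.length() - 1)));
      if (!noSpace) {
        out.append(' ');
      }
      out.append(tok);
      prev = tok;
    }
    return out.toString();
  }

  public static void main(String[] args) {
    String a = "x=foo( 1,2 )+3 ;";
    String b = "x   =\n    foo(1 ,\n 2) + 3;";
    System.out.println(format(a)); // x = foo(1, 2) + 3;
    System.out.println(format(a).equals(format(b))); // true
  }
}
```

The point of the design is visible in `format`: nothing about the input's line breaks or spacing can influence the result, which is exactly what frees authors from making those choices.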
Users have also found it to be very liberating to not have to care what initial formatting they choose. Said one early adopter joyfully, "I just write code like a four-year-old doing finger paints!"
We have no plans to build the other kind of formatter.
There are exceptions. For example, the formatter preserves your choices of interior blank lines inside a method implementation. Before opting to preserve an existing formatting choice, we check whether all three of these criteria are met:
- The formatting choice has an important effect on readability.
- The choice depends on the nature of the code in question, not just on varying personal preferences.
- We consider it infeasible for the formatter to figure out an acceptable choice on its own.
Notice, for example, that these criteria are clearly met in the case of interior blank lines, and that's why the formatter preserves them.
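A hypothetical sketch of how such an exception can coexist with a token-stream design: each statement carries one extra bit, whether at least one blank line preceded it in the input, and everything else about the original whitespace is still discarded. The class name, the one-statement-per-line output, and the assumption that statements end with `;` are simplifications for illustration only (Java 16+ for records).

```java
import java.util.ArrayList;
import java.util.List;

class BlankLinePreserver {
  /** One statement's text plus the single bit of original layout that is kept. */
  record Stmt(String text, boolean blankLineBefore) {}

  static List<Stmt> parse(String body) {
    List<Stmt> stmts = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    int newlines = 0; // newlines seen since the previous statement ended
    boolean blankBefore = false;
    for (char c : body.toCharArray()) {
      if (current.length() == 0 && Character.isWhitespace(c)) {
        if (c == '\n') {
          newlines++;
        }
        continue; // whitespace between statements is discarded...
      }
      if (current.length() == 0) {
        // ...except for one bit: did at least one blank line precede this statement?
        blankBefore = !stmts.isEmpty() && newlines >= 2;
      }
      if (!Character.isWhitespace(c)) {
        current.append(c);
      } else if (current.charAt(current.length() - 1) != ' ') {
        current.append(' '); // interior whitespace collapses to a single space
      }
      if (c == ';') {
        stmts.add(new Stmt(current.toString(), blankBefore));
        current.setLength(0);
        newlines = 0;
      }
    }
    return stmts;
  }

  /** Emits one statement per line, with one blank line wherever the input had any. */
  static String format(String body) {
    StringBuilder out = new StringBuilder();
    for (Stmt s : parse(body)) {
      if (s.blankLineBefore()) {
        out.append('\n');
      }
      out.append(s.text()).append('\n');
    }
    return out.toString();
  }

  public static void main(String[] args) {
    String input = "int a = 1;\n\n\n  int b =\n      2;\nint c = 3;\n";
    System.out.print(format(input));
  }
}
```

Collapsing several blank lines into one is an assumption of this sketch; the essential idea is that the preserved choice is a single, deliberate bit rather than the input's layout as a whole.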
The important thing to understand is that nearly every aspect of how the existing code was formatted is intentionally ignored. So, trying to understand its behavior in terms of why it chose certain changes will lead to confusion and difficulty communicating. For the most part, it sees only tokens, and it makes formatting decisions in total unawareness of the previous state.
So you run the formatter, and you read the code it produced, and you spot something you strongly dislike. Yes, this will happen. No matter how much we improve the formatter, it will never be as smart or tasteful as a human like you. When this happens, please report an issue if you suspect the formatter could have made a better global decision.
But then, do you grit your teeth and keep google-java-format's "bad" formatting in your current change? Or hand-edit your code to "fix" the problem?
You're absolutely entitled to hand-fix your code, but there are costs to doing so. You won't want to run the formatter again for the current CL, lest your changes get changed back, so you'll want to delay making those layout changes until your code has stabilized. And later, the next time those same lines of code are edited, the formatter may undo those formatting changes again. It's up to the author or reviewer to decide whether to manually carry forward the old style or not.
And of course, the decision to hand-format the code isn't a one-time decision that sticks; it may recur over and over.
We realize this situation will occasionally be annoying, but is it annoying enough, often enough, to justify peppering our codebase with special `// format:off` comments?
So far, it seems the answer is no. The annoyance you feel is real, but it's usually transient, while those formatting comments would persist, visibly, in the code for everyone to see forever. They would reduce the signal-to-noise ratio and hurt readability, which is counter to the project goals.
Formatting can be disabled in javadoc comments using the `<pre>` tag.
Please report an issue.
The formatter can't reliably format arbitrary portions of code in isolation: whenever the nearby untouched code wasn't formatted exactly as the formatter would have formatted it, the result would be mangled. So the formatter expands each range it is given until the range covers a complete region of code that it knows how to format.
This means that it typically reformats entire statements, as well as method and class signatures.
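A minimal sketch of that expansion step, assuming the formatter already knows the character spans of the regions it can format on their own (here, a hypothetical list of statement spans; in reality these would come from the parser). Each requested range grows until it fully covers every region it touches. The class and method names are invented for illustration (Java 16+ for records).

```java
import java.util.List;

class RangeExpander {
  /** A half-open range of character offsets, [start, end). */
  record Range(int start, int end) {
    boolean overlaps(Range other) {
      return start < other.end && other.start < end;
    }
  }

  /**
   * Grows {@code requested} to the union of itself and every region it overlaps. Because the
   * regions are assumed not to overlap each other, one pass is enough: the added text lies
   * inside regions that were already touched, so no new region can come into play.
   */
  static Range expand(Range requested, List<Range> regions) {
    int start = requested.start();
    int end = requested.end();
    for (Range region : regions) {
      if (region.overlaps(requested)) {
        start = Math.min(start, region.start());
        end = Math.max(end, region.end());
      }
    }
    return new Range(start, end);
  }

  public static void main(String[] args) {
    String src = "int a = 1; int b = 2; int c = 3;";
    // Hypothetical statement spans for src.
    List<Range> statements = List.of(new Range(0, 10), new Range(11, 21), new Range(22, 32));
    Range r = expand(new Range(8, 13), statements); // touches the first two statements
    System.out.println(src.substring(r.start(), r.end())); // int a = 1; int b = 2;
  }
}
```

A real formatter's regions nest (statements inside methods inside classes), so this flat list is a simplification; it only shows why a small edit can cause a whole statement to be reformatted.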
If you're seeing far more modifications than necessary, please report an issue.
In rough priority order (most to least important):
- The physical layout of code should reflect the syntactic structure of the code, making the code easier to understand. See the style guide and the Rectangle Rule.
- Future code changes should have a small "blast radius"; a future change to code on one line should ideally not force formatting changes to surrounding lines. (For example, this explains why it doesn't use horizontal alignment.)
- Stylistic choices should be consistent with what most Google source code is already doing.
- Stylistic choices should be consistent with other languages at Google (particularly JavaScript and C++).
- Vertical space should be conserved. (While this is last in the list, it definitely still matters when all else is nearly equal.)
Why are its layout rules so hard and fast? Why doesn't it consider lots more options, then just pick the "best" one?
Internally, google-java-format uses a deterministic layout engine that runs in linear time. Some formatters take more time so that they can, for instance, generate all possible layouts for a given statement and pick the "best" one (e.g., the one with the fewest lines, then the one with the lowest standard deviation of line lengths). Because google-java-format runs in linear time, it cannot do this, and so it does not. Such a search might be a nice option, but it would require much more work.
Of course, even if google-java-format considered lots more options, we would still need a mechanical rule to determine which one is "best." There are fewer magical panaceas than one might like.
As covered above, the top priority for the formatter's formatting decisions is to make the physical code structure reflect the logical code structure. An interesting corollary is that when the logical code is highly complicated, the physical layout will appear highly complicated as well! This is Working As Intended.
Our advice is usually to extract some temporary variables and helper methods!
google-java-format already tries to do this in some cases, when possible.
For example, without special treatment, the input expression `1 - 2 + 3 - 4 + 5` would be grouped like `<<<<1 ^- 2> ^+ 3> ^- 4> ^+ 5>` (in an internal markup language where `<` and `>` mark groupings and `^` marks optional breaks). This would produce surprising layouts like:

```
1 - 2 + 3
- 4
+ 5
```

This follows from the break-from-the-top rule. To avoid showing the user quite so much left-associative structure, google-java-format regroups this expression as `<1 ^- 2 ^+ 3 ^- 4 ^+ 5>`, letting these breaks break independently at the same level, as opposed to breaking from the top. This allows the improved layout:

```
1 - 2 + 3
- 4 + 5
```

(As shown here, google-java-format will even merge different binary operators if they have the same precedence. Although one might worry that the last layout could be misread as meaning `… - (4 + 5)`, if one were in the mood to worry so, this is not a problem in practice.)
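The regrouping can be sketched as flattening the left spine of a tree of same-precedence binary operators into one operand list, so that every operator becomes a break at the same level. The expression classes, the precedence table, and the markup printer below are hypothetical, written only to reproduce the two markups above (Java 16+ for records and `instanceof` patterns).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class OperatorRegrouping {
  interface Expr {}

  record Leaf(String text) implements Expr {}

  record Binary(Expr left, String op, Expr right) implements Expr {}

  // Java gives + and - one precedence level, and *, / and % a higher one.
  static final Map<String, Integer> PRECEDENCE =
      Map.of("+", 1, "-", 1, "*", 2, "/", 2, "%", 2);

  /** Builds the left-associative tree a parser produces for "a op b op c ...". */
  static Expr leftAssoc(String first, String... opsAndOperands) {
    Expr e = new Leaf(first);
    for (int i = 0; i < opsAndOperands.length; i += 2) {
      e = new Binary(e, opsAndOperands[i], new Leaf(opsAndOperands[i + 1]));
    }
    return e;
  }

  /** Without special treatment: one nested group per binary node (break from the top). */
  static String nestedMarkup(Expr e) {
    if (e instanceof Binary b) {
      return "<" + nestedMarkup(b.left()) + " ^" + b.op() + " " + nestedMarkup(b.right()) + ">";
    }
    return ((Leaf) e).text();
  }

  /** Regrouped: a left spine of same-precedence operators becomes a single group. */
  static String flatMarkup(Expr e) {
    if (!(e instanceof Binary b)) {
      return ((Leaf) e).text();
    }
    List<String> parts = new ArrayList<>();
    collect(b, PRECEDENCE.get(b.op()), parts);
    return "<" + String.join(" ", parts) + ">";
  }

  // Only the LEFT spine is merged: a right operand such as (2 + 3) in 1 - (2 + 3)
  // keeps its own group, since merging it would change the meaning.
  private static void collect(Expr e, int prec, List<String> parts) {
    if (e instanceof Binary b && PRECEDENCE.get(b.op()) == prec) {
      collect(b.left(), prec, parts);
      parts.add("^" + b.op() + " " + flatMarkup(b.right()));
    } else {
      parts.add(flatMarkup(e));
    }
  }

  public static void main(String[] args) {
    Expr e = leftAssoc("1", "-", "2", "+", "3", "-", "4", "+", "5");
    System.out.println(nestedMarkup(e)); // <<<<1 ^- 2> ^+ 3> ^- 4> ^+ 5>
    System.out.println(flatMarkup(e)); // <1 ^- 2 ^+ 3 ^- 4 ^+ 5>
  }
}
```

Operands of higher precedence stay nested: for `1 + 2 * 3 - 4`, `flatMarkup` yields `<1 ^+ <2 ^* 3> ^- 4>`, so only operators of equal precedence share a level.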
This special treatment is an improvement, and it would be great if we could extend this sort of improvement elsewhere. For example, since `Sets.union` is associative, we might wish to lay out `Sets.union(s1, Sets.union(s2, Sets.union(s3, Sets.union(s4, s5))))` to better reflect its associativity. Unfortunately, google-java-format is not a compiler, so it has no way to know that `Sets.union` is associative.
For methods like `Sets.union` above, it would be great if we could annotate our method definitions with rules on how their calls should be formatted. Associativity is one example; similarly, perhaps certain chained calls should be broken in special ways, and so on.
Unfortunately, google-java-format is not a compiler. When it formats a file, it doesn't resolve, read, parse, and remember the file's imports. Doing this, and doing it well, would need much more work.
google-java-format does implement a few special formatting rules for method calls that depend only on the syntax at the call site. For example, it can guess that a method call contains pairs of arguments and lay out each pair together, but such a rule has to be weak to avoid false positives. Also, in specific cases (currently just for calls that look like logging methods), its formatting depends on the names of the methods.
We still hope it might in the future.
We do our best, but unusual cases sometimes arise.
As described above, the layout of the code follows its structure. Let's look at one example in detail.
In its internal markup language (as shown above), google-java-format notes what indentation should be used when breaks are broken. The statement `currentEstimate = (currentEstimate + argument / currentEstimate) / 2.0f;` is internally marked up as `<+4currentEstimate = ^<+4(currentEstimate ^+ <+4argument ^/ currentEstimate>) ^/ 2.0f>>;`. Here, breaks in each grouping are indented +4 spaces from their enclosing groupings, allowing layouts like:
```
currentEstimate = (currentEstimate + argument / currentEstimate) / 2.0f;
```

or:

```
currentEstimate =
    (currentEstimate + argument / currentEstimate) / 2.0f;
```

or:

```
currentEstimate =
    (currentEstimate + argument / currentEstimate)
        / 2.0f;
```

or:

```
currentEstimate =
    (currentEstimate
            + argument / currentEstimate)
        / 2.0f;
```

or:

```
currentEstimate =
    (currentEstimate
            + argument
                / currentEstimate)
        / 2.0f;
```
These layouts reflect the structure of the code, according to the Rectangle Rule. They may not always be what you would have generated by hand, but we believe they are readable and predictable.
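The layouts above can be reproduced with a small break-from-the-top printer. This is a sketch, not google-java-format's engine: the class names, the simplified fits check, and the exact grouping (an extra +4 level around the parenthesized operand, which is our assumption about how the deeper indentation arises) are all hypothetical. A level is printed flat if it fits on the current line together with the text that must follow it; otherwise all of its own breaks are taken, and only then are its inner levels considered (Java 16+ for records and `instanceof` patterns).

```java
import java.util.List;

class BreakFromTheTop {
  interface Doc {}

  record Tok(String text) implements Doc {}

  /** An optional break: a space when its level stays flat, newline plus indent otherwise. */
  record Brk() implements Doc {}

  /** A grouping; when broken, its breaks indent `indent` columns past the enclosing level's. */
  record Level(int indent, List<Doc> docs) implements Doc {}

  private final int width;
  private final StringBuilder out = new StringBuilder();
  private int column = 0;

  private BreakFromTheTop(int width) {
    this.width = width;
  }

  static String layout(Level root, int width) {
    BreakFromTheTop printer = new BreakFromTheTop(width);
    printer.render(root, 0, 0);
    return printer.out.toString();
  }

  // Widths are recomputed here for brevity; a linear-time engine would compute each once.
  static String flat(Doc d) {
    if (d instanceof Tok t) {
      return t.text();
    }
    if (d instanceof Brk) {
      return " ";
    }
    StringBuilder sb = new StringBuilder();
    for (Doc child : ((Level) d).docs()) {
      sb.append(flat(child));
    }
    return sb.toString();
  }

  private void render(Level level, int base, int trailing) {
    String flatText = flat(level);
    if (column + flatText.length() + trailing <= width) {
      emit(flatText); // the whole group fits: take none of its breaks
      return;
    }
    int breakIndent = base + level.indent(); // otherwise take ALL of this level's breaks
    List<Doc> docs = level.docs();
    for (int i = 0; i < docs.size(); i++) {
      Doc d = docs.get(i);
      if (d instanceof Tok t) {
        emit(t.text());
      } else if (d instanceof Brk) {
        out.append('\n').append(" ".repeat(breakIndent));
        column = breakIndent;
      } else {
        // Inner levels are decided only after this one has broken: break from the top.
        render((Level) d, breakIndent, widthBeforeNextBreak(docs, i + 1, trailing));
      }
    }
  }

  /** Width of the text that must stay on the same line after docs[from - 1]. */
  private static int widthBeforeNextBreak(List<Doc> docs, int from, int trailing) {
    int w = 0;
    for (int j = from; j < docs.size(); j++) {
      if (docs.get(j) instanceof Brk) {
        return w;
      }
      w += flat(docs.get(j)).length();
    }
    return w + trailing;
  }

  private void emit(String s) {
    out.append(s);
    column += s.length();
  }

  /** currentEstimate = (currentEstimate + argument / currentEstimate) / 2.0f; */
  static Level example() {
    Level quotient =
        new Level(4, List.of(new Tok("argument"), new Brk(), new Tok("/ currentEstimate")));
    Level paren =
        new Level(
            4,
            List.of(new Tok("(currentEstimate"), new Brk(), new Tok("+ "), quotient, new Tok(")")));
    Level division = new Level(4, List.of(paren, new Brk(), new Tok("/ 2.0f")));
    Level assignment = new Level(4, List.of(new Tok("currentEstimate ="), new Brk(), division));
    return new Level(0, List.of(assignment, new Tok(";")));
  }

  public static void main(String[] args) {
    for (int width : new int[] {100, 60, 50, 45, 40}) {
      System.out.println("width " + width + ":\n" + layout(example(), width) + "\n");
    }
  }
}
```

With column limits of 100, 60, 50, 45, and 40, this sketch prints the five layouts above in order. Note that a narrower limit never makes an outer level flat again once it has broken, which is what keeps the results predictable.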
This internal layout language is simple but powerful. We test the layout rules (which map from the parse tree to the layout language) against special test cases and against `google3`, and tweak them when they produce unexpected results. (If you find an unexpected result, you can file a bug.)
Some cases are especially hard to test, like breaks following line comments. Consider the input:
```
currentEstimate = ( // This is a line comment.
currentEstimate + argument / currentEstimate) / 2.0f;
```

google-java-format currently lays it out as:

```
currentEstimate =
    ( // This is a line comment.
    currentEstimate + argument / currentEstimate)
        / 2.0f;
```
This indentation is a little ugly, but the alternatives are ugly too. We haven't seen enough unusual forced breaks like this to form a good idea of how best to lay them out.
When google-java-format is invoked, it can optionally be told which lines of its input file have been modified and should be reformatted. It first reformats the whole file, then merges the input with that intermediate output so that only the modified lines are reformatted, plus a possibly non-empty "blast radius" around each. The merging step combines whole lines from the input and from the intermediate whole-file output, such that the token stream in the merged output matches the input.
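One way to sketch that merge, under simplifying assumptions: tokens are whitespace-separated words, no line is blank, and we know which input lines were modified. Because the input and the whole-file output contain the same token stream, a token index identifies a position in both files. The sketch grows the token range of the modified lines until it starts and ends on a line boundary in both files (that growth is the blast radius), then splices in the corresponding output lines. The class and method names are hypothetical; this is not google-java-format's actual merge code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PartialFormatMerge {
  /** Toy tokenizer: whitespace-separated words. */
  static List<String> tokens(String line) {
    String s = line.strip();
    return s.isEmpty() ? List.of() : Arrays.asList(s.split("\\s+"));
  }

  static List<String> allTokens(List<String> lines) {
    List<String> all = new ArrayList<>();
    for (String line : lines) {
      all.addAll(tokens(line));
    }
    return all;
  }

  /** starts[i] is the index of line i's first token; starts[lines.size()] is the token count. */
  private static int[] lineStarts(List<String> lines) {
    int[] starts = new int[lines.size() + 1];
    for (int i = 0; i < lines.size(); i++) {
      starts[i + 1] = starts[i] + tokens(lines.get(i)).size();
    }
    return starts;
  }

  /** The line containing token t (lines are assumed to be non-blank). */
  private static int lineOf(int[] starts, int t) {
    int line = 0;
    while (starts[line + 1] <= t) {
      line++;
    }
    return line;
  }

  /** Reformats input lines [first, last], widening the span to a boundary shared by both files. */
  static List<String> merge(List<String> input, List<String> formatted, int first, int last) {
    if (!allTokens(input).equals(allTokens(formatted))) {
      throw new IllegalArgumentException("formatting must not change the token stream");
    }
    int[] in = lineStarts(input);
    int[] out = lineStarts(formatted);
    int a = in[first]; // first token of the modified lines
    int b = in[last + 1]; // one past their last token
    // Widen [a, b) until both ends fall on line boundaries in BOTH files: the blast radius.
    while (true) {
      int newA = Math.min(in[lineOf(in, a)], out[lineOf(out, a)]);
      int newB = Math.max(in[lineOf(in, b - 1) + 1], out[lineOf(out, b - 1) + 1]);
      if (newA == a && newB == b) {
        break;
      }
      a = newA;
      b = newB;
    }
    List<String> merged = new ArrayList<>(input.subList(0, lineOf(in, a)));
    merged.addAll(formatted.subList(lineOf(out, a), lineOf(out, b - 1) + 1));
    merged.addAll(input.subList(lineOf(in, b - 1) + 1, input.size()));
    return merged;
  }

  public static void main(String[] args) {
    List<String> input = List.of("int  a = 1;", "int b =", "    2; int c = 3;", "int   d = 4;");
    List<String> formatted = List.of("int a = 1;", "int b = 2;", "int c = 3;", "int d = 4;");
    // Only input line 1 ("int b =") was modified.
    merge(input, formatted, 1, 1).forEach(System.out::println);
  }
}
```

In the example, only line 1 was modified, yet `int c = 3;` is reformatted too because it shared an input line with the `2;` that the edited statement needs. The untouched first and last lines keep their odd spacing, and the merged token stream still equals the input's.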
Once the blast radii have been chosen, we still must adjust indentation. If the input contains the lines:
```
> Function<T1, T2> function = new Function<>() { T2 apply(T1 x) {
> otherFunction(
x); // Long comment.
}
}
```

where only the marked lines have been modified, the whole-file output might be:

```
> Function<T1, T2> function =
> new Function<>() {
> T2 apply(T1 x) {
> otherFunction(
x); // Long comment.
}
}
```

(where the markings have been carried over). Merging these two blindly would give:

```
> Function<T1, T2> function =
> new Function<>() {
> T2 apply(T1 x) {
> otherFunction(
x); // Long comment.
}
}
```
This result is far from perfect. We should really adjust the indentation further, but it's not always clear how to. (Worse, we believe that implementing optimal reindentation might require multiple reformatting passes.)
To avoid hard problems, google-java-format sometimes extends the blast radii a little too far, causing too many lines to be reformatted. We imagine it might be possible to avoid the worst misindentations by extending the blast radii more intelligently, but how best to do this is an open question.
You bet!
For example, as you know, the maximum line length in Java code is 100 columns. We briefly considered setting the maximum trimmed line length (ignoring leading and trailing spaces) to something somewhat shorter, since we've heard that very long lines can be hard to read. We rejected this formatting option because, we imagined, it would surprise and annoy many users if the formatter avoided using the full 100 characters of width to which they might feel entitled.
Not annoying users is a great goal, but a tricky one. In several cases, we've had to balance abstract improvements against not annoying too many users. For example, the code fragment:
```
public Customer createCustomerLink(
    Long externalEntityId,
    String externalEntityIdStr,
    @Required EntityNamespaceSubtype initiator,
    @Required EntityNamespaceSubtype externalEntitySubtype,
    Customer.ChangeEvent creationEvent,
    Customer.ChangeEvent lastModificationEvent)
    throws ApiException;
```

was once laid out as:

```
public Customer createCustomerLink(
        Long externalEntityId,
        String externalEntityIdStr,
        @Required EntityNamespaceSubtype initiator,
        @Required EntityNamespaceSubtype externalEntitySubtype,
        Customer.ChangeEvent creationEvent,
        Customer.ChangeEvent lastModificationEvent)
    throws ApiException;
```
The earlier layout visibly separates the formals from the `throws` clause, but in the end we decided that the possible annoyance outweighed the possible benefit.
google-java-format always follows the rectangle rule, right? No exceptions?
Well, not quite. In a small number of cases we found that rigid adherence to the rectangle rule produced results that were surprising and unpleasant. So there are a few special cases.
- Methods shaped like `Mockito.when` are formatted as:

  ```
  when(remoteApi.findOrCreate(
          FOO_METADATA, Optional.<TheProto>absent(), AssignReserved.YES, AttachData.YES))
      .thenReturn(OPERATION);
  ```

  Using indentation to distinguish syntactic levels and always breaking from the top would produce:

  ```
  when(
          remoteApi.findOrCreate(
              FOO_METADATA, Optional.<TheProto>absent(), AssignReserved.YES, AttachData.YES))
      .thenReturn(OPERATION);
  ```
If there's one thing we can all agree on, it's that we will never all agree on the great one-or-two-space debate. There's no consensus. We believe neither choice is inherently right or wrong.
However, a formatter doesn't have the option of being neutral: when it rewraps a paragraph so that a period formerly at the end of a line now falls in the middle of one, it must choose how many spaces to put after it. And because it doesn't know whether the period ends a sentence or not, one space is the only reasonable choice. Since mixing one and two spaces would increase inconsistency, we decided to standardize on one space between tokens in all cases throughout your documentation.
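A minimal sketch of why rewrapping forces the choice: once a paragraph is split into words and refilled to a width, the original spacing after each period is gone, and the filler has to insert some fixed separator. This hypothetical helper uses a simple greedy fill and always inserts one space; it ignores javadoc details such as the ` * ` prefix, inline tags, and `<pre>` sections.

```java
import java.util.ArrayList;
import java.util.List;

class OneSpaceRewrap {
  /** Greedily refills a paragraph into lines of at most `width` characters. */
  static List<String> rewrap(String paragraph, int width) {
    List<String> lines = new ArrayList<>();
    StringBuilder line = new StringBuilder();
    for (String word : paragraph.strip().split("\\s+")) {
      if (line.length() > 0 && line.length() + 1 + word.length() > width) {
        lines.add(line.toString());
        line.setLength(0);
      }
      if (line.length() > 0) {
        line.append(' '); // always exactly one space, even after a period
      }
      line.append(word);
    }
    if (line.length() > 0) {
      lines.add(line.toString());
    }
    return lines;
  }

  public static void main(String[] args) {
    String text = "Returns the value.  It is never null.  See also Foo.";
    rewrap(text, 40).forEach(l -> System.out.println(" * " + l));
    // Prints:
    //  * Returns the value. It is never null. See
    //  * also Foo.
  }
}
```

Note that the two spaces the author typed after "value." and "null." cannot survive: after splitting into words, the filler only knows that a word ends in a period, not whether it ends a sentence.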
In other words, why does the formatter rewrite the following...

```
/**
 * Tests for {@link Foo}.
 */
```

...to the following?

```
/** Tests for {@link Foo}. */
```
First, recall that the formatter ignores most existing formatting. Given that policy, we have two choices: Standardize to one line, or standardize to three.
One-liners are shorter. But converting between one-liners and three-liners can be tedious, so some users prefer to stick with three. The formatter can give us the best of both worlds: It converts between one and three lines automatically, reformatting as necessary.
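A hypothetical sketch of that conversion: first extract the comment's text while ignoring which shape it had, then emit the `/** ... */` one-liner if it fits within the 100-column limit, and the multi-line form otherwise. The class and method names, and the simplifications (a single paragraph, no block tags, no rewrapping of long text), are assumptions of this sketch, not google-java-format's implementation.

```java
class JavadocShape {
  static final int MAX_WIDTH = 100;

  /** Extracts the text of a single-paragraph javadoc comment in either shape. */
  static String text(String javadoc) {
    String body = javadoc.strip();
    body = body.substring(3, body.length() - 2); // drop "/**" and "*/"
    StringBuilder sb = new StringBuilder();
    for (String line : body.split("\n")) {
      String l = line.strip();
      if (l.startsWith("*")) {
        l = l.substring(1).strip(); // drop the leading " * " decoration
      }
      if (!l.isEmpty()) {
        if (sb.length() > 0) {
          sb.append(' ');
        }
        sb.append(l);
      }
    }
    return sb.toString();
  }

  /** Chooses the one-line form whenever it fits; the original shape plays no part. */
  static String format(String text, int indent) {
    String pad = " ".repeat(indent);
    String oneLiner = pad + "/** " + text + " */";
    if (oneLiner.length() <= MAX_WIDTH) {
      return oneLiner;
    }
    return pad + "/**\n" + pad + " * " + text + "\n" + pad + " */";
  }

  public static void main(String[] args) {
    String threeLiner = "/**\n * Tests for {@link Foo}.\n */";
    System.out.println(format(text(threeLiner), 0)); // /** Tests for {@link Foo}. */
  }
}
```

Because `format` looks only at the extracted text, both the three-line and one-line inputs produce the same output, and the comment switches back to three lines automatically once its text grows too long.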
The formatter does not touch `<pre>` sections because, in general, `<pre>` means something like "display this exactly as I have formatted it."

Of course, Javadoc `<pre>` sections often contain sample code. We may someday format such code automatically.

The formatter also does not touch `<table>` sections. It might someday, but for now, this seemed too difficult.
While such breaks are not universally beloved, they are legal, commonly used, and occasionally necessary to stay under 100 columns.
File bugs here.
It's nice to peruse this FAQ and the known issues first, but we don't yell at people for filing duplicates. Closing out lots of dups gives us a pleasant illusion of accomplishment.
If possible, please provide the specific code in question (ideally a link to the entire file). Please do this in a comment even if you find your bug has already been filed; more test cases are often useful.
Be aware: there will always be situations where you as a human being could make a better formatting choice than a robot. These don't all make for good issue reports. Sometimes there is simply no reasonable way we can expect a robot formatter, which has to serve the entire codebase, to have known what to do in your particular case. But when in doubt, file it!
What distinguishes it from other formatters is that it produces code in Google Style.