functionalize() doesn't properly handle aliased program inputs #950
Comments
Can't we just make a synthetic base? You know the size and the stride of each of the input tensors, you know they share a storage, so you can reinterpret them as views on a freshly created tensor representing the 1D storage.
If you want to get super fancy, you can compute the minimum extent of the original tensor necessary to serve all the views you are given. But I don't think it's important to make this efficient, just to make it correct.
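A minimal sketch of that "synthetic base" idea, assuming all inputs share one storage and one dtype (the helper name and the untyped_storage()/set_()/as_strided() calls are illustrative, not code from the thread):

```python
import torch

def synthesize_base_and_views(tensors):
    # All inputs are assumed to share a single storage.
    storage = tensors[0].untyped_storage()
    assert all(t.untyped_storage().data_ptr() == storage.data_ptr() for t in tensors)

    # A flat tensor over the entire shared storage acts as the synthetic base.
    base = torch.empty(0, dtype=tensors[0].dtype, device=tensors[0].device)
    base.set_(storage)  # base is now a 1D tensor viewing the whole storage

    # Re-express each input as an as_strided() view of the synthetic base.
    views = [
        base.as_strided(t.size(), t.stride(), t.storage_offset())
        for t in tensors
    ]
    return base, views
```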
Agreed that we can do that in most cases, but not for all cases (in example 2 above). My question is, given that "synthesizing a base" won't work for all cases, should we just not bother with it and use the _base attribute instead?
Oh yes agreed - if we have access to the base as a program input, we should only copy to that and not any of its views (seems fixable)
Could you explain more why it doesn't work in example 2?
Talked more with Ed offline. Summarizing: agreed that we can synthesize a base in all cases, and we don't need the _base attribute. So the steps are: (1) group all tensor arguments by their aliases (e.g. by checking whether they share a storage), (2) synthesize a base for each group, and (3) re-express each argument as a view of its synthesized base. The easiest way to add the view relationships would probably be with as_strided().
cc @zou3519 this seems very related to what we are doing? |
@laithsakka this is something else: AOTAutograd handles aliased program inputs; the functorch.functionalize API (which is different from the functionalization used by AOTAutograd) doesn't.
for better or worse 😛 |
One reason that functionalization can't be written as a pure graph transform is that its output can depend on the input metadata - specifically whether or not the program inputs alias each other (so you could probably write it as a graph pass, as long as the graph was annotated with aliasing info about the program inputs).
That case actually isn't properly handled by functionalize() today, but it should be. Take this example program:
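(The original snippet isn't preserved in this copy; the stand-in below captures the relevant shape of the problem: one input is mutated in place and another is read, so the behavior depends on whether the inputs alias.)

```python
def foo(a, b):
    # Mutate one input in place, then read the other. The result (and the
    # final state of a and b) depends on whether a and b alias.
    a.add_(1)
    return b.sum()
```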
If you run functionalization with sample inputs that do not alias each other, then you'd expect to get out a function like below:
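(Again a sketch rather than the original output: the in-place add becomes out-of-place, and the mutation is written back to the input at the end.)

```python
def foo_functionalized(a, b):
    a_updated = a.add(1)   # out-of-place version of the mutation
    out = b.sum()          # b is unaffected, since a and b don't alias
    a.copy_(a_updated)     # reflect the input mutation at the graph boundary
    return out
```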
If the inputs are aliased though, then functionalization should do something different. Below are two examples (the second one is harder to handle than the first):
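(The concrete snippets are missing from this copy; the sketches below assume specific view relationships purely for illustration.)

```python
# Example 1: suppose b is itself a view of a (say b = a[1:]). The update to a
# has to be replayed onto b, and only a (the base) needs a copy_() at the end:
def foo_aliased_1(a, b):
    a_updated = a.add(1)
    b_updated = a_updated[1:]   # re-derive b from the updated a
    out = b_updated.sum()
    a.copy_(a_updated)
    return out

# Example 2: suppose a and b are overlapping views of some tensor `base` that
# is NOT a program input (say a = base[:3], b = base[1:]). To replay the
# mutation correctly, the traced graph needs `base` itself as an extra input:
def foo_aliased_2(base, a, b):
    a_updated = a.add(1)
    base_updated = base.slice_scatter(a_updated, 0, 0, 3)  # write a's update into base
    b_updated = base_updated[1:]                           # re-derive b from base
    out = b_updated.sum()
    base.copy_(base_updated)
    return out
```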
Notice that in the second example, we need to add a new input to the traced graph that didn't show up in the original program! In some cases, in order to correctly remove aliasing, the program needs access to the base tensor of the inputs - which isn't a direct input into the program, but can be obtained indirectly through tensor._base.

Implementation?
Problem 1: One issue is that we don't have an easy way, given a bunch of tensors that are known to alias, to find their "base" tensor.
The easiest way would probably be to use the _base attribute on the tensor, which autograd conveniently tracks for us. As @ezyang pointed out though, because this info is tracked by autograd, none of it is tracked if you're running your program under inference_mode().

In a lot of cases though, we should be able to "elect" a base. For example, given:
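(The snippet is missing here; the shapes below are just an illustrative guess at the kind of views being described.)

```python
import torch

a = torch.ones(4)
b = a[:2]   # view of the first half of a
c = a[2:]   # view of the second half of a
```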
We should be able to detect inside of functionalize() that "a" can be effectively used as a base of "b" and "c" (I'm not sure exactly what the implementation would look like, but a's sizes/strides/storage offset should tell us that its memory fully covers b's + c's memory).

That still doesn't help us in all situations though, like in example 2 above, or if we run functionalize(foo)(b, c) with the views given above. I'm pretty sure that we're forced to rely on the _base attribute in those situations. We could either:
(1) Detect that situation, and error if inference_mode() is turned on.
(2) Keep inference mode turned off when inside of a torchdynamo-enabled region of the program. inference_mode() is an eager-mode concept, and any overhead that autograd-tracking creates will be traced away when you're running with torchdynamo, so this could be reasonable.

Problem 2: Tensor subclasses
In order to even detect the situation described in this issue, we need to know that two program inputs actually have aliased storages. This can be a problem though, if the inputs to functionalization are ("has-a") tensor subclasses instead of ordinary torch.Tensor objects. The most important example of this is probably ProxyTensor (and FakeTensor?), since that's what we're using for tracing. The "dumb" way to detect aliasing of two proxy tensors is sketched below. This might be an ok workaround for dynamo tracing, but we'd still end up with silent correctness issues if you're using another "has-a" tensor subclass.
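(The original snippet is missing from this copy; the check below is a guess at what it looked like - the .elem attribute name for the wrapped tensor and the storage comparison are assumptions.)

```python
import torch

def inputs_alias(x: torch.Tensor, y: torch.Tensor) -> bool:
    # Unwrap "has-a" wrapper subclasses (e.g. a proxy tensor holding .elem)
    # and compare the storages of the wrapped tensors.
    inner_x = getattr(x, "elem", x)
    inner_y = getattr(y, "elem", y)
    return inner_x.untyped_storage().data_ptr() == inner_y.untyped_storage().data_ptr()
```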
I'm not sure what the best solution to this is (although the problem is pretty niche, since functionalization + custom has-a tensor subclass + aliased program inputs together hopefully aren't too common).
We could require subclasses to implement a def aliases(other: Tensor) -> bool method in order to properly work with functionalization?

cc @Chillee
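(A rough sketch of what that protocol could look like - the WrapperTensor class, its .elem attribute, and the storage-pointer check are all illustrative assumptions; only the aliases() signature comes from the proposal above.)

```python
import torch

class WrapperTensor(torch.Tensor):
    # A "has-a" subclass is assumed to hold the tensor it wraps in .elem
    # (a real wrapper would construct instances via Tensor._make_wrapper_subclass).
    elem: torch.Tensor

    def aliases(self, other: torch.Tensor) -> bool:
        # Answer aliasing queries in terms of the wrapped tensors' storages.
        other_inner = other.elem if isinstance(other, WrapperTensor) else other
        return (
            self.elem.untyped_storage().data_ptr()
            == other_inner.untyped_storage().data_ptr()
        )
```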