Skip to content

Commit

Permalink
broadcom/compiler: update how we compute return_words_of_texture_data…
Browse files Browse the repository at this point in the history
… on non-ssa

For the non-ssa case, we were trying to use reg->num_components. But
this is not the same that nir_ssa_def_components_read. It is the
number of components of the destination register. And in the 16bit
case, even if nir_lower_tex packs the outcome, it doesn't update the
number of components, as nir_tex_instr_dest_size would still return
4. And nir validate would check that those values are the same.

So this change focuses on the last part of this comment at
nir_lower_tex:

 * Note that we don't change the destination num_components, because
 * nir_tex_instr_dest_size() will still return 4.  The driver is just
 * expected to not store the other channels, given that nothing at the
 * NIR level will read them.

We just limit how many channels we would use for the f16 case.

It is also worth to note, based on the CTS and different applications
we test, that this is a corner case.

This was detected when we experimented to enable nir_opt_gcm for v3d,
that lead to raise an assertion slightly below with some shaderdb
tests, but technically it could happen without it.

Reviewed-by: Iago Toral Quiroga <[email protected]>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>
  • Loading branch information
infapi00 authored and Marge Bot committed Oct 26, 2022
1 parent ec10a37 commit 9cbc3ab
Showing 1 changed file with 25 additions and 12 deletions.
37 changes: 25 additions & 12 deletions src/broadcom/compiler/v3d40_tex.c
Original file line number Diff line number Diff line change
Expand Up @@ -229,16 +229,37 @@ v3d40_vir_emit_tex(struct v3d_compile *c, nir_tex_instr *instr)
unsigned texture_idx = instr->texture_index;
unsigned sampler_idx = instr->sampler_index;

/* Even if the texture operation doesn't need a sampler by
* itself, we still need to add the sampler configuration
* parameter if the output is 32 bit
*/
bool output_type_32_bit =
c->key->sampler[sampler_idx].return_size == 32 &&
!instr->is_shadow;

struct V3D41_TMU_CONFIG_PARAMETER_0 p0_unpacked = {
};

/* Limit the number of channels returned to both how many the NIR
* instruction writes and how many the instruction could produce.
*/
p0_unpacked.return_words_of_texture_data =
instr->dest.is_ssa ?
nir_ssa_def_components_read(&instr->dest.ssa) :
(1 << instr->dest.reg.reg->num_components) - 1;
if (instr->dest.is_ssa) {
p0_unpacked.return_words_of_texture_data =
nir_ssa_def_components_read(&instr->dest.ssa);
} else {
/* For the non-ssa case we don't have a full equivalent to
* nir_ssa_def_components_read. This is a problem for the 16
* bit case. nir_lower_tex will not change the destination as
* nir_tex_instr_dest_size will still return 4. The driver is
* just expected to not store on other channels, so we
* manually ensure that here.
*/
uint32_t num_components = output_type_32_bit ?
MIN2(instr->dest.reg.reg->num_components, 4) :
MIN2(instr->dest.reg.reg->num_components, 2);

p0_unpacked.return_words_of_texture_data = (1 << num_components) - 1;
}
assert(p0_unpacked.return_words_of_texture_data != 0);

struct V3D41_TMU_CONFIG_PARAMETER_2 p2_unpacked = {
Expand Down Expand Up @@ -294,14 +315,6 @@ v3d40_vir_emit_tex(struct v3d_compile *c, nir_tex_instr *instr)

vir_WRTMUC(c, QUNIFORM_TMU_CONFIG_P0, p0_packed);

/* Even if the texture operation doesn't need a sampler by
* itself, we still need to add the sampler configuration
* parameter if the output is 32 bit
*/
bool output_type_32_bit =
c->key->sampler[sampler_idx].return_size == 32 &&
!instr->is_shadow;

/* p1 is optional, but we can skip it only if p2 can be skipped too */
bool needs_p2_config =
(instr->op == nir_texop_lod ||
Expand Down

0 comments on commit 9cbc3ab

Please sign in to comment.