What does the uisrnn pytorch model output exactly and which variable holds that output? #43

Open
Harry-Garrison opened this issue Nov 21, 2020 · 0 comments

Excuse my ignorance, but I am trying to wrap my head around the inner workings of the uisrnn model and I am stuck. More specifically, I would like to know what the model outputs when it receives the VGG speaker embedding features. I struggle with this because the model's structure is complicated: inference looks like one continuous process, and I cannot tell where the PyTorch model's job starts and where it ends. I read through the uisrnn script and tried to trace the order in which the functions are executed, without success. To my understanding, the model outputs a sequence of "states", which are then processed and scored by a beam search algorithm. The scores are then fed back into the model, and the process continues until some stopping point is reached (?).

Figuring out what the model does with the VGG speaker embeddings it receives is challenging, to say the least. The problem is that I do not know where to start. Which parts of the inference process depend solely on the PyTorch model, and which parts of the code handle the rest (beam states, scoring, etc.)? Which part of the uisrnn script is responsible for processing the VGG embeddings, and which variable holds the results?

So far I have figured out the following:

This code creates the uisrnn object and loads the saved weights:

model_args, _, inference_args = uisrnn.parse_arguments()
model_args.observation_dim = 512
uisrnnModel = uisrnn.UISRNN(model_args)
uisrnnModel.load(SAVED_MODEL_NAME)

This snippet runs inference on the features (embeddings):

predicted_label = uisrnnModel.predict([feats], inference_args)
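For orientation, here is a minimal sketch of how the value held in `predicted_label` could be consumed. It assumes that `predict`, when given a list of sequences, returns one list of integer speaker IDs per sequence (one ID per embedding). The helper `labels_to_segments` and the example labels are hypothetical and are not part of uisrnn.

```python
from itertools import groupby


def labels_to_segments(labels):
    """Collapse per-embedding speaker IDs into (start, end, speaker) runs.

    `start` is inclusive and `end` is exclusive, both counted in embeddings.
    Hypothetical helper for illustration; not part of the uisrnn library.
    """
    segments = []
    start = 0
    for speaker, run in groupby(labels):
        length = len(list(run))
        segments.append((start, start + length, speaker))
        start += length
    return segments


# Made-up stand-in for what `uisrnnModel.predict([feats], inference_args)`
# is assumed to return: a list holding one label list per input sequence.
example_predicted_label = [[0, 0, 1, 1, 0]]
example_segments = labels_to_segments(example_predicted_label[0])
print(example_segments)  # [(0, 2, 0), (2, 4, 1), (4, 5, 0)]
```

If that assumption holds, the final result of inference is just this label sequence; everything the network computes along the way stays internal to `predict_single`.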

Now comes the hard part:

In the uisrnn script there is the following function:

def predict_single(self, test_sequence, args):
    ...

From that point on I have no idea what is going on. What does the model output after it has received the features and at which point in the code do we get the result of that computation? Is it the mean and hidden variables in:


class CoreRNN(nn.Module):
  """The core Recurrent Neural Network used by UIS-RNN."""

  def __init__(self, input_dim, hidden_size, depth, observation_dim, dropout=0):
    super(CoreRNN, self).__init__()
    ...

  def forward(self, input_seq, hidden=None):
    ...
    return mean, hidden

or is it something else? Most importantly, is the model fed only the features, or maybe the beam states too? I am so confused.
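To make the question concrete, here is a toy sketch of one possible division of labor between a network that returns `mean, hidden` and a search loop that does the scoring and bookkeeping. This is not the uisrnn implementation: `core_step` is a hypothetical stand-in (a moving average instead of a GRU), `greedy_decode` is a beam search of width 1, and the fixed `new_speaker_cost` is an assumption made only for illustration.

```python
def core_step(prev_input, hidden):
    """Toy stand-in for the network: (input, hidden) -> (mean, new hidden).

    A moving average replaces the recurrent layers; `mean` plays the role of
    the predicted next embedding for one speaker.
    """
    new_hidden = [0.5 * h + 0.5 * x for h, x in zip(hidden, prev_input)]
    mean = list(new_hidden)
    return mean, new_hidden


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_decode(observations, new_speaker_cost=0.5):
    """Search loop: the network only supplies `mean`; the rest is bookkeeping."""
    dim = len(observations[0])
    speakers = []  # per speaker: (mean, hidden) as last returned by core_step
    labels = []
    for obs in observations:
        # Score each existing speaker by how well its predicted mean fits.
        costs = [squared_distance(mean, obs) for mean, _ in speakers]
        # Opening a new speaker has a fixed, assumed cost in this toy.
        costs.append(new_speaker_cost)
        best = min(range(len(costs)), key=costs.__getitem__)
        if best == len(speakers):
            speakers.append(([0.0] * dim, [0.0] * dim))
        # Only the chosen speaker's state is advanced, using the observation.
        _, hidden = speakers[best]
        speakers[best] = core_step(obs, hidden)
        labels.append(best)
    return labels


toy_observations = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
toy_labels = greedy_decode(toy_observations)
print(toy_labels)  # [0, 0, 1, 1, 0]
```

In this toy, the network sees only embeddings and its own hidden state; the search loop owns the per-speaker states, the costs, and the labels. Whether uisrnn splits the work the same way is exactly what the question asks.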

Understanding how exactly this code works could help with a variety of tasks, such as improving the code or turning the pytorch model into another format, in a more modular fashion. Any help is greatly appreciated.

@Harry-Garrison Harry-Garrison changed the title What does the uisrnn pytorch model output exactly and what variable holds that output? What does the uisrnn pytorch model output exactly and which variable holds that output? Nov 21, 2020