-
Hello lambeq-dev team. However, once the model is loaded from a checkpoint with those weights, what does it do exactly when I feed it unseen sentences containing possibly some unseen words?
-
Hi, it is correct that, in the example notebooks, we extract the parameters from all the diagrams of the dataset. Due to the small vocabulary size of the example tasks, it is highly unlikely that the validation or test set contains tokens that do not appear in the training set. However, if this very unlikely event occurs, we make sure that there is a parameter stored for each token, even if it is only initialised at random. Otherwise, the model would raise an exception during evaluation (along the lines of "token not found").

In reality, we don't have such a well-defined dataset or corpus, and it is very likely that the model has to deal with unknown words. This can be handled by replacing rare words in the dataset/corpus with a dummy `<unk>` token. During testing/inference we then replace any unseen word with the same `<unk>` token, so that the model does not raise an exception. Be aware that for a syntax-based model like DisCoCat, the `<unk>` token will be word-type dependent, which makes the whole thing a little trickier. I hope this helps.
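To illustrate the idea, here is a minimal self-contained sketch of that `<unk>` replacement strategy. The helper names (`build_vocab`, `replace_unknowns`, `min_count`) are invented for this example and are not part of lambeq:

```python
from collections import Counter

UNK = "<unk>"

def build_vocab(train_sentences, min_count=2):
    """Keep only tokens that appear at least `min_count` times;
    everything rarer will be mapped to the <unk> token."""
    counts = Counter(tok for sent in train_sentences for tok in sent.split())
    return {tok for tok, c in counts.items() if c >= min_count}

def replace_unknowns(sentence, vocab):
    """Replace every token outside the training vocabulary with <unk>."""
    return " ".join(tok if tok in vocab else UNK for tok in sentence.split())

train = ["Alice loves Bob", "Bob loves Alice", "Alice likes Bob"]
vocab = build_vocab(train, min_count=2)   # {"Alice", "loves", "Bob"}

# At test time, unseen words ("Charlie") fall back to the shared <unk> entry,
# so the model can look up a parameter instead of raising an exception.
print(replace_unknowns("Charlie loves Alice", vocab))  # -> "<unk> loves Alice"
```

For a syntax-based model you would keep one such fallback per word type (e.g. hypothetical `<unk>_n` and `<unk>_v` tokens for nouns and verbs) instead of a single shared token.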
-
Hi, no problem. Imagine your dataset consists of only one diagram, which represents the sentence "Alice loves Bob". You apply a circuit ansatz, and your circuit might look like this:

[image of the parametrised circuit; the image is a bit cropped, sorry]

You can see that the circuit contains 8 symbolic parameters. First of all, you initialise the model:

```python
model = TketModel.from_diagrams([alice_loves_bob])
```

This extracts the unique symbols from the diagrams (see lambeq/lambeq/training/model.py, lines 146 to 148 at commit d911686). After that, we can call the method `TketModel.initialise_weights()`. This method initialises a random value per symbol, stored in a list under the attribute `model.weights` (see lambeq/lambeq/training/quantum_model.py, lines 90 to 94 at commit d911686).

To evaluate the circuit, we now need to "lambdify" it. "Lambdifying" is a term that comes from symbolic Python (sympy): it means converting a symbolic expression into a lambda function that takes the concrete values as arguments. Consider this example:

```python
>>> from sympy.abc import x
>>> from sympy.utilities.lambdify import lambdify, implemented_function
>>> f = implemented_function('f', lambda x: x+1)
>>> lam_f = lambdify(x, f(x))
>>> lam_f(4)
5
```

The same thing happens with the circuit inside the model (see lambeq/lambeq/training/tket_model.py, lines 69 to 72 at commit d911686): we create a lambda function by passing all possible symbols to it (see lambeq/lambeq/training/tket_model.py, lines 105 to 108 at commit d911686). Defining a loss and estimating some gradient now lets us tune the values stored in `model.weights`.
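To make the whole flow concrete, here is a small self-contained sketch that mimics those steps with plain sympy/numpy. It is not the actual lambeq internals; the symbol names and the toy expression standing in for the circuit are invented for this example:

```python
import numpy as np
from sympy import symbols, lambdify, cos, sin

# Stand-ins for the free symbols a circuit ansatz would produce for
# "Alice loves Bob" (the real names come from the words and their types).
alice_0, loves_0, bob_0 = symbols("Alice_0 loves_0 Bob_0")
free_symbols = [alice_0, loves_0, bob_0]

# A toy "circuit output" as a symbolic expression; in lambeq this role is
# played by the parametrised circuit itself.
expr = cos(alice_0) * sin(loves_0) + bob_0

# initialise_weights: one random value per symbol, kept in a flat array.
rng = np.random.default_rng(0)
weights = rng.random(len(free_symbols))

# Lambdify once, then evaluate by binding the concrete weights to the symbols.
evaluate = lambdify(free_symbols, expr)
print(evaluate(*weights))

# Training then amounts to nudging `weights` to lower a loss and calling
# `evaluate` again; the symbolic expression itself never changes.
```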
-
By default, yes, this is true. However, you can always adapt a model to your needs by creating a custom model that inherits from `QuantumModel`. Keep in mind that the problem of how to deal with unknown words is present in all of NLP and is not lambeq-specific. There are many different approaches to tackle it, and we don't want to limit the user's choices by implementing a default way of doing it. We designed the models with research applications in mind, where well-defined validation and test sets are normally available.

However, you are right that we might want to change some things about the current interface. At the moment, we are passing […]. For now, though, we leave it to the user to adapt the predefined models to their needs by creating subclasses.
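For illustration only, here is a minimal sketch of what such a fallback-aware model could look like. `UnkAwareModel` and `weight_of` are invented names, and this is not the real lambeq `QuantumModel` interface; it just shows the idea of keeping one extra parameter that unknown symbols back off to:

```python
import numpy as np
from sympy import Symbol

class UnkAwareModel:
    """Sketch of a model that stores one weight per training symbol
    plus a single fallback weight for unknown symbols."""

    UNK = Symbol("<unk>")

    def __init__(self, train_symbols, rng=None):
        rng = rng or np.random.default_rng()
        self.symbols = list(train_symbols) + [self.UNK]
        self.weights = rng.random(len(self.symbols))
        self._index = {s: i for i, s in enumerate(self.symbols)}

    def weight_of(self, symbol):
        # Back off to the <unk> parameter instead of raising a KeyError,
        # mirroring the <unk> replacement strategy described above.
        return self.weights[self._index.get(symbol, self._index[self.UNK])]

train_symbols = [Symbol("Alice_0"), Symbol("loves_0"), Symbol("Bob_0")]
model = UnkAwareModel(train_symbols)

print(model.weight_of(Symbol("Alice_0")))    # learned parameter
print(model.weight_of(Symbol("Charlie_0")))  # falls back to the <unk> weight
```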
-
Hello, I'm not sure I entirely understood how the weights of `QuantumModel` work. Analyzing the source code, I figured that the model's weights are the unique symbols of the parametrized quantum circuit, and that they're initialized by calling the `from_diagrams` method. However, in the example shown in the documentation, you pass to that method both the training and the validation circuits, so that you eventually have those parameters as well in your model. Is this correct? If it is the case, I don't understand how the model can be used on unseen data, and which weights are loaded if the model is loaded from a checkpoint.