
Commit

Merge branch '1.3.x' of https://github.com/RasaHQ/rasa into mongodb-patch
Kevin Castro committed Sep 23, 2019
2 parents bcc71b5 + 22a7457 commit 61d2d17
Showing 18 changed files with 386 additions and 67 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -114,4 +114,4 @@ jobs:
distributions: "sdist bdist_wheel"
password:
secure: "MeL1Ve97eBY+VbNWuQNuLzkPs0TPc+Zh8OfZkhw69ez5imsiWpvp0LrUOLVW3CcC0vNTANEBOVX/n1kHxfcqkf/cChNqAkZ6zTMmvR9zHDwQxXVGZ3jEQSQM+fHdQpjwtH7BwojyxaCIC/5iza7DFMcca/Q6Xr+atdTd0V8Q7Nc5jFHEQf3/4oIIm6YeCUiHcEu981LRdS04+jvuFUN0Ejy+KLukGVyIWyYDjjGjs880Mj4J1mgmCihvVkJ1ujB65rYBdTjls3JpP3eTk63+xH8aHilIuvqB8TDYih8ovE/Vv6YwLI+u2HoEHAtBD4Ez3r71Ju6JKJM7DhWb5aurN4M7K6DC8AvpUl+PsJbNP4ZeW2jXMH6lT6qXKVaSw7lhZ0XY3wunyVcAbArX4RS0B9pb1nHBYUBWZjxXtr8lhkpGFu7H43hw63Y19qb8z4+1cGnijgz1mqXSAssuc+3r0W0cSr+OsCjmOs7cwT6HMQvPEKxLohwBOS/I3EbuKQOYMjFN5BWP5JXbsG45awV9tquxEW8zxjMetR+AOcYoyrDeiR8sAnj1/F99DE0bL1KyW/G5VNu2Xi/c+0M3KvP3+F8XTCuUY/5zTvqh1Qz1jcdiwsiAhO4eBQzQnjeFlxdiVeue2kmD5qsh+VLKKuKLfyVoaV7b1kBlAtBDu7+hDpA="
after_deploy: bash scripts/ping_slack_about_package_release.sh
after_deploy: ./scripts/ping_slack_about_package_release.sh
55 changes: 54 additions & 1 deletion CHANGELOG.rst
@@ -13,7 +13,22 @@ Fixed
-----
- re-added TLS, SRV dependencies for PyMongo

[1.3.4] - 2019-09-14
[1.3.6] - 2019-09-21
^^^^^^^^^^^^^^^^^^^^

Added
-----
- Added the ability for users to specify the ID of the conversation to send a message to when
using the ``RasaChat`` input channel.
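
A hedged sketch of how a client might use this from Python; the webhook path, port, and token below are assumptions based on how ``RestInput``-style channels are typically mounted, not values taken from this commit:

import requests

# Assumed values: adjust the URL and token for your deployment.
url = "http://localhost:5005/webhooks/rasa/webhook"
token = "<rasa-x-jwt>"  # bearer token issued by Rasa X

payload = {
    "message": "Hello!",
    # New: target a specific conversation, provided the token's user
    # is allowed to write to it (see rasa/core/channels/rasa_chat.py below).
    "conversation_id": "some-conversation-id",
}

response = requests.post(
    url, json=payload, headers={"Authorization": "Bearer {}".format(token)}
)
print(response.status_code, response.text)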

[1.3.5] - 2019-09-20
^^^^^^^^^^^^^^^^^^^^

Fixed
-----
- Fixed issue where ``rasa init`` would fail without spaCy being installed

[1.3.4] - 2019-09-20
^^^^^^^^^^^^^^^^^^^^

Added
@@ -22,6 +37,24 @@ Added
the ``SANIC_BACKLOG`` environment variable. This parameter sets the
number of unaccepted connections the server allows before refusing new
connections. A default value of 100 is used if the variable is not set
(see the sketch after this list).
- Status endpoint (``/status``) now also returns the number of training processes currently running.
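
A minimal illustration of the backlog setting described in the first entry above (not the exact Rasa code): read ``SANIC_BACKLOG`` with a fallback of 100 and pass it to Sanic's ``run``:

import os

from sanic import Sanic
from sanic.response import json as json_response

app = Sanic("backlog_demo")


@app.route("/status")
async def status(request):
    return json_response({"ok": True})


if __name__ == "__main__":
    # Number of unaccepted connections kept queued before new ones are refused.
    backlog = int(os.environ.get("SANIC_BACKLOG", "100"))
    app.run(host="0.0.0.0", port=5005, backlog=backlog)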

Fixed
-----
- Added the ability to properly deal with spaCy ``Doc`` objects created on
empty strings as discussed `here <https://github.com/RasaHQ/rasa/issues/4445>`_.
Only training samples that actually bear content are sent to ``self.nlp.pipe``
for every given attribute; non-content-bearing samples are converted to empty
``Doc`` objects. The resulting lists are then merged back in their original
order and returned.
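
A condensed, standalone sketch of that approach (mirroring the helpers added to ``rasa/nlu/utils/spacy_utils.py`` further down in this diff); it assumes spaCy is installed and uses a blank English pipeline as a stand-in for the configured model:

import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")  # stand-in for the configured spaCy model

texts = ["hello there", "", "book a flight", ""]
indexed = list(enumerate(texts))

# Separate content-bearing samples from empty ones.
to_pipe = [(i, t) for i, t in indexed if t != ""]
empty = [(i, t) for i, t in indexed if t == ""]

# Only non-empty texts go through nlp.pipe; empty ones become empty Docs.
piped_docs = nlp.pipe([t for _, t in to_pipe], batch_size=50)
piped = [(i, doc) for (i, _), doc in zip(to_pipe, piped_docs)]
empty_docs = [(i, Doc(nlp.vocab)) for i, _ in empty]

# Merge back and restore the original order by index.
merged = dict(indexed)
merged.update(dict(piped + empty_docs))
docs = [doc for _, doc in sorted(merged.items())]
assert len(docs) == len(texts)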

Changed
-------
- The endpoint ``POST /model/train`` no longer supports specifying an output directory
for the trained model using the field ``out``. Instead, you can choose whether to save
the trained model in the default model directory (``models``, the default behavior) or
in a temporary directory by specifying the ``save_to_default_model_directory`` field
in the training request.
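
A hedged sketch of a training request under the new contract, assuming a local Rasa server on port 5005 with the API enabled; the config and NLU strings are placeholders, and the full field list is defined by the training request schema in ``docs/_static/spec/rasa.yml`` below:

import requests

# Placeholder training payload; replace the config and NLU data with your own.
training_request = {
    "config": "language: en\npipeline: supervised_embeddings\n",
    "nlu": "## intent:greet\n- hello\n- hi\n",
    "force": False,
    # Replaces the removed `out` field: True keeps the model in `models/`,
    # False writes it to a temporary directory instead.
    "save_to_default_model_directory": True,
}

response = requests.post("http://localhost:5005/model/train", json=training_request)
response.raise_for_status()
# The trained model archive is returned in the response body.
print("Status:", response.status_code, "-", len(response.content), "bytes received")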

[1.3.3] - 2019-09-13
^^^^^^^^^^^^^^^^^^^^
@@ -53,6 +86,7 @@ Changed
-------
- Pin gast to == 0.2.2


[1.3.0] - 2019-09-05
^^^^^^^^^^^^^^^^^^^^

@@ -140,6 +174,25 @@ Removed
-------
- Removed ``--report`` argument from ``rasa test nlu``. All output files are stored in the ``--out`` directory.


[1.2.9] - 2019-09-17
^^^^^^^^^^^^^^^^^^^^

Fixed
-----
- Correctly pass SSL flag values to the ``rasa x`` CLI command (backport of


[1.2.8] - 2019-09-10
^^^^^^^^^^^^^^^^^^^^

Fixed
-----
- SQL tracker events are retrieved ordered by timestamps. This fixes interactive
learning events being shown in the wrong order. Backport of ``1.3.2`` patch
(PR #4427).


[1.2.7] - 2019-09-02
^^^^^^^^^^^^^^^^^^^^

17 changes: 11 additions & 6 deletions docs/_static/spec/rasa.yml
@@ -69,9 +69,9 @@ paths:
operationId: getStatus
tags:
- Server Information
summary: Status of the currently loaded Rasa model
summary: Status of the Rasa server
description: >-
Information about the currently loaded Rasa model.
Information about the server and the currently loaded Rasa model.
responses:
200:
description: Success
@@ -98,6 +98,10 @@ paths:
type: string
description: Path of the loaded model
example: 20190429-103105.tar.gz
num_active_training_jobs:
type: integer
description: Number of running training processes
example: 2
401:
$ref: '#/components/responses/401NotAuthenticated'
403:
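
The ``num_active_training_jobs`` property added above can be read with a quick sketch like the following, assuming a locally running server on port 5005 (add an access token if your server requires authentication):

import requests

status = requests.get("http://localhost:5005/status").json()

# "num_active_training_jobs" is the property added above; "model_file" comes
# from the existing schema (the path of the loaded model).
print("Running training processes:", status.get("num_active_training_jobs"))
print("Loaded model:", status.get("model_file"))
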
@@ -1224,15 +1228,16 @@ components:
$ref: '#/components/schemas/NLUTrainingData'
stories:
$ref: '#/components/schemas/StoriesTrainingData'
out:
type: string
description: Output directory
example: models
force:
type: boolean
description: >-
Force a model training even if the data has not changed
example: false
save_to_default_model_directory:
type: boolean
description: >-
If `true` (default), the trained model is saved in the default model
directory; if `false`, it is saved in a temporary directory
required: ["config"]

NLUTrainingData:
7 changes: 7 additions & 0 deletions docs/core/retrieval-actions.rst
@@ -52,6 +52,9 @@ You can cover all of these with a single story where the above intents are group
A retrieval action uses the output of a :ref:`response-selector` component from NLU, which learns a
retrieval model to predict the correct response from a list of candidate responses given a user message.


.. _retrieval-training-data:

Training Data
^^^^^^^^^^^^^

@@ -95,6 +98,10 @@ This is a key difference to the response templates in your domain file.
to the training process. Its contents cannot be part of the file that contains training data for other
components of NLU.

.. note::
As shown in the examples above, the ``/`` symbol is reserved as a delimiter that separates retrieval intents from
response text identifiers. Make sure not to use it in the names of your intents.
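
To make the convention concrete, a tiny illustrative sketch (the intent name ``chitchat/ask_name`` is hypothetical):

full_intent = "chitchat/ask_name"  # hypothetical retrieval-intent training label

# "/" is the reserved delimiter: the part before it is the retrieval intent,
# the part after it identifies the response text.
retrieval_intent, response_key = full_intent.split("/", 1)

assert retrieval_intent == "chitchat"
assert response_key == "ask_name"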

Config File
^^^^^^^^^^^

6 changes: 6 additions & 0 deletions docs/core/stories.rst
@@ -74,6 +74,12 @@ to predict the next action based on a *combination* of both the intent and
entities (you can, however, change this behavior using the
:ref:`use_entities <use_entities>` attribute).

.. warning::
The ``/`` symbol is reserved as a delimiter that separates retrieval intents from response text identifiers.
Refer to the ``Training Data Format`` section of :ref:`retrieval-actions` for more details on this format.
If any intent name contains the delimiter, the file containing these stories will be treated as training
data for the :ref:`response-selector` model and ignored when training Core models.

Actions
~~~~~~~
While writing stories, you will encounter two types of actions: utterances
2 changes: 2 additions & 0 deletions docs/migration-guide.rst
@@ -50,6 +50,8 @@ General
an entity set, this will influence the weighted precision and f1-score quite a bit. From now on we
exclude ``no-entity`` from the evaluation. The overall metrics now only include proper entities. You
might see a drop in the performance scores when running the evaluation again (a toy comparison follows this list).
- ``/`` is reserved as a delimiter token to distinguish between a retrieval intent and its corresponding response
text identifier. Make sure you don't include the ``/`` symbol in the names of your intents.
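
A toy comparison for the evaluation point above, assuming scikit-learn is available; the label sequences are made up for illustration:

from sklearn.metrics import precision_recall_fscore_support

# Hypothetical token-level gold and predicted entity labels.
gold = ["no-entity", "no-entity", "city", "no-entity", "city", "no-entity"]
pred = ["no-entity", "no-entity", "city", "no-entity", "no-entity", "no-entity"]

# Weighted scores over all labels vs. over proper entities only.
for labels in (["no-entity", "city"], ["city"]):
    precision, recall, f1, _ = precision_recall_fscore_support(
        gold, pred, labels=labels, average="weighted"
    )
    print(labels, "-> precision {:.2f}, f1 {:.2f}".format(precision, f1))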

.. _migration-to-rasa-1.0:

4 changes: 4 additions & 0 deletions docs/nlu/training-data-format.rst
@@ -72,6 +72,10 @@ Lookup tables may be specified either directly as lists or as txt files containi
.. note::
The common theme here is that common examples, regex features and lookup tables merely act as cues to the final NLU model by providing additional features to the machine learning algorithm during training. Therefore, you should not assume that a single example is enough for the model to robustly identify intents and/or entities across all variants of that example.

.. note::
The ``/`` symbol is reserved as a delimiter that separates retrieval intents from response text identifiers. Make sure not to
use it in the names of your intents.

JSON Format
-----------

3 changes: 3 additions & 0 deletions rasa/cli/x.py
@@ -79,6 +79,9 @@ def _rasa_service(
enable_api=True,
jwt_secret=args.jwt_secret,
jwt_method=args.jwt_method,
ssl_certificate=args.ssl_certificate,
ssl_keyfile=args.ssl_keyfile,
ssl_password=args.ssl_password,
)


48 changes: 38 additions & 10 deletions rasa/core/channels/rasa_chat.py
@@ -13,6 +13,10 @@

logger = logging.getLogger(__name__)

CONVERSATION_ID_KEY = "conversation_id"
JWT_USERNAME_KEY = "username"
INTERACTIVE_LEARNING_PERMISSION = "clientEvents:create"


class RasaChatInput(RestInput):
"""Chat input channel for Rasa X"""
@@ -88,15 +92,39 @@ async def _decode_bearer_token(self, bearer_token: Text) -> Optional[Dict]:
logger.exception("Failed to decode bearer token.")

async def _extract_sender(self, req: Request) -> Optional[Text]:
"""Fetch user from the Rasa X Admin API"""
"""Fetch user from the Rasa X Admin API."""

jwt_payload = None
if req.headers.get("Authorization"):
user = await self._decode_bearer_token(req.headers["Authorization"])
if user:
return user["username"]

user = await self._decode_bearer_token(req.args.get("token", default=None))
if user:
return user["username"]

abort(401)
jwt_payload = await self._decode_bearer_token(req.headers["Authorization"])

if not jwt_payload:
jwt_payload = await self._decode_bearer_token(req.args.get("token"))

if not jwt_payload:
abort(401)

if CONVERSATION_ID_KEY in req.json:
if self._has_user_permission_to_send_messages_to_conversation(
jwt_payload, req.json
):
return req.json[CONVERSATION_ID_KEY]
else:
logger.error(
"User '{}' does not have permissions to send messages to "
"conversation '{}'.".format(
jwt_payload[JWT_USERNAME_KEY], req.json[CONVERSATION_ID_KEY]
)
)
abort(401)

return jwt_payload[JWT_USERNAME_KEY]

@staticmethod
def _has_user_permission_to_send_messages_to_conversation(
jwt_payload: Dict, message: Dict
) -> bool:
user_scopes = jwt_payload.get("scopes", [])
return INTERACTIVE_LEARNING_PERMISSION in user_scopes or message[
CONVERSATION_ID_KEY
] == jwt_payload.get(JWT_USERNAME_KEY)
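
For readers skimming the diff, the permission rule added above restated as a standalone sketch with hypothetical payload values (the helper name ``may_send`` is illustrative, not part of the codebase):

# Standalone restatement of _has_user_permission_to_send_messages_to_conversation.
INTERACTIVE_LEARNING_PERMISSION = "clientEvents:create"


def may_send(jwt_payload: dict, message: dict) -> bool:
    user_scopes = jwt_payload.get("scopes", [])
    return (
        INTERACTIVE_LEARNING_PERMISSION in user_scopes
        or message["conversation_id"] == jwt_payload.get("username")
    )


# A user may always write to their own conversation ...
assert may_send({"username": "ada"}, {"conversation_id": "ada"})
# ... and to any conversation if they hold the interactive-learning scope ...
assert may_send(
    {"username": "ada", "scopes": ["clientEvents:create"]},
    {"conversation_id": "someone-else"},
)
# ... but not to someone else's conversation without it.
assert not may_send({"username": "ada"}, {"conversation_id": "someone-else"})
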
3 changes: 2 additions & 1 deletion rasa/nlu/training_data/loading.py
@@ -132,7 +132,6 @@ def _load(filename: Text, language: Optional[Text] = "en") -> Optional["Training
if fformat == UNK:
raise ValueError("Unknown data format for file '{}'.".format(filename))

logger.debug("Training data format of '{}' is '{}'.".format(filename, fformat))
reader = _reader_factory(fformat)

if reader:
@@ -174,6 +173,8 @@ def guess_format(filename: Text) -> Text:
guess = fformat
break

logger.debug("Training data format of '{}' is '{}'.".format(filename, guess))

return guess


91 changes: 86 additions & 5 deletions rasa/nlu/utils/spacy_utils.py
@@ -1,6 +1,6 @@
import logging
import typing
from typing import Any, Dict, List, Optional, Text
from typing import Any, Dict, List, Optional, Text, Tuple

from rasa.nlu.components import Component
from rasa.nlu.config import RasaNLUModelConfig, override_defaults
@@ -129,18 +129,99 @@ def get_text(self, example, attribute):

return self.preprocess_text(example.get(attribute))

@staticmethod
def merge_content_lists(
indexed_training_samples: List[Tuple[int, Text]],
doc_lists: List[Tuple[int, "Doc"]],
) -> List[Tuple[int, "Doc"]]:
"""Merge lists with processed Docs back into their original order."""

dct = dict(indexed_training_samples)
dct.update(dict(doc_lists))
return sorted(dct.items())

@staticmethod
def filter_training_samples_by_content(
indexed_training_samples: List[Tuple[int, Text]]
) -> Tuple[List[Tuple[int, Text]], List[Tuple[int, Text]]]:
"""Separates empty training samples from content bearing ones."""

docs_to_pipe = list(
filter(
lambda training_sample: training_sample[1] != "",
indexed_training_samples,
)
)
empty_docs = list(
filter(
lambda training_sample: training_sample[1] == "",
indexed_training_samples,
)
)
return docs_to_pipe, empty_docs

def process_content_bearing_samples(
self, samples_to_pipe: List[Tuple[int, Text]]
) -> List[Tuple[int, "Doc"]]:
"""Sends content bearing training samples to spaCy's pipe."""

docs = [
(to_pipe_sample[0], doc)
for to_pipe_sample, doc in zip(
samples_to_pipe,
[
doc
for doc in self.nlp.pipe(
[txt for _, txt in samples_to_pipe], batch_size=50
)
],
)
]
return docs

def process_non_content_bearing_samples(
self, empty_samples: List[Tuple[int, Text]]
) -> List[Tuple[int, "Doc"]]:
"""Creates empty Doc-objects from zero-lengthed training samples strings."""

from spacy.tokens import Doc

n_docs = [
(empty_sample[0], doc)
for empty_sample, doc in zip(
empty_samples, [Doc(self.nlp.vocab) for doc in empty_samples]
)
]
return n_docs

def docs_for_training_data(
self, training_data: TrainingData
) -> Dict[Text, List[Any]]:

attribute_docs = {}
for attribute in SPACY_FEATURIZABLE_ATTRIBUTES:

texts = [self.get_text(e, attribute) for e in training_data.intent_examples]
# Index and freeze indices of the training samples for preserving the order
# after processing the data.
indexed_training_samples = [(idx, text) for idx, text in enumerate(texts)]

docs = [doc for doc in self.nlp.pipe(texts, batch_size=50)]
samples_to_pipe, empty_samples = self.filter_training_samples_by_content(
indexed_training_samples
)

content_bearing_docs = self.process_content_bearing_samples(samples_to_pipe)

non_content_bearing_docs = self.process_non_content_bearing_samples(
empty_samples
)

attribute_document_list = self.merge_content_lists(
indexed_training_samples,
content_bearing_docs + non_content_bearing_docs,
)

attribute_docs[attribute] = docs
# We only need the spaCy Docs here, so drop the indices and keep the Docs
# from the (index, Doc) tuples.
attribute_docs[attribute] = [doc for _, doc in attribute_document_list]
return attribute_docs

def train(