
Commit

Merge branch '1.3.x' of https://github.com/RasaHQ/rasa into mongodb-patch
Kevin Castro committed Sep 23, 2019
2 parents bcc71b5 + 22a7457 commit 61d2d17
Showing 18 changed files with 386 additions and 67 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -114,4 +114,4 @@ jobs:
distributions: "sdist bdist_wheel"
password:
secure: "MeL1Ve97eBY+VbNWuQNuLzkPs0TPc+Zh8OfZkhw69ez5imsiWpvp0LrUOLVW3CcC0vNTANEBOVX/n1kHxfcqkf/cChNqAkZ6zTMmvR9zHDwQxXVGZ3jEQSQM+fHdQpjwtH7BwojyxaCIC/5iza7DFMcca/Q6Xr+atdTd0V8Q7Nc5jFHEQf3/4oIIm6YeCUiHcEu981LRdS04+jvuFUN0Ejy+KLukGVyIWyYDjjGjs880Mj4J1mgmCihvVkJ1ujB65rYBdTjls3JpP3eTk63+xH8aHilIuvqB8TDYih8ovE/Vv6YwLI+u2HoEHAtBD4Ez3r71Ju6JKJM7DhWb5aurN4M7K6DC8AvpUl+PsJbNP4ZeW2jXMH6lT6qXKVaSw7lhZ0XY3wunyVcAbArX4RS0B9pb1nHBYUBWZjxXtr8lhkpGFu7H43hw63Y19qb8z4+1cGnijgz1mqXSAssuc+3r0W0cSr+OsCjmOs7cwT6HMQvPEKxLohwBOS/I3EbuKQOYMjFN5BWP5JXbsG45awV9tquxEW8zxjMetR+AOcYoyrDeiR8sAnj1/F99DE0bL1KyW/G5VNu2Xi/c+0M3KvP3+F8XTCuUY/5zTvqh1Qz1jcdiwsiAhO4eBQzQnjeFlxdiVeue2kmD5qsh+VLKKuKLfyVoaV7b1kBlAtBDu7+hDpA="
after_deploy: bash scripts/ping_slack_about_package_release.sh
after_deploy: ./scripts/ping_slack_about_package_release.sh
55 changes: 54 additions & 1 deletion CHANGELOG.rst
@@ -13,7 +13,22 @@ Fixed
-----
- re-added TLS, SRV dependencies for PyMongo

[1.3.4] - 2019-09-14
[1.3.6] - 2019-09-21
^^^^^^^^^^^^^^^^^^^^

Added
-----
- Added the ability for users to specify the ID of the conversation to send a message to when
using the ``RasaChat`` input channel.
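
A hedged sketch of how a client might use this from Python; the webhook path, port, and token below are assumptions based on how ``RestInput``-style channels are typically mounted, not values taken from this commit:

import requests

# Assumed values: adjust the URL and token for your deployment.
url = "http://localhost:5005/webhooks/rasa/webhook"
token = "<rasa-x-jwt>"  # bearer token issued by Rasa X

payload = {
    "message": "Hello!",
    # New: target a specific conversation, provided the token's user
    # is allowed to write to it (see rasa/core/channels/rasa_chat.py below).
    "conversation_id": "some-conversation-id",
}

response = requests.post(
    url, json=payload, headers={"Authorization": "Bearer {}".format(token)}
)
print(response.status_code, response.text)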

[1.3.5] - 2019-09-20
^^^^^^^^^^^^^^^^^^^^

Fixed
-----
- Fixed issue where ``rasa init`` would fail without spaCy being installed

[1.3.4] - 2019-09-20
^^^^^^^^^^^^^^^^^^^^

Added
@@ -22,6 +37,24 @@ Added
the ``SANIC_BACKLOG`` environment variable. This parameter sets the
number of unaccepted connections the server allows before refusing new
connections. A default value of 100 is used if the variable is not set
(see the sketch after this list).
- Status endpoint (``/status``) now also returns the number of training processes currently running.
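
A minimal illustration of the backlog setting described in the first entry above (not the exact Rasa code): read ``SANIC_BACKLOG`` with a fallback of 100 and pass it to Sanic's ``run``:

import os

from sanic import Sanic
from sanic.response import json as json_response

app = Sanic("backlog_demo")


@app.route("/status")
async def status(request):
    return json_response({"ok": True})


if __name__ == "__main__":
    # Number of unaccepted connections kept queued before new ones are refused.
    backlog = int(os.environ.get("SANIC_BACKLOG", "100"))
    app.run(host="0.0.0.0", port=5005, backlog=backlog)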

Fixed
-----
- Added the ability to properly deal with spaCy ``Doc`` objects created on
empty strings as discussed `here <https://github.com/RasaHQ/rasa/issues/4445>`_.
Only training samples that actually bear content are sent to ``self.nlp.pipe``
for every given attribute; non-content-bearing samples are converted to empty
``Doc`` objects. The resulting lists are then merged back in their original
order and returned.
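
A condensed, standalone sketch of that approach (mirroring the helpers added to ``rasa/nlu/utils/spacy_utils.py`` further down in this diff); it assumes spaCy is installed and uses a blank English pipeline as a stand-in for the configured model:

import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")  # stand-in for the configured spaCy model

texts = ["hello there", "", "book a flight", ""]
indexed = list(enumerate(texts))

# Separate content-bearing samples from empty ones.
to_pipe = [(i, t) for i, t in indexed if t != ""]
empty = [(i, t) for i, t in indexed if t == ""]

# Only non-empty texts go through nlp.pipe; empty ones become empty Docs.
piped_docs = nlp.pipe([t for _, t in to_pipe], batch_size=50)
piped = [(i, doc) for (i, _), doc in zip(to_pipe, piped_docs)]
empty_docs = [(i, Doc(nlp.vocab)) for i, _ in empty]

# Merge back and restore the original order by index.
merged = dict(indexed)
merged.update(dict(piped + empty_docs))
docs = [doc for _, doc in sorted(merged.items())]
assert len(docs) == len(texts)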

Changed
-------
- The endpoint ``POST /model/train`` no longer supports specifying an output directory
for the trained model using the field ``out``. Instead, you can choose whether to save
the trained model in the default model directory (``models``, the default behavior) or
in a temporary directory by specifying the ``save_to_default_model_directory`` field
in the training request.
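
A hedged sketch of a training request under the new contract, assuming a local Rasa server on port 5005 with the API enabled; the config and NLU strings are placeholders, and the full field list is defined by the training request schema in ``docs/_static/spec/rasa.yml`` below:

import requests

# Placeholder training payload; replace the config and NLU data with your own.
training_request = {
    "config": "language: en\npipeline: supervised_embeddings\n",
    "nlu": "## intent:greet\n- hello\n- hi\n",
    "force": False,
    # Replaces the removed `out` field: True keeps the model in `models/`,
    # False writes it to a temporary directory instead.
    "save_to_default_model_directory": True,
}

response = requests.post("http://localhost:5005/model/train", json=training_request)
response.raise_for_status()
# The trained model archive is returned in the response body.
print("Status:", response.status_code, "-", len(response.content), "bytes received")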

[1.3.3] - 2019-09-13
^^^^^^^^^^^^^^^^^^^^
@@ -53,6 +86,7 @@ Changed
-------
- Pin gast to == 0.2.2


[1.3.0] - 2019-09-05
^^^^^^^^^^^^^^^^^^^^

@@ -140,6 +174,25 @@ Removed
-------
- Removed ``--report`` argument from ``rasa test nlu``. All output files are stored in the ``--out`` directory.


[1.2.9] - 2019-09-17
^^^^^^^^^^^^^^^^^^^^

Fixed
-----
- Correctly pass SSL flag values to the ``rasa x`` CLI command (backport of


[1.2.8] - 2019-09-10
^^^^^^^^^^^^^^^^^^^^

Fixed
-----
- SQL tracker events are retrieved ordered by timestamps. This fixes interactive
learning events being shown in the wrong order. Backport of ``1.3.2`` patch
(PR #4427).


[1.2.7] - 2019-09-02
^^^^^^^^^^^^^^^^^^^^

17 changes: 11 additions & 6 deletions docs/_static/spec/rasa.yml
@@ -69,9 +69,9 @@ paths:
operationId: getStatus
tags:
- Server Information
summary: Status of the currently loaded Rasa model
summary: Status of the Rasa server
description: >-
Information about the currently loaded Rasa model.
Information about the server and the currently loaded Rasa model.
responses:
200:
description: Success
@@ -98,6 +98,10 @@ paths:
type: string
description: Path of the loaded model
example: 20190429-103105.tar.gz
num_active_training_jobs:
type: integer
description: Number of running training processes
example: 2
401:
$ref: '#/components/responses/401NotAuthenticated'
403:
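
The ``num_active_training_jobs`` property added above can be read with a quick sketch like the following, assuming a locally running server on port 5005 (add an access token if your server requires authentication):

import requests

status = requests.get("http://localhost:5005/status").json()

# "num_active_training_jobs" is the property added above; "model_file" comes
# from the existing schema (the path of the loaded model).
print("Running training processes:", status.get("num_active_training_jobs"))
print("Loaded model:", status.get("model_file"))
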
@@ -1224,15 +1228,16 @@ components:
$ref: '#/components/schemas/NLUTrainingData'
stories:
$ref: '#/components/schemas/StoriesTrainingData'
out:
type: string
description: Output directory
example: models
force:
type: boolean
description: >-
Force a model training even if the data has not changed
example: false
save_to_default_model_directory:
type: boolean
description: >-
If `true` (default), the trained model is saved in the default model
directory; if `false`, it is saved in a temporary directory
required: ["config"]

NLUTrainingData:
7 changes: 7 additions & 0 deletions docs/core/retrieval-actions.rst
@@ -52,6 +52,9 @@ You can cover all of these with a single story where the above intents are group
A retrieval action uses the output of a :ref:`response-selector` component from NLU, which learns a
retrieval model to predict the correct response from a list of candidate responses given a user message.


.. _retrieval-training-data:

Training Data
^^^^^^^^^^^^^

@@ -95,6 +98,10 @@ This is a key difference to the response templates in your domain file.
to the training process. Its contents cannot be part of the file that contains training data for other
components of NLU.

.. note::
As shown in the examples above, the ``/`` symbol is reserved as a delimiter that separates retrieval intents from
response text identifiers. Make sure not to use it in the names of your intents.
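
To make the convention concrete, a tiny illustrative sketch (the intent name ``chitchat/ask_name`` is hypothetical):

full_intent = "chitchat/ask_name"  # hypothetical retrieval-intent training label

# "/" is the reserved delimiter: the part before it is the retrieval intent,
# the part after it identifies the response text.
retrieval_intent, response_key = full_intent.split("/", 1)

assert retrieval_intent == "chitchat"
assert response_key == "ask_name"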

Config File
^^^^^^^^^^^

6 changes: 6 additions & 0 deletions docs/core/stories.rst
@@ -74,6 +74,12 @@ to predict the next action based on a *combination* of both the intent and
entities (you can, however, change this behavior using the
:ref:`use_entities <use_entities>` attribute).

.. warning::
The ``/`` symbol is reserved as a delimiter that separates retrieval intents from response text identifiers.
Refer to the ``Training Data Format`` section of :ref:`retrieval-actions` for more details on this format.
If any intent name contains the delimiter, the file containing these stories will be treated as training
data for the :ref:`response-selector` model and ignored when training Core models.

Actions
~~~~~~~
While writing stories, you will encounter two types of actions: utterances
2 changes: 2 additions & 0 deletions docs/migration-guide.rst
@@ -50,6 +50,8 @@ General
an entity set, this will influence the weighted precision and f1-score quite a bit. From now on we
exclude ``no-entity`` from the evaluation. The overall metrics now only include proper entities. You
might see a drop in the performance scores when running the evaluation again (a toy comparison follows this list).
- ``/`` is reserved as a delimiter token to distinguish between a retrieval intent and its corresponding response
text identifier. Make sure you don't include the ``/`` symbol in the names of your intents.
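
A toy comparison for the evaluation point above, assuming scikit-learn is available; the label sequences are made up for illustration:

from sklearn.metrics import precision_recall_fscore_support

# Hypothetical token-level gold and predicted entity labels.
gold = ["no-entity", "no-entity", "city", "no-entity", "city", "no-entity"]
pred = ["no-entity", "no-entity", "city", "no-entity", "no-entity", "no-entity"]

# Weighted scores over all labels vs. over proper entities only.
for labels in (["no-entity", "city"], ["city"]):
    precision, recall, f1, _ = precision_recall_fscore_support(
        gold, pred, labels=labels, average="weighted"
    )
    print(labels, "-> precision {:.2f}, f1 {:.2f}".format(precision, f1))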

.. _migration-to-rasa-1.0:

4 changes: 4 additions & 0 deletions docs/nlu/training-data-format.rst
@@ -72,6 +72,10 @@ Lookup tables may be specified either directly as lists or as txt files containi
.. note::
The common theme here is that common examples, regex features and lookup tables merely act as cues to the final NLU model by providing additional features to the machine learning algorithm during training. Therefore, you should not assume that a single example is enough for the model to robustly identify intents and/or entities across all variants of that example.

.. note::
The ``/`` symbol is reserved as a delimiter that separates retrieval intents from response text identifiers. Make sure not to
use it in the names of your intents.

JSON Format
-----------

3 changes: 3 additions & 0 deletions rasa/cli/x.py
@@ -79,6 +79,9 @@ def _rasa_service(
enable_api=True,
jwt_secret=args.jwt_secret,
jwt_method=args.jwt_method,
ssl_certificate=args.ssl_certificate,
ssl_keyfile=args.ssl_keyfile,
ssl_password=args.ssl_password,
)


48 changes: 38 additions & 10 deletions rasa/core/channels/rasa_chat.py
@@ -13,6 +13,10 @@

logger = logging.getLogger(__name__)

CONVERSATION_ID_KEY = "conversation_id"
JWT_USERNAME_KEY = "username"
INTERACTIVE_LEARNING_PERMISSION = "clientEvents:create"


class RasaChatInput(RestInput):
"""Chat input channel for Rasa X"""
@@ -88,15 +92,39 @@ async def _decode_bearer_token(self, bearer_token: Text) -> Optional[Dict]:
logger.exception("Failed to decode bearer token.")

async def _extract_sender(self, req: Request) -> Optional[Text]:
"""Fetch user from the Rasa X Admin API"""
"""Fetch user from the Rasa X Admin API."""

jwt_payload = None
if req.headers.get("Authorization"):
user = await self._decode_bearer_token(req.headers["Authorization"])
if user:
return user["username"]

user = await self._decode_bearer_token(req.args.get("token", default=None))
if user:
return user["username"]

abort(401)
jwt_payload = await self._decode_bearer_token(req.headers["Authorization"])

if not jwt_payload:
jwt_payload = await self._decode_bearer_token(req.args.get("token"))

if not jwt_payload:
abort(401)

if CONVERSATION_ID_KEY in req.json:
if self._has_user_permission_to_send_messages_to_conversation(
jwt_payload, req.json
):
return req.json[CONVERSATION_ID_KEY]
else:
logger.error(
"User '{}' does not have permissions to send messages to "
"conversation '{}'.".format(
jwt_payload[JWT_USERNAME_KEY], req.json[CONVERSATION_ID_KEY]
)
)
abort(401)

return jwt_payload[JWT_USERNAME_KEY]

@staticmethod
def _has_user_permission_to_send_messages_to_conversation(
jwt_payload: Dict, message: Dict
) -> bool:
user_scopes = jwt_payload.get("scopes", [])
return INTERACTIVE_LEARNING_PERMISSION in user_scopes or message[
CONVERSATION_ID_KEY
] == jwt_payload.get(JWT_USERNAME_KEY)
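
For readers skimming the diff, the permission rule added above restated as a standalone sketch with hypothetical payload values (the helper name ``may_send`` is illustrative, not part of the codebase):

# Standalone restatement of _has_user_permission_to_send_messages_to_conversation.
INTERACTIVE_LEARNING_PERMISSION = "clientEvents:create"


def may_send(jwt_payload: dict, message: dict) -> bool:
    user_scopes = jwt_payload.get("scopes", [])
    return (
        INTERACTIVE_LEARNING_PERMISSION in user_scopes
        or message["conversation_id"] == jwt_payload.get("username")
    )


# A user may always write to their own conversation ...
assert may_send({"username": "ada"}, {"conversation_id": "ada"})
# ... and to any conversation if they hold the interactive-learning scope ...
assert may_send(
    {"username": "ada", "scopes": ["clientEvents:create"]},
    {"conversation_id": "someone-else"},
)
# ... but not to someone else's conversation without it.
assert not may_send({"username": "ada"}, {"conversation_id": "someone-else"})
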
3 changes: 2 additions & 1 deletion rasa/nlu/training_data/loading.py
@@ -132,7 +132,6 @@ def _load(filename: Text, language: Optional[Text] = "en") -> Optional["Training
if fformat == UNK:
raise ValueError("Unknown data format for file '{}'.".format(filename))

logger.debug("Training data format of '{}' is '{}'.".format(filename, fformat))
reader = _reader_factory(fformat)

if reader:
@@ -174,6 +173,8 @@ def guess_format(filename: Text) -> Text:
guess = fformat
break

logger.debug("Training data format of '{}' is '{}'.".format(filename, guess))

return guess


91 changes: 86 additions & 5 deletions rasa/nlu/utils/spacy_utils.py
@@ -1,6 +1,6 @@
import logging
import typing
from typing import Any, Dict, List, Optional, Text
from typing import Any, Dict, List, Optional, Text, Tuple

from rasa.nlu.components import Component
from rasa.nlu.config import RasaNLUModelConfig, override_defaults
@@ -129,18 +129,99 @@ def get_text(self, example, attribute):

return self.preprocess_text(example.get(attribute))

@staticmethod
def merge_content_lists(
indexed_training_samples: List[Tuple[int, Text]],
doc_lists: List[Tuple[int, "Doc"]],
) -> List[Tuple[int, "Doc"]]:
"""Merge lists with processed Docs back into their original order."""

dct = dict(indexed_training_samples)
dct.update(dict(doc_lists))
return sorted(dct.items())

@staticmethod
def filter_training_samples_by_content(
indexed_training_samples: List[Tuple[int, Text]]
) -> Tuple[List[Tuple[int, Text]], List[Tuple[int, Text]]]:
"""Separates empty training samples from content bearing ones."""

docs_to_pipe = list(
filter(
lambda training_sample: training_sample[1] != "",
indexed_training_samples,
)
)
empty_docs = list(
filter(
lambda training_sample: training_sample[1] == "",
indexed_training_samples,
)
)
return docs_to_pipe, empty_docs

def process_content_bearing_samples(
self, samples_to_pipe: List[Tuple[int, Text]]
) -> List[Tuple[int, "Doc"]]:
"""Sends content bearing training samples to spaCy's pipe."""

docs = [
(to_pipe_sample[0], doc)
for to_pipe_sample, doc in zip(
samples_to_pipe,
[
doc
for doc in self.nlp.pipe(
[txt for _, txt in samples_to_pipe], batch_size=50
)
],
)
]
return docs

def process_non_content_bearing_samples(
self, empty_samples: List[Tuple[int, Text]]
) -> List[Tuple[int, "Doc"]]:
"""Creates empty Doc-objects from zero-lengthed training samples strings."""

from spacy.tokens import Doc

n_docs = [
(empty_sample[0], doc)
for empty_sample, doc in zip(
empty_samples, [Doc(self.nlp.vocab) for doc in empty_samples]
)
]
return n_docs

def docs_for_training_data(
self, training_data: TrainingData
) -> Dict[Text, List[Any]]:

attribute_docs = {}
for attribute in SPACY_FEATURIZABLE_ATTRIBUTES:

texts = [self.get_text(e, attribute) for e in training_data.intent_examples]
# Index and freeze indices of the training samples for preserving the order
# after processing the data.
indexed_training_samples = [(idx, text) for idx, text in enumerate(texts)]

docs = [doc for doc in self.nlp.pipe(texts, batch_size=50)]
samples_to_pipe, empty_samples = self.filter_training_samples_by_content(
indexed_training_samples
)

content_bearing_docs = self.process_content_bearing_samples(samples_to_pipe)

non_content_bearing_docs = self.process_non_content_bearing_samples(
empty_samples
)

attribute_document_list = self.merge_content_lists(
indexed_training_samples,
content_bearing_docs + non_content_bearing_docs,
)

attribute_docs[attribute] = docs
# We only need the spaCy Docs here, so drop the indices and keep the Docs
# from the (index, Doc) tuples.
attribute_docs[attribute] = [doc for _, doc in attribute_document_list]
return attribute_docs

def train(