From fa6d651be74b6244eb235a43a9b0d006d057dcac Mon Sep 17 00:00:00 2001 From: Eimi Okuno Date: Mon, 3 Aug 2020 14:41:54 +0100 Subject: [PATCH] I240 (#242) * General automated clean up * Updating feature list * Updated Readme * Updated docs for Adapters, Analytics and renaming misspellings. --- README.md | 192 +++++++++--------- docs/features-list.md | 126 +++++++----- docs/guides/adapters.md | 165 +++++++++------ docs/guides/analytics.md | 36 ++-- ...> draftjs-blocks-entityrange-entitymap.md} | 51 ++--- docs/guides/storybook-npm-setup.md | 46 ++--- 6 files changed, 340 insertions(+), 276 deletions(-) rename docs/guides/{draftjs-blocks-entityrange-entitmap.md => draftjs-blocks-entityrange-entitymap.md} (80%) diff --git a/README.md b/README.md index e574a798..32519252 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,5 @@ # React Transcript Editor - - A React component to make transcribing audio and video easier and faster.

@@ -16,9 +14,7 @@ A React component to make transcribing audio and video easier and faster.


- The project uses [this github project boards to organise and the co-ordinate development](https://github.com/bbc/react-transcript-editor/projects). - _--> Work in progress <--_ @@ -35,34 +31,31 @@ _--> Work in progress <--_ Node version is set in node version manager [`.nvmrc`](https://github.com/creationix/nvm#nvmrc) - - - - ## Setup - - - -Fork this repository + git clone + cd into folder +1. Fork this repository +2. Clone this repository to a directory of your choice +3. Run `npm i` to install dependencies ## Usage - development -Git clone this repository and cd into the folder. +We use a tool called [`storybook`](https://storybook.js.org) +to run the components locally. To start the Storybook, run: -To start the storybook run - -``` +```sh npm start ``` -Visit [http://localhost:6006](http://localhost:6006) +Running that command should open the locally hosted Storybook, but if it doesn't, +visit [http://localhost:6006](http://localhost:6006) ## Usage - production -Available on [npm - `@bbc/react-transcript-editor`](https://www.npmjs.com/package/@bbc/react-transcript-editor) +In order to use a published version of `react-transcript-editor`, +install the published module [`@bbc/react-transcript-editor`](https://www.npmjs.com/package/@bbc/react-transcript-editor) +by running: -``` +```sh npm install @bbc/react-transcript-editor ``` @@ -70,7 +63,7 @@ npm install @bbc/react-transcript-editor import TranscriptEditor from "@bbc/react-transcript-editor"; ``` -Minimal data needed for initialization +### Basic use case ```js ``` -With more attributes +`transcriptData` and `mediaUrl` are non-optional props to use `TranscriptEditor`. +See the full list of options [here](#transcripteditor-props-list). 
+ +### Advanced use case + ```js ``` -| Attributes | Description | required | type | -| :-------------------- | :---------------------------------------------------------------------------------------------------------------------- | :------: | :-------: | -| transcriptData | Transcript json | yes | Json | -| mediaUrl | string url to media file - audio or video | yes | String | -|`handleAutoSaveChanges`| returns content of transcription after a change | no | Function | -| autoSaveContentType | specify the file format for data returned by `handleAutoSaveChanges`,falls back on `sttJsonType`. or `draftjs` | no | string | -| isEditable | set to true if you want to be able to edit the text | no | Boolean | -| spellCheck | set to true if you want the browser to spell check this transcript | no | Boolean | -|`handleAnalyticsEvents`| if you want to collect analytics events. | no | Function | -| fileName | used for saving and retrieving local storage blob files | no | String | -| title | defaults to empty string | no | String | -| ref | if you want to have access to internal functions such as retrieving content from the editor. eg to save to a server/db. 
| no | React ref | -| mediaType | can be `audio` or `video`, if not provided the component uses the url file type to determine and adjust use of the page layout | no | String | +### TranscriptEditor Props List + +| Props | Description | required | type | default | +| :---------------------- | :---------------------------------------------------------------------------------------------------------------------- | :----------------------------------------------------: | :-------: | :----------------------------------------------------------------------------: | +| `transcriptData` | Transcript JSON | yes | JSON | | +| `mediaUrl` | URL to media (audio or video) file | yes | String | | +| `handleAutoSaveChanges` | Function to handle the content of transcription after a change | no | Function | +| `autoSaveContentType` | Specify the file format for data returned by `handleAutoSaveChanges` | no | String | falls back to `sttJsonType`, if set, or `draftjs` | +| `isEditable` | Set to `true` to have the ability to edit the text | no | Boolean | False | +| `spellCheck` | Set to `true` to spell-check the transcript | no | Boolean | False | +| `sttJsonType` | The data model type of your `transcriptData` | no | String | `draftjs` | +| `handleAnalyticsEvents` | if you want to collect analytics events. | no | Function | false | +| `fileName` | used for saving and retrieving local storage blob files | no, but disables the [local save feature](#local-save) | String | +| `title` | defaults to empty string | no | String | +| `ref` | If you want to have access to internal functions such as retrieving content from the editor. eg to save to a server/db. | no | React ref | +| `mediaType` | Can be `audio` or `video`. Changes the look of the UI based on media type. | no | String | if not provided the component uses the `mediaUrl` to determine the media type | See [`./demo/app.js` demo](./demo/app.js) as a more detailed example usage of the component. 
-_Note: `fileName` it is optional but it's needed if working with user uploaded local media in the browser, to be able to save and retrieve from local storage. For instance if you are passing a blob url to `mediaUrl` using `createObjectURL` this url is randomly re-generated on every page refresh so you wouldn't be able to restore a session, as `mediaUrl` is used as the local storage key. See demo app for more detail example of this[`./src/index.js`](./src/index.js)_ +#### Local save -_Note: `mediaType` if not defined, the component uses the url to determine the type and adjust the layout accordingly, however this could result in a slight delay when loading the component as it needs to fetch the media to determine it's file type_ +`fileName` is optional but it's needed if working with user uploaded local media in the browser, +to be able to save and retrieve from local storage. +For instance if you are passing a blob url to `mediaUrl` using `createObjectURL` this url is randomly re-generated on every page refresh so you wouldn't be able to restore a session, as `mediaUrl` is used as the local storage key. See demo app for more detail example of this[`./src/index.js`](./src/index.js)\_ ### Typescript projects @@ -128,7 +130,10 @@ import { TranscriptEditor } from "@bbc/react-transcript-editor"; #### Internal components +##### Direct imports + You can also import some of the underlying React components directly. +See [the storybook](https://bbc.github.io/react-transcript-editor) for each component details on optional and required attributes. - `TranscriptEditor` - `TimedTextEditor` @@ -136,7 +141,6 @@ You can also import some of the underlying React components directly. 
- `VideoPlayer` - `Settings` - `KeyboardShortcuts` - - `ProgressBar` - `PlaybackRate` - `PlayerControls` @@ -153,144 +157,142 @@ import TimedTextEditor from "@bbc/react-transcript-editor/TimedTextEditor"; import { TimedTextEditor } from "@bbc/react-transcript-editor"; ``` -However if you are not using `TranscriptEditor` it is recommended to follow the second option and import individual components like: `@bbc/react-transcript-editor/TimedTextEditor` rather than the entire library. Doing so pulls in only the specific components that you use, which can significantly reduce the amount of code you end up sending to the client. (Similarly to how [`react-bootstrap`](https://react-bootstrap.github.io/getting-started/introduction) works) +##### Import recommendation -See [the storybook](https://bbc.github.io/react-transcript-editor) for each component details on optional and required attributes. +However if you are not using `TranscriptEditor` it is recommended to follow the second option and import individual components like: `@bbc/react-transcript-editor/TimedTextEditor` rather than the entire library. +Doing so pulls in only the specific components that you use, which can significantly reduce the amount of code you end up sending to the client. (Similarly to how [`react-bootstrap`](https://react-bootstrap.github.io/getting-started/introduction) works) + +#### Other Node Modules (non-react) + +Some of these node modules can be used as standalone imports. 
-You can also use this node modules as standalone +##### Export Adapter + +Converts from draftJs json format to other formats ```js import exportAdapter from "@bbc/react-transcript-editor/exportAdapter"; ``` -Converts from draftJs json format to other formats +##### STT JSON Adapter + +Converts various stt json formats to draftJs ```js import sttJsonAdapter from "@bbc/react-transcript-editor/sttJsonAdapter"; ``` -Converts various stt json formats to draftJs +##### Conversion modules to/from Timecodes + +Some modules to convert to and from timecodes ```js import { secondsToTimecode, timecodeToSeconds, - shortTimecode + shortTimecode, } from "@bbc/react-transcript-editor/timecodeConverter"; ``` -some modules to convert to and from timecodes - ## System Architecture - - -- uses [`storybook`](https://storybook.js.org) with the setup as [explained in their docs](https://storybook.js.org/docs/guides/guide-react/) to develop this React. +- Uses [`storybook`](https://storybook.js.org) with the setup as [explained in their docs](https://storybook.js.org/docs/guides/guide-react/) to develop this React. - This uses [CSS Modules](https://github.com/css-modules/css-modules) to contain the scope of the css for this component. -- [`.storybook/webpack.config.js](./.storybook/webpack.config.js) enanches the storybook webpack config to add support for css modules. +- [`.storybook/webpack.config.js](./.storybook/webpack.config.js) enables the storybook webpack config to add support for css modules. - The parts of the component are inside [`./packages`](./packages) - [babel.config.js](./babel.config.js) provides root level system config for [babel 7](https://babeljs.io/docs/en/next/config-files#project-wide-configuration). - - ## Documentation -There's a [docs](./docs) folder in this repository. +There's a [docs](./docs) folder in this repository, which contains subdirectories to keep: -[docs/notes](./docs/notes) contains dev notes on various aspects of the project. 
+- [notes](./docs/notes): dev notes on various aspects of the project. +- [adr](./docs/adr): [Architecture Decision Record](https://github.com/joelparkerhenderson/architecture_decision_record). -[docs/adr](./docs/adr) contains [Architecture Decision Record](https://github.com/joelparkerhenderson/architecture_decision_record). +### ADR > An architectural decision record (ADR) is a document that captures an important architectural decision made along with its context and consequences. We are using [this template for ADR](https://gist.github.com/iaincollins/92923cc2c309c2751aea6f1b34b31d95) +### QA + [There also QA testing docs](./docs/qa/README.md) to manual test the component before a major release, (QA testing does not require any technical knowledge). ## Build - - > To transpile `./packages` and create a build in the `./dist` folder, run: -``` +```sh npm run build:component ``` -## Demo & storybook +To understand the build process, have a read through [this](./docs/guides/storybook-npm-setup.md). -- **Storybook** can bew viewed at [https://bbc.github.io/react-transcript-editor/](https://bbc.github.io/react-transcript-editor/) +## Demo & storybook +- **Storybook** can be viewed at [https://bbc.github.io/react-transcript-editor/](https://bbc.github.io/react-transcript-editor/) - **Demo** can be viewed at [https://bbc.github.io/react-transcript-editor/iframe.html?id=demo--default](https://bbc.github.io/react-transcript-editor/iframe.html?id=demo--default) -http://localhost:6006 - - +To run locally, see [setup](#usage---development). -## Build - storybook +### Build - storybook -To build the storybook as a static site +To build the storybook as a static site, run: -``` +```sh npm run build:storybook ``` -## Publish storybook & demo to github pages +This will produce a `build` folder containing the static site of the demo. 
+To serve the `build` folder locally, run: -This github repository uses [github pages](https://pages.github.com/) to host the storybook and the demo of the component - -``` -npm run deploy:ghpages +```sh +npm run build:storybook:serve ``` -add to git, and push to origin master to update +#### Publishing to a web page - +##### Github Pages -Alternatively If you simply want to build the demo locally in the `build` folder then just +We use [github pages](https://pages.github.com/) to host the storybook and the [demo](https://help.github.com/articles/user-organization-and-project-pages/#project-pages-sites) of the component. +Make sure to add your changes to git, and push to `origin master` to ensure the code in `master` is reflective of what's online on `Github Pages`. +When you are ready, re-publish the Storybook by running: -``` -npm run build:storybook -``` - -you can then run this command to serve the static site locally - -``` -npm run build:storybook:serve +```sh +npm run deploy:ghpages ``` ## Tests - - -Test coverage using [`jest`](https://jestjs.io/), to run tests +We are using [`jest`](https://jestjs.io/) for the testing framework. +To run tests, run: ```sh npm run test ``` -During development you can use +For convenience, during development you can use: ```sh npm run test:watch ``` -## Travis CI +and watch the test be re-run at every save. -On commit this repo uses the [.travis.yml](./.travis.yml) config tu run the automated test on [travis CI](https://travis-ci.org/bbc/react-transcript-editor). +## Travis CI -## Deployment +On commit this repo uses the [.travis.yml](./.travis.yml) config to run the automated test on [travis CI](https://travis-ci.org/bbc/react-transcript-editor). 
- +## Publish to NPM -To push to [npm - `@bbc/react-transcript-editor`](https://www.npmjs.com/package/@bbc/react-transcript-editor) +To publish to [npm - `@bbc/react-transcript-editor`](https://www.npmjs.com/package/@bbc/react-transcript-editor) run: -``` +```sh npm publish:public ``` -This runs `npm run build:component` and `npm publish --access public` under the hood +This runs `npm run build:component` and `npm publish --access public` under the hood, building the component and publishing to NPM. > Note that only `README.md` and the `dist` folders are published to npm. diff --git a/docs/features-list.md b/docs/features-list.md index fc205146..d013aba8 100644 --- a/docs/features-list.md +++ b/docs/features-list.md @@ -1,64 +1,87 @@ -# Features List - draft +# Features List -## User Facing +## User Interface -Player controls -- [x] play/pause -- [x] Current time + duration display -- [X] Adjust Playback rate -- [x] Jump to timecode <— in timecode `hh:mm:ss:ms` format or (hh:mm:ss:ms hh:mm:ss mm:ss m:ss m.ss seconds) -- [x] Adjust timecodes <— set a timecode offset - default to zero -- [x] [auto pause while typing](https://github.com/bbc/react-transcript-editor/issues/19) <-- - - -- [x] Roll back button 15 sec default, customizable amount +### Player controls -Keyboard Shortcuts -- [X] Keyboard Shortcuts -- [ ] customizable Keyboard Shortcuts - -HyperTranscript - interactivity -- [x] On text word double click at timecode -> media current time set to word timecode -- [x] [Words highlighted at current time](https://github.com/bbc/react-transcript-editor/issues/25) <— -- [x] [Scroll Sync](https://github.com/bbc/react-transcript-editor/issues/34), keep current word in view <— (toggle on/off) -- [ ] Preserve timecodes while editing - TBC how +- [x] play/pause +- [x] Current time + duration display +- [x] Adjust Playback rate +- [x] Jump to timecode + - in timecode `hh:mm:ss:ms` format or `hh:mm:ss:ms hh:mm:ss mm:ss m:ss m.ss seconds`) +- [x] Adjust timecodes + - 
configurable in the settings menu + - default to `00:00:00:00` +- [x] Roll back button + - configurable in the settings menu + - default to 15 seconds +- [x] [auto pause while typing](https://github.com/bbc/react-transcript-editor/issues/19) (adjustable in the settings menu - default to `on`) + + +### Keyboard Shortcuts + +- [x] Keyboard Shortcuts +- [ ] customizable Keyboard Shortcuts + +### HyperTranscript - interactivity + +- [x] Set media time by double clicking on a word +- [x] [Words highlighted at current time](https://github.com/bbc/react-transcript-editor/issues/25) <— +- [x] [Scroll Sync](https://github.com/bbc/react-transcript-editor/issues/34) + - Keep current word in view + - default `off` + - configurable in the settings menu +- [ ] Preserve timecodes while editing + +### Transcript Extra Info -Transcript Extra Info - [x] [Display Timecodes at paragraph level](https://github.com/bbc/react-transcript-editor/issues/26) (with offset if present) -- [x] [Display editable speaker names at paragraph level](https://github.com/bbc/react-transcript-editor/issues/26) - speaker diarization info +- [x] [Display editable speaker names at paragraph level](https://github.com/bbc/react-transcript-editor/issues/26) - speaker diarization info + +### Save + +- [x] Save locally to local storage + - [x] On interval, e.g. every `x` char + +### Export (for proof of concept) + +- [x] Export plain text + - [x] without speaker names or timecodes + - [x] with speaker names and timecodes +- [x] Captions + - [x] JSON + - [x] CSV + - [x] Adobe Premiere + - [x] SRT + - [x] TTML + - [x] VTT + - [x] [Digital Paper Edit](https://www.github.com/bbc/digital-paper-edit-client) +- [ ] Customizable Export plain text, eg with timecodes, speakers names etc.. 
+ - [ ] with speaker names + - [ ] with timecodes + - [ ] with timecodes & speaker names -Save -- [x] Save locally - (local storage) -- [x] Save locally - on interval, eg every `x` char -- [ ] ~Save to server API end point - Btc~ -- [ ] ~Save to server API end point - on interval~ +### Mobile First -Export <-- for proof of concept -- [X] Export plain text - without speaker names or timecodes -- [ ] Customizable Export plain text, eg with timecodes, speakers names etc.. - - [ ] with speaker names - - [ ] with timecodes - - [ ] with timecodes & speaker names -- [ ] Other? +- [x] Works on mobile +### Browser compatibility -Mobile First -- [x] Works on mobile +- [x] Chrome +- [ ] Firefox +- [ ] Internet Explorer -Browser compatibility -- [X] Works on Chrome -- [ ] Windows Explorer IE +## Dev -## Dev +### Import Transcript Json - Adapters -Import Transcript Json - Adapters -- [x] BBC Kaldi +- [x] BBC Kaldi - [x] News Labs API - BBC Kaldi - [x] autoEdit 2 - [x] AWS Transcriber - [x] IBM Watson STT -- [X] Speechmatics -- [ ] Gentle Transcription +- [x] Speechmatics +- [ ] Gentle Transcription - [ ] Gentle Alignment Json - [ ] AssemblyAI - [ ] Rev @@ -67,15 +90,15 @@ Import Transcript Json - Adapters - [ ] TTML - [ ] VTT - [ ] VTT (Youtube) -- [ ] IIIF +- [ ] IIIF - [ ] SMT and/or CTM ? +### Export Transcript Json - Adapters -Export Transcript Json - Adapters -- [ ] BBC Kaldi +- [ ] BBC Kaldi - [ ] News Labs API - BBC Kaldi - [ ] autoEdit 2 -- [ ] Gentle Transcription +- [ ] Gentle Transcription - [ ] Gentle Alignment Json - [ ] IBM Watson STT - [ ] Speechmatics @@ -86,8 +109,7 @@ Export Transcript Json - Adapters - [ ] TTML - [ ] VTT - [ ] VTT (Youtube) -- [ ] IIIF +- [ ] IIIF - [ ] SMT and/or CTM ? - - +You can add adapters - [see guide](./guides/adapters.md). diff --git a/docs/guides/adapters.md b/docs/guides/adapters.md index 714e46ff..558231e7 100644 --- a/docs/guides/adapters.md +++ b/docs/guides/adapters.md @@ -5,120 +5,161 @@ _this is a draft. 
we'd like this guide to be relatively easy to read for newcome Adapters are used to enable the `TranscriptEditor` component to convert various STT transcripts into a format draftJS can understand to provide data for the `TimedTextEditor`. ## How to create a new adapter + If you want to create a new adapter for a new STT service that is not yet supported by the component, we welcome [PRs](https://help.github.com/articles/about-pull-requests/). -[Feel free to begin by raising an issue](https://github.com/bbc/react-transcript-editor/issues/new?template=feature_request.md) so that others can be aware that there is active development for that specific STT service, and if needed we can synchronies the effort. +1. [Begin by raising an issue (optional)](https://github.com/bbc/react-transcript-editor/issues/new?template=feature_request.md) so that others can be aware that there is active development for that specific STT service, and if needed we can synchronize the effort. +2. [Fork the repo](https://help.github.com/articles/fork-a-repo/) and + create a branch with the name of the STT service, eg `stt-adapter-speechmatics`. -[Fork the repo](https://help.github.com/articles/fork-a-repo/) and -create a branch with the name of the stt service, eg `stt-adapter-speechmatics`. +### Context - How does an Adapter work - +To develop a new adapter, it's best to understand what the adapters are doing. -## Context +When we call `sttJsonAdapter` with `transcriptData` and a `sttJsonType`, we expect it to return an object with two attributes `blocks` and `entityMap`. -To see this in the larger context when we call `sttJsonAdapter` with `transcriptData` and a `sttJsonType` we expect it to return an object with two attributes `blocks` and `entityMap`. +The output is then used in `TimedTextEditor` with the Draft JS function [convertFromRaw](https://draftjs.org/docs/api-reference-data-conversion#convertfromraw) to create a new content state for the editor. 
-This is then used within TimedTextEditor with the help of draftJs function [convertFromRaw](https://draftjs.org/docs/api-reference-data-conversion#convertfromraw) to create a new content state for the editor. +```js +const draftJSCompatibleJSON = sttJsonAdapter(transcriptData, sttJsonType); -So in order to convert a json from STT from service to draftJs json we need to create: -- a data [block](https://draftjs.org/docs/api-reference-content-block#docsNav) -- [entityRanges](https://draftjs.org/docs/advanced-topics-entities) -- `entityMap` +/* draftJSCompatibleJSON: + { + "blocks": [], + "entityMap": {} + } + */ -Note that `entityMap` and `entityRanges` will get generated programmatically by dedicated functions. +/* Inside TimedTextEditor */ +const contentState = convertFromRaw(draftJSCompatibleJSON); +``` -checkout [a quick side note on how the DraftJS `block`, `entityRanges` and `entityMap` works, in the context of the TranscriptEditor component](./draftjs-blocks-entityrange-entitmap.md). Or feel free to skip this and come back later to it, if you are not interested in the underlying implementation. +So in order to convert an STT JSON to Draft JS JSON we need to make sure our adapter generates: -## Steps +- a data [block](https://draftjs.org/docs/api-reference-content-block#docsNav) +- [entityRanges](https://draftjs.org/docs/advanced-topics-entities) +- `entityMap` -In your branch +**Note**: `entityMap` and `entityRanges` will be generated programmatically by dedicated functions. -- [ ] Create a folder with the name of the STT service - eg `speechmatics` -- [ ] add a `src/lib/Util/adapters/${sttServiceName}/sample` folder -- [ ] add a sample json file from the STT service in this last folder - this will be useful for testing. 
Name it `${name of the stt service}.sample.json` - -- [ ] add option in [adapters/index.js](adapters/index.js) +#### Blocks, entityRanges and entityMap -In the adapters [adapters/index.js](adapters/index.js) in the `sttJsonAdapter` function switch statement add a new `case` with the new STT service type eg `speechmatics` +Checkout [a doc on how the DraftJS `block`, `entityRanges` and `entityMap` works, in the context of the TranscriptEditor component](./draftjs-blocks-entityrange-entitymap.md). Or feel free to skip this and come back later to it, if you are not interested in the underlying implementation. - -```js -import speechmaticsToDraft from './speechmatics/index'; +### File and function structure -... +This is the recommended structure for keeping consistency across multiple adapter libraries. Here `${sttServiceName}` is the STT service name that you want to build the adapter for, such as `speechmatics`. Make sure that it's not capitalized. -case 'speechmatics': - blocks = speechmaticsToDraft(transcriptData); - return { blocks, entityMap: createEntityMap(blocks) }; -``` +| file structure | +| ------------------------------------------------------------------------------ | +| `packages/stt-adapters/${sttServiceName}/index.js` | +| `packages/stt-adapters/${sttServiceName}/sample` | +| `packages/stt-adapters/${sttServiceName}/sample/${sttServiceName}.sample.json` | -- [ ] add an adapter function. +| function name | description | +| -------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `${sttServiceName}ToDraft` | the single exported function in your `${sttServiceName}/index.js`. Converts and outputs necessary data to be used in `convertFromRaw`. 
See [the recommended approach](#approach-for-converting-stt-to-draftjs-blocks-and-entityranges) | -as shown in the example you'd also need to add a function with the stt provider name +`ToDraft` eg `speechmaticsToDraft`that takes in the transcript data. +## Steps + +In your branch: -- [ ] create a function to convert the STT data structure into draftJs blocks and entityRanges. +1. Follow the [recommended file structure as above](#file-and-function-structure) and create the folders and files necessary. + 1. [ ] Create a folder with the name of the STT service + 2. [ ] add a `sample` folder inside your `${sttServiceName}` folder + 3. [ ] add a sample json file to the `sample` folder: `${sttServiceName}.sample.json`. This will be used for [testing](#tests). + 4. Check this JSON file is excluded from the build bundle +2. [ ] [add the new adapter and function to the STT Adapter package](#add-adapter-to-stt-adapter-package) +3. [ ] create a function that [convert the STT data structure into draftJs blocks and entityRanges](#approach-for-converting-stt-to-draftjs-blocks-and-entityranges). + 1. You can see examples from `bbc-kaldi` and `autoEdit2` adapters. -You can see examples from `bbc-kaldi` and `autoEdit2` adapters. +### Approach for converting STT to `DraftJS` `blocks` and `entityRanges` -In pseudocode it's reccomended to follow this approach: +In pseudocode it's recommended to follow this approach: -1. Expose one function call that takes in the stt json data +1. Expose one function call that takes in the STT json data 2. Have a helper function `groupWordsInParagraphs` that as the name suggests groups words list from the STT provider transcript based on punctuation. and returns an array of words objects. + 1. The underlying details for this will vary depending on how the STT json of the provider present the data, and how the attributes are named etc.. +3. 
[Iterate over the paragraphs to create draftJS content blocks](#iterate-over-the-paragraphs-to-create-draftjs-content-blocks) (see `bbc-kaldi` and `autoEdit2` example). +4. And use the helper function `generateEntitiesRanges` to add the `entityRanges` to each block. - see above +5. If you have speaker diarization info you can also add this to the block info - _optional_ + +### Add adapter to STT Adapter Package -The underlying details for this will vary depending on how the STT json of the provider present the data, and how the attributes are named etc.. +1. [ ] Import the adapters converting function `${sttServiceName}ToDraft` + 1. This function should follow the [recommended approach to convert the data](#approach-for-converting-stt-to-draftjs-blocks-and-entityranges). +2. [ ] Add case to `sttJsonAdapter` + 1. There is an `sttJsonAdapter` function inside the [`adapters`](../../packages/stt-adapters/index.js). + In this `switch/case` function, add a new `case` with the new STT service type e.g. `speechmatics` -3. Iterate over the paragraphs to create draftJS content blocks (see `bbc-kaldi` and `autoEdit2` example). + + +For example, if you were adding a `speechmatics` adapter, the code should now look like this: ```js -wordsByParagraphs.forEach((paragraph, i) => { - const draftJsContentBlockParagraph = { - text: paragraph.text.join(' '), - type: 'paragraph', - data: { - speaker: `TBC ${ i }`, - words: paragraph.words, // - start: paragraph.words[0].start// - }, - // the entities as ranges are each word in the space-joined text, - // so it needs to be compute for each the offset from the beginning of the paragraph and the length - entityRanges: generateEntitiesRanges(paragraph.words, 'text'), // wordAttributeName - }; - // console.log(JSON.stringify(draftJsContentBlockParagraph,null,2)) - results.push(draftJsContentBlockParagraph); - }); +import speechmaticsToDraft from './speechmatics'; + +... 
+case 'speechmatics': + blocks = speechmaticsToDraft(transcriptData); + return { blocks, entityMap: createEntityMap(blocks) }; ``` -4. And use the helper function `generateEntitiesRanges` to add the `entityRanges` to each block. - see above +### Iterate over the paragraphs to create draftJS content blocks -5. If you have speaker diarization info you can also add this to the block info - _optional_ +```js +wordsByParagraphs.forEach((paragraph, i) => { + const draftJsContentBlockParagraph = { + text: paragraph.text.join(" "), + type: "paragraph", + data: { + speaker: `TBC ${i}`, + words: paragraph.words, // + start: paragraph.words[0].start, // + }, + // the entities as ranges are each word in the space-joined text, + // so for each word we need to compute the offset from the beginning of the paragraph and the length + entityRanges: generateEntitiesRanges(paragraph.words, "text"), // wordAttributeName + }; + // console.log(JSON.stringify(draftJsContentBlockParagraph,null,2)) + results.push(draftJsContentBlockParagraph); +}); +``` ## Speaker Labels -If the speech to text returns speaker label (speaker diarization info) you can either use that in the adapter, to associated that info at a paragraph level. Or leave it out for a later implementation. +If the speech-to-text returns speaker labels (speaker diarization info) you can either use that in the adapter, to associate that info at a paragraph level. Or leave it out for later implementation. If you decide to incorporate speaker labels, then change the `TBC` to `Speaker ${speakerLabelFromSTTprovider}`, -We have been (informally) using `TBC` as a place holder when the STT service doesn't return any speaker diarization info, and/or when it has not been incorporated in the adapter yet. +We have been informally using `TBC` as a placeholder when the STT service doesn't return any speaker diarization info, and/or when it has not been incorporated in the adapter yet. ## Tests -This project uses jest. 
and once you submit the PR the tests are run by TravisCI. It is recommended to write some basic tests at a minimum so that you can see at a glance if the adapter is working as expected. +This project uses Jest, and once you submit the PR the tests are run by TravisCI. It is recommended to write some basic tests at a minimum so that you can see at a glance if the adapter is working as expected. -In order to write your tests, you want to have a `sample` folder with transcript data from stt and expected draftJs data output with file extensions `.sample.json` and `.sample.js` - see `bbc-kaldi` and `autoEdit2` example. This is so that those stub/example files are not bundled with the component when packaging for npm. +In order to write your tests, you want to have a `sample` folder with transcript data from STT and expected draftJs data output with file extensions `.sample.json` and `.sample.js` - see `bbc-kaldi` and `autoEdit2` example. This is so that those stub/example files are not bundled with the component when packaging for NPM. You can create and run your `example-usage.js` file and save the output json. This can be used to create the `.sample.js` file for the main test. _If you don't have much experience with automated testing don't let this put you off tho, feel free to raise it as an issue and we can help out._ -**top tip**: the draftJs block key attributes are randomly generated, and therefore cannot be tested in a deterministic way. However there is a well established workaround, you can replace them in the json with a type definition. eg instead of expecting it to be a specific number, you just expect it to be a string. +### Testing randomly generated DraftJS Block attributes + +**top tip**: the DraftJS block key attributes are randomly generated, and therefore cannot be tested in a deterministic way. However there is a well established workaround: -In practice, for instance In Visual code you can search using a regex (option `*`). 
So you could search for +- you can replace them in the JSON with a type definition. +- e.g. instead of expecting it to be a specific number, you just expect it to be a string. + +In practice, in Visual Studio Code, you can search using a regex (option `*`). So you could search for ```js "key": "([a-zA-Z0-9]*)", ``` - +And replace all with + +And replace all with + ```js "key": expect.any(String),//"ss8pm4p" ``` diff --git a/docs/guides/analytics.md b/docs/guides/analytics.md index d468bedb..44ac80cc 100644 --- a/docs/guides/analytics.md +++ b/docs/guides/analytics.md @@ -1,23 +1,23 @@ -# Analytics +# Analytics -The `ReactTranscriptEditor` component has an optional setup to track some analytics events around the usage of the main functionalities. +The `TranscriptEditor` component has an optional setup to track some analytics events around the usage of the main functionalities. As you can see in the demo app at `/src/index.js` there is an optional `handleAnalyticsEvents`. ```js ``` It returns an object, which in the example we are adding to an array and displaying at the [bottom of the demo page](https://bbc.github.io/react-transcript-editor/) in a `textarea`. -Here's an example of the output +Here's an example of the output ```json [ @@ -51,9 +51,15 @@ Here's an example of the output This data is what you can send to your analytics system/provider. E.g. if you are using [piwik/matomo](https://matomo.org/free-software/) with the JS SDK, then you could set up a handler like this to track individual events with their dashboard. 
```js - handleAnalyticsEvents = (event) => { - if(window.location.hostname !== "localhost"){ - _paq.push(['trackEvent', event.category, event.action, event.name, event.value ]); - } - } -``` \ No newline at end of file +handleAnalyticsEvents = (event) => { + if (window.location.hostname !== "localhost") { + _paq.push([ + "trackEvent", + event.category, + event.action, + event.name, + event.value, + ]); + } +}; +``` diff --git a/docs/guides/draftjs-blocks-entityrange-entitmap.md b/docs/guides/draftjs-blocks-entityrange-entitymap.md similarity index 80% rename from docs/guides/draftjs-blocks-entityrange-entitmap.md rename to docs/guides/draftjs-blocks-entityrange-entitymap.md index c1edc388..5d0ce670 100644 --- a/docs/guides/draftjs-blocks-entityrange-entitmap.md +++ b/docs/guides/draftjs-blocks-entityrange-entitymap.md @@ -1,22 +1,21 @@ - -### DraftJS block, entityRanges and entityMap +# DraftJS Block, entityRanges and entityMap A quick side note on how the DraftJS block, entityRanges and entityMap works, in the context of the TranscriptEditor component. For the [adapters](./adapters.md) guide. +## Block -#### Data Block - -TL;DR: a block is a representation of a paragraph (as an Immutable Record) in draftJs and you can have some custom data associated to it. +TL;DR: -But see the docs notes on [draftjs basics](https://github.com/bbc/react-transcript-editor/blob/master/docs/notes/draftjs/2018-10-01-draftjs-1-basics.md) to better understand the role of content block within the editor. As well as the draftJs official docs. +- a block is a representation of a paragraph (as an Immutable Record) in draftJs and you can have some custom data associated to it. + But see the docs notes on [DraftJS basics](https://github.com/bbc/react-transcript-editor/blob/master/docs/notes/draftjs/2018-10-01-draftjs-1-basics.md) to better understand the role of content block within the editor. As well as the draftJs official docs. 
-Here's an example of a block, you can see it can contain some custom data, eg speaker name, list of words, and start time (which would be the start time of the first word). +Here's an example of a Block, you can see it can contain some custom data, eg speaker name, list of words, and start time (which would be the start time of the first word). ```js [ { - "text": "There is a day.", // text - "type": "paragraph", // type of block + "text": "There is a day.", // text + "type": "paragraph", // type of block "data": { //optional custom data "speaker": "TBC 0", "words": [ @@ -33,24 +32,20 @@ Here's an example of a block, you can see it can contain some custom data, eg sp It also contains a list of `entityRanges`. -### Entity Ranges +## Entity Ranges -`entityRanges` are part of individual blocks. +`entityRanges` are part of individual Blocks. - + -From draftJs docs on [entity](https://draftjs.org/docs/advanced-topics-entities) +From Draft JS docs on [entity](https://draftjs.org/docs/advanced-topics-entities) > the Entity system, which Draft uses for annotating ranges of text with metadata. Entities introduce levels of richness beyond styled text. Links, mentions, and embedded content can all be implemented using entities. -This is what we use to identify the words, from a list of characters, and associate data to it, such as start and end time information. - +This is what we use to identify the words, from a list of characters, and associate data to it, such as start and end time information. It sets the foundations for features such as click on a word can jump the player play-head to the corresponding time for that word. - -Here's an example of `entityRanges` in the context of a data block. - +Here's an example of `entityRanges` in the context of a data Block. Required fields are the `offset`, and `length`, which are used to identify the entity within the characters of the `text` attribute of the block. 
- This, combined with the `entityMap` has the advantage that if you type or delete some text before a certain entity, draftJs will do the ground work of adjusting the offsets and keeping these info in sync. ```js @@ -67,22 +62,21 @@ This, combined with the `entityMap` has the advantage that if you type or delete "end": 13.17, // Custom fields "confidence": 0.68, // Custom fields "text": "There", // Custom fields - to detect what has changed - "offset": 0, // Required by Draft.js to know start of "selection" - "length": 5, //Required by Draft.js to know end of "selection" - in our case a word - "key": "ctavu0r" // can also be provided by draftjs if not provided. But providing your own gives more flexibility + "offset": 0, // Required by Draft.js to know start of "selection" + "length": 5, //Required by Draft.js to know end of "selection" - in our case a word + "key": "ctavu0r" // can also be provided by draftjs if not provided. But providing your own gives more flexibility }, ... ``` -### Entity Map +### Entity Map `entityMap` defines how to render the entities for the draftJs content state. - See draftJs docs for more on [entities](https://draftjs.org/docs/advanced-topics-entities#introduction) - And keeps in sync `entityRanges` through the `offset` and `length` attribute. -Here's an example +Here's an example: + ```js { "ayx62lj": { @@ -100,7 +94,7 @@ Here's an example }, ``` -To see this in the larger context when we call `sttJsonAdapter` with `transcriptData` and a `sttJsonType` we expect it to return an object with two attributes `blocks` and `entityMap`. +To see this in the larger context when we call `sttJsonAdapter` with `transcriptData` and a `sttJsonType` we expect it to return an object with two attributes `blocks` and `entityMap`. ```js { @@ -192,5 +186,4 @@ To see this in the larger context when we call `sttJsonAdapter` with `transcript } ``` - -The good news, is that given the blocks and the entityRanges, we can programmatically generate the entityMap. 
Which means you don't have to worry about creating the entityMap when making an adapter. \ No newline at end of file +The good news, is that given the blocks and the entityRanges, we can programmatically generate the entityMap. Which means you don't have to worry about creating the entityMap when making an adapter. diff --git a/docs/guides/storybook-npm-setup.md b/docs/guides/storybook-npm-setup.md index 364fb7cc..2b08fd8d 100644 --- a/docs/guides/storybook-npm-setup.md +++ b/docs/guides/storybook-npm-setup.md @@ -8,7 +8,7 @@ _Based on PR [#135](https://github.com/bbc/react-transcript-editor/pull/135)_ 1. Commit changes to master 2. Publish component to NPM - 1. `npm run publish:public` + 1. `npm run publish:public` 3. Publish to Github Pages - `npm run deploy:ghpages` ## Webpack @@ -41,15 +41,15 @@ module.exports = { exportAdapter: "./packages/export-adapters/index.js", sttJsonAdapter: "./packages/stt-adapters/index.js", groupWordsInParagraphsBySpeakersDPE: - "./packages/stt-adapters/digital-paper-edit/group-words-by-speakers.js" + "./packages/stt-adapters/digital-paper-edit/group-words-by-speakers.js", }, output: { path: path.resolve("dist"), filename: "[name].js", - libraryTarget: "commonjs2" + libraryTarget: "commonjs2", }, optimization: { - minimize: true + minimize: true, }, module: { rules: [ @@ -57,15 +57,15 @@ module.exports = { test: /\.module.css$/, use: [ { - loader: "style-loader" + loader: "style-loader", }, { loader: "css-loader", options: { - modules: true - } - } - ] + modules: true, + }, + }, + ], }, { test: /\.(js|jsx)$/, @@ -75,32 +75,32 @@ module.exports = { use: { loader: "babel-loader", options: { - presets: ["@babel/preset-env", "@babel/preset-react"] - } - } - } - ] + presets: ["@babel/preset-env", "@babel/preset-react"], + }, + }, + }, + ], }, resolve: { alias: { react: path.resolve(__dirname, "./node_modules/react"), - "react-dom": path.resolve(__dirname, "./node_modules/react-dom") - } + "react-dom": path.resolve(__dirname, 
"./node_modules/react-dom"), + }, }, externals: { react: { commonjs: "react", commonjs2: "react", amd: "React", - root: "React" + root: "React", }, "react-dom": { commonjs: "react-dom", commonjs2: "react-dom", amd: "ReactDOM", - root: "ReactDOM" - } - } + root: "ReactDOM", + }, + }, }; ``` @@ -120,8 +120,8 @@ and import { TimedTextEditor } from "@bbc/react-transcript-editor"; ``` -However, as mentioned in the README - __it is preferred to import individual components like using the first option: -`@bbc/react-transcript-editor/TimedTextEditor`__ as the other importing method imports the entire library. +However, as mentioned in the README - **it is preferred to import individual components like using the first option: +`@bbc/react-transcript-editor/TimedTextEditor`** as the other importing method imports the entire library. ### Caveats @@ -140,7 +140,7 @@ Assets: #### CSS module support for Storybook -Storybooks __DO NOT support CSS modules out of the box__, so if you remove CRA (`create-react-app`) scripts [the css modules will not load in the Storybook](https://github.com/storybooks/storybook/issues/2320). +Storybooks **DO NOT support CSS modules out of the box**, so if you remove CRA (`create-react-app`) scripts [the css modules will not load in the Storybook](https://github.com/storybooks/storybook/issues/2320). `storybook/webpack.config.js` augments the storybook with support for CSS modules. ## Testing