|
| 1 | +# ![HAST][logo] |
| 2 | + |
| 3 | +**H**ypertext **A**bstract **S**yntax **T**ree format. |
| 4 | + |
| 5 | +*** |
| 6 | + |
| 7 | +> :information_desk_person: I’m working on **rehype**, which uses |
| 8 | +> this format. A very alpha version of rehype was previously |
| 9 | +> published here as `hast`. It’s still published on |
| 10 | +> [npm](http://npmjs.com/hast) though. **rehype** will be published |
| 11 | +> soon. |
| 12 | +> |
| 13 | +> Early feedback is greatly appreciated! |
| 14 | +
|
| 15 | +**HAST** discloses HTML as an abstract syntax tree. _Abstract_ |
| 16 | +means not all information is stored in this tree and an exact replica |
| 17 | +of the original document cannot be re-created. _Syntax Tree_ means syntax |
| 18 | +**is** present in the tree, thus an exact syntactic document can be |
| 19 | +re-created. |
| 20 | + |
| 21 | +The reasons for introducing a new “virtual” DOM are manifold, primarily: |
| 22 | + |
| 23 | +* The DOM is very heavy to implement outside of the browser; |
| 24 | + a lean, stripped down virtual DOM can be used everywhere; |
| 25 | + |
| 26 | +* Most virtual DOMs do not focus on ease of use in transformations; |
| 27 | + |
| 28 | +* Other virtual DOMs cannot represent the syntax of HTML in its |
| 29 | + entirety, think comments, document types, and character data; |
| 30 | + |
| 31 | +* Neither HTML nor virtual DOMs focus on positional information. |
| 32 | + |
| 33 | +**HAST** is a subset of [Unist][]. |
| 34 | + |
| 35 | +## List of Utilities |
| 36 | + |
| 37 | +* [`wooorm/hastscript`](https://github.com/wooorm/hastscript) |
| 38 | + — Hyperscript compatible DSL for creating nodes; |
| 39 | + |
| 40 | +* [`wooorm/hast-util-has-property`](https://github.com/wooorm/hast-util-has-property) |
| 41 | + — Check if a node has a property; |
| 42 | + |
| 43 | +* [`wooorm/hast-util-interactive`](https://github.com/wooorm/hast-util-interactive) |
| 44 | + — Check if a node is interactive; |
| 45 | + |
| 46 | +* [`wooorm/hast-util-labelable`](https://github.com/wooorm/hast-util-labelable) |
| 47 | + — Check if a node is labelable; |
| 48 | + |
| 49 | +* [`wooorm/hast-util-parse-selector`](https://github.com/wooorm/hast-util-parse-selector) |
| 50 | + — Create a node from a simple CSS selector; |
| 51 | + |
| 52 | +* [`wooorm/hast-util-whitespace`](https://github.com/wooorm/hast-util-whitespace) |
| 53 | + — Check if a node is inter-element whitespace; |
| 54 | + |
| 55 | +See the [List of Unist Utilities][unist-utility] for projects which |
| 56 | +work with **HAST** nodes too. |
| 57 | + |
| 58 | +## AST |
| 59 | + |
| 60 | +### `Root` |
| 61 | + |
| 62 | +**Root** ([**Parent**][parent]) houses all nodes. |
| 63 | + |
| 64 | +```idl |
| 65 | +interface Root <: Parent { |
| 66 | + type: "root"; |
| 67 | +} |
| 68 | +``` |
| 69 | + |
| 70 | +### `Element` |
| 71 | + |
| 72 | +**Element** ([**Parent**][parent]) represents an HTML element, for example |
| 73 | +a `div`. HAST elements correspond to the [HTML Element][html-element] |
| 74 | +interface. |
| 75 | + |
| 76 | +```idl |
| 77 | +interface Element <: Parent { |
| 78 | + type: "element"; |
| 79 | + tagName: string; |
| 80 | + properties: Properties; |
| 81 | +} |
| 82 | +``` |
| 83 | + |
| 84 | +For example, the following HTML: |
| 85 | + |
| 86 | +```html |
| 87 | +<a href="http://alpha.com" class="bravo" download></a> |
| 88 | +``` |
| 89 | + |
| 90 | +Yields: |
| 91 | + |
| 92 | +```json |
| 93 | +{ |
| 94 | + "type": "element", |
| 95 | + "tagName": "a", |
| 96 | + "properties": { |
| 97 | + "href": "http://alpha.com", |
| 99 | + "className": ["bravo"], |
| 100 | + "download": true |
| 101 | + }, |
| 102 | + "children": [] |
| 103 | +} |
| 104 | +``` |
| 105 | + |
| 106 | +#### `Properties` |
| 107 | + |
| 108 | +A dictionary of property names to property values. Most virtual DOMs |
| 109 | +require a disambiguation between `attributes` and `properties`. HAST |
| 110 | +does not and defers this to compilers. |
| 111 | + |
| 112 | +```idl |
| 113 | +interface Properties {} |
| 114 | +``` |
| 115 | + |
| 116 | +##### Property names |
| 117 | + |
| 118 | +Property names are keys on [`properties`][properties] objects and |
| 119 | +reflect HTML attribute names. Often, they have the same value as |
| 120 | +the corresponding HTML attribute (for example, `href` is a property |
| 121 | +name reflecting the `href` attribute name). |
| 122 | +If the HTML attribute name contains one or more dashes, the HAST |
| 123 | +property name must be camel-cased (for example, `ariaLabel` is a |
| 124 | +property reflecting the `aria-label` attribute). |
| 125 | +If the HTML attribute name is a reserved ECMAScript keyword, a common |
| 126 | +alternative must be used. This is the case for `class`, which uses |
| 127 | +`className` in HAST (and DOM), and `for`, which uses `htmlFor`. |
| 128 | + |
| 129 | +> DOM uses other prefixes and suffixes too, for example, `relList` |
| 130 | +> for HTML `rel` attributes. This does not occur in HAST. |
| 131 | +
|
| 132 | +When possible, HAST properties must be camel-cased if the HTML attribute |
| 133 | +name originates from multiple words. For example, the `minlength` HTML |
| 134 | +attribute is cased as `minLength`, and `typemustmatch` as `typeMustMatch`. |
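
The naming rules above can be sketched as a small function. This is only an illustration, not part of HAST or any published utility: `propertyName` and both lookup tables are hypothetical, and the tables hold just the examples mentioned in this section, where a real implementation would need a complete list of HTML attributes.

```javascript
// Hypothetical, incomplete lookup tables (assumptions for illustration only).
// Attribute names that are reserved ECMAScript keywords.
const reserved = {class: 'className', for: 'htmlFor'};
// Attribute names that originate from multiple words but contain no dash.
const multiWord = {minlength: 'minLength', typemustmatch: 'typeMustMatch'};

// Map an HTML attribute name to a HAST property name.
function propertyName(attribute) {
  const name = attribute.toLowerCase();

  if (Object.prototype.hasOwnProperty.call(reserved, name)) {
    return reserved[name];
  }

  if (Object.prototype.hasOwnProperty.call(multiWord, name)) {
    return multiWord[name];
  }

  // Camel-case dashed names: `aria-label` becomes `ariaLabel`.
  return name.replace(/-([a-z0-9])/g, (match, character) =>
    character.toUpperCase()
  );
}
```

For instance, `propertyName('aria-label')` gives `'ariaLabel'` and `propertyName('for')` gives `'htmlFor'`, while names without dashes that are not in either table, such as `href`, pass through unchanged.
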
| 135 | + |
| 136 | +##### Property values |
| 137 | + |
| 138 | +Property values should reflect the data type determined by their |
| 139 | +property name. For example, the HTML `<div hidden></div>` |
| 140 | +contains a `hidden` (boolean) attribute, which is reflected as a `hidden` |
| 141 | +property set to `true` (boolean) in HAST. Likewise, |
| 142 | +`<input minlength="5">`, which contains a `minlength` (valid |
| 143 | +non-negative integer) attribute, is reflected as a `minLength` property |
| 144 | +set to `5` (number) in HAST. |
| 145 | + |
| 146 | +> In JSON, the property value `null` must be treated as if the |
| 147 | +> property was not included. |
| 148 | +> In JavaScript, both `null` and `undefined` must be similarly |
| 149 | +> ignored. |
| 150 | +
|
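
A minimal sketch of the rule in the note above, for code that consumes `properties` in JavaScript. The helper name `definedProperties` is hypothetical and not part of any HAST utility; it simply drops entries whose value is `null` or `undefined` and keeps everything else, including `false`.

```javascript
// Return a copy of `properties` without `null` and `undefined` values.
function definedProperties(properties) {
  const result = {};

  for (const [name, value] of Object.entries(properties)) {
    // `value != null` is false for both `null` and `undefined`.
    if (value != null) {
      result[name] = value;
    }
  }

  return result;
}
```
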
| 151 | +The DOM is strict in reflecting those properties; HAST is not. |
| 152 | +The DOM treats `<div hidden=no></div>` as having a `true` |
| 153 | +(boolean) value for the `hidden` attribute, and `<img width="yes">` |
| 154 | +as having a `0` (number) value for the `width` attribute, whereas in HAST |
| 155 | +these should be reflected as `"no"` and `"yes"`, respectively. |
| 156 | + |
| 157 | +> The reason for this is to allow plug-ins and utilities to inspect |
| 158 | +> these values. |
| 159 | +
|
| 160 | +The DOM also specifies attribute values that are comma- or space-separated |
| 161 | +lists. In HAST, these should be treated as ordered lists. For example, |
| 162 | +`<div class="alpha bravo"></div>` is represented as |
| 163 | +`["alpha", "bravo"]`. |
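
The value rules above (typed values where valid, raw strings where not, and arrays for lists) could be sketched as follows. Everything here is an assumption for illustration: `propertyValue` is a hypothetical helper, the three sets are incomplete and keyed by HAST property name, and a valueless attribute such as `<div hidden>` is assumed to arrive as an empty string.

```javascript
// Hypothetical, incomplete tables keyed by HAST property name.
const booleans = new Set(['hidden', 'download', 'disabled']);
const numbers = new Set(['minLength', 'maxLength', 'width', 'height']);
const spaceSeparated = new Set(['className', 'rel']);

// Turn a raw HTML attribute value (a string) into a HAST property value.
function propertyValue(name, raw) {
  if (booleans.has(name)) {
    // HTML allows an empty value or the attribute's own name for boolean
    // attributes. Anything else (such as `hidden=no`) is kept as a string.
    const valid = raw === '' || raw.toLowerCase() === name.toLowerCase();
    return valid ? true : raw;
  }

  if (numbers.has(name)) {
    // Only valid non-negative integers become numbers; `width="yes"`
    // stays `"yes"` so that plug-ins can inspect it.
    return /^\d+$/.test(raw) ? Number(raw) : raw;
  }

  if (spaceSeparated.has(name)) {
    // Split on whitespace and keep the original order.
    return raw.trim().split(/\s+/).filter(Boolean);
  }

  return raw;
}
```

With these tables, `propertyValue('minLength', '5')` gives the number `5`, `propertyValue('hidden', 'no')` gives the string `'no'`, and `propertyValue('className', 'alpha bravo')` gives `['alpha', 'bravo']`. Comma-separated lists are left out of this sketch.
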
| 164 | + |
| 165 | +> :warning: I’m unsure whether the value of `style` properties |
| 166 | +> should be exposed as an `object`: it’s nice for easy access but |
| 167 | +> would not work well for transformations, as CSS properties cascade |
| 168 | +> by order, which cannot be represented by JavaScript Objects. |
| 169 | +
|
| 170 | +### `Directive` |
| 171 | + |
| 172 | +**Directive** ([**Text**][text]) represents an instruction |
| 173 | +(declaration or processing instruction). |
| 174 | + |
| 175 | +```idl |
| 176 | +interface Directive <: Text { |
| 177 | + type: "directive"; |
| 178 | + name: string; |
| 179 | +} |
| 180 | +``` |
| 181 | + |
| 182 | +For example, the following HTML: |
| 183 | + |
| 184 | +```html |
| 185 | +<!doctype html> |
| 186 | +``` |
| 187 | + |
| 188 | +Yields: |
| 189 | + |
| 190 | +```json |
| 191 | +{ |
| 192 | + "type": "directive", |
| 193 | + "name": "!doctype", |
| 194 | + "value": "!doctype html" |
| 195 | +} |
| 196 | +``` |
| 197 | + |
| 198 | +### `Comment` |
| 199 | + |
| 200 | +**Comment** ([**Text**][text]) represents embedded information. |
| 201 | + |
| 202 | +```idl |
| 203 | +interface Comment <: Text { |
| 204 | + type: "comment"; |
| 205 | +} |
| 206 | +``` |
| 207 | + |
| 208 | +For example, the following HTML: |
| 209 | + |
| 210 | +```html |
| 211 | +<!--Charlie--> |
| 212 | +``` |
| 213 | + |
| 214 | +Yields: |
| 215 | + |
| 216 | +```json |
| 217 | +{ |
| 218 | + "type": "comment", |
| 219 | + "value": "Charlie" |
| 220 | +} |
| 221 | +``` |
| 222 | + |
| 223 | +### `CharacterData` |
| 224 | + |
| 225 | +**CharacterData** ([**Text**][text]) represents character data. |
| 226 | + |
| 227 | +```idl |
| 228 | +interface CharacterData <: Text { |
| 229 | + type: "characterData"; |
| 230 | +} |
| 231 | +``` |
| 232 | + |
| 233 | +For example, the following HTML: |
| 234 | + |
| 235 | +```html |
| 236 | +<![CDATA[<delta>Echo</delta>]]> |
| 237 | +``` |
| 238 | + |
| 239 | +Yields: |
| 240 | + |
| 241 | +```json |
| 242 | +{ |
| 243 | + "type": "characterData", |
| 244 | + "value": "<delta>Echo</delta>" |
| 245 | +} |
| 246 | +``` |
| 247 | + |
| 248 | +### `Text` |
| 249 | + |
| 250 | +**TextNode** ([**Text**][text]) represents everything that is text. |
| 251 | +Note that its `type` property is `text`, but it is different |
| 252 | +from the abstract **Unist** interface **Text**. |
| 253 | + |
| 254 | +```idl |
| 255 | +interface TextNode <: Text { |
| 256 | + type: "text"; |
| 257 | +} |
| 258 | +``` |
| 259 | + |
| 260 | +For example, the following HTML: |
| 261 | + |
| 262 | +```html |
| 263 | +<span>Foxtrot</span> |
| 264 | +``` |
| 265 | + |
| 266 | +Yields: |
| 267 | + |
| 268 | +```json |
| 269 | +{ |
| 270 | + "type": "element", |
| 271 | + "tagName": "span", |
| 272 | + "properties": {}, |
| 273 | + "children": [{ |
| 274 | + "type": "text", |
| 275 | + "value": "Foxtrot" |
| 276 | + }] |
| 277 | +} |
| 278 | +``` |
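
Because every node is a plain object with a `type`, and parents carry `children`, a tree can be walked with a short recursive function. The sketch below collects the values of `text` nodes only. `textContent` is a hypothetical name, and skipping `comment`, `directive`, and `characterData` nodes is a design choice made here rather than something HAST prescribes.

```javascript
// Concatenate the values of all `text` nodes in `node`, in document order.
function textContent(node) {
  if (node.type === 'text') {
    return node.value;
  }

  // `root` and `element` nodes are parents: recurse into their children.
  if (Array.isArray(node.children)) {
    return node.children.map(textContent).join('');
  }

  // Comments, directives, and character data contribute nothing here.
  return '';
}
```

Applied to the `span` tree shown above, this returns `'Foxtrot'`.
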
| 279 | + |
| 280 | +## Related |
| 281 | + |
| 282 | +* [Unist][] |
| 283 | +* [vfile][] |
| 284 | +* rehype |
| 285 | + |
| 286 | +<!-- Definitions --> |
| 287 | + |
| 288 | +[logo]: https://cdn.rawgit.com/wooorm/hast/master/logo.svg |
| 289 | + |
| 290 | +[vfile]: https://github.com/wooorm/vfile |
| 291 | + |
| 292 | +[html-element]: https://dom.spec.whatwg.org/#interface-element |
| 293 | + |
| 294 | +[unist-utility]: https://github.com/wooorm/unist#list-of-utilties |
| 295 | + |
| 296 | +[unist]: https://github.com/wooorm/unist |
| 297 | + |
| 298 | +[parent]: https://github.com/wooorm/unist#parent |
| 299 | + |
| 300 | +[text]: https://github.com/wooorm/unist#text |
| 301 | + |
| 302 | +[properties]: #properties |