Commit 4b95be2

Initial commit
0 parents  commit 4b95be2

4 files changed

+327
-0
lines changed

history.md

+2
@@ -0,0 +1,2 @@
1+
0.1.0 / 2016-04-15
2+
==================

logo.svg

+5

package.json

+18
@@ -0,0 +1,18 @@
1+
{
2+
"name": "hast",
3+
"private": true,
4+
"version": "0.1.0",
5+
"description": "Hypertext Abstract Syntax Tree format",
6+
"license": "MIT",
7+
"keywords": [],
8+
"repository": {
9+
"type": "git",
10+
"url": "https://github.com/wooorm/hast.git"
11+
},
12+
"author": "Titus Wormer <[email protected]> (http://github.com)",
13+
"contributors": [
14+
"Titus Wormer <[email protected]> (http://github.com)"
15+
],
16+
"bugs": "https://github.com/wooorm/hast/issues",
17+
"dependencies": {}
18+
}

readme.md

+302
@@ -0,0 +1,302 @@
1+
# ![HAST][logo]
2+
3+
**H**ypertext **A**bstract **S**yntax **T**ree format.
4+
5+
***
6+
7+
> :information_desk_person: I’m working on **rehype**, which uses
8+
> this format. A very alpha version of rehype was previously
9+
> published here as `hast`. It’s still published on
10+
> [npm](http://npmjs.com/hast) though. **rehype** will be published
11+
> soon.
12+
>
13+
> Early feedback is greatly appreciated!
14+
15+
**HAST** exposes HTML as an abstract syntax tree. _Abstract_
16+
means not all information is stored in this tree and an exact replica
17+
of the original document cannot be re-created. _Syntax Tree_ means syntax
18+
**is** present in the tree, thus an exact syntactic document can be
19+
re-created.
20+
21+
The reasons for introducing a new “virtual” DOM are manifold, primarily:
22+
23+
* The DOM is very heavy to implement outside of the browser;
24+
a lean, stripped-down virtual DOM can be used everywhere;
25+
26+
* Most virtual DOMs do not focus on ease of use in transformations;
27+
28+
* Other virtual DOMs cannot represent the syntax of HTML in its
29+
entirety, think comments, document types, and character data;
30+
31+
* Neither HTML nor virtual DOMs focus on positional information.
32+
33+
**HAST** is a subset of [Unist][].
34+
35+
## List of Utilities
36+
37+
* [`wooorm/hastscript`](https://github.com/wooorm/hastscript)
38+
— Hyperscript compatible DSL for creating nodes;
39+
40+
* [`wooorm/hast-util-has-property`](https://github.com/wooorm/hast-util-has-property)
41+
— Check if a node has a property;
42+
43+
* [`wooorm/hast-util-interactive`](https://github.com/wooorm/hast-util-interactive)
44+
— Check if a node is interactive;
45+
46+
* [`wooorm/hast-util-labelable`](https://github.com/wooorm/hast-util-labelable)
47+
— Check if a node is labelable;
48+
49+
* [`wooorm/hast-util-parse-selector`](https://github.com/wooorm/hast-util-parse-selector)
50+
— Create a node from a simple CSS selector;
51+
52+
* [`wooorm/hast-util-whitespace`](https://github.com/wooorm/hast-util-whitespace)
53+
— Check if a node is inter-element whitespace;
54+
55+
See the [List of Unist Utilities][unist-utility] for projects which
56+
work with **HAST** nodes too.
57+
58+
## AST
59+
60+
### `Root`
61+
62+
**Root** ([**Parent**][parent]) houses all nodes.
63+
64+
```idl
65+
interface Root <: Parent {
66+
type: "root";
67+
}
68+
```
69+
70+
### `Element`
71+
72+
**Element** ([**Parent**][parent]) represents an HTML Element. For example,
73+
a `div`. HAST Elements correspond to the [HTML Element][html-element]
74+
interface.
75+
76+
```idl
77+
interface Element <: Parent {
78+
type: "element";
79+
tagName: string;
80+
properties: Properties;
81+
}
82+
```
83+
84+
For example, the following HTML:
85+
86+
```html
87+
<a href="http://alpha.com" class="bravo" download></a>
88+
```
89+
90+
Yields:
91+
92+
```json
93+
{
94+
"type": "element",
95+
"tagName": "a",
96+
"properties": {
97+
"href": "http://alpha.com",
99+
"className": ["bravo"],
100+
"download": true
101+
},
102+
"children": []
103+
}
104+
```
105+
106+
#### `Properties`
107+
108+
A dictionary of property names to property values. Most virtual DOMs
109+
require a disambiguation between `attributes` and `properties`. HAST
110+
does not and defers this to compilers.
111+
112+
```idl
113+
interface Properties {}
114+
```
115+
116+
##### Property names
117+
118+
Property names are keys on [`properties`][properties] objects and
119+
reflect HTML attribute names. Often, they have the same value as
120+
the corresponding HTML attribute (for example, `href` is a property
121+
name reflecting the `href` attribute name).
122+
If the HTML attribute name contains one or more dashes, the HAST
123+
property name must be camel-cased (for example, `ariaLabel` is a
124+
property reflecting the `aria-label` attribute).
125+
If the HTML attribute name is a reserved ECMAScript keyword, a common
126+
alternative must be used. This is the case for `class`, which uses
127+
`className` in HAST (and DOM), and `for`, which uses `htmlFor`.
128+
129+
> DOM uses other prefixes and suffixes too, for example, `relList`
130+
> for HTML `rel` attributes. This does not occur in HAST.
131+
132+
When possible, HAST property names must be camel-cased if the HTML attribute
133+
name originates from multiple words. For example, the `minlength` HTML
134+
attribute is cased as `minLength`, and `typemustmatch` as `typeMustMatch`.
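
The naming rules above can be sketched as a small lookup-plus-fallback function. This is a minimal illustration, not part of HAST itself: `toPropertyName` and both tables are hypothetical, and the tables list only the examples named in this section.

```javascript
// Hypothetical, incomplete tables: only the examples from this section.
const reservedNames = new Map([['class', 'className'], ['for', 'htmlFor']]);
const multiWordNames = new Map([
  ['minlength', 'minLength'],
  ['typemustmatch', 'typeMustMatch']
]);

// Hypothetical helper: map an HTML attribute name to a HAST property name.
function toPropertyName(attribute) {
  const name = attribute.toLowerCase();
  if (reservedNames.has(name)) return reservedNames.get(name);
  if (multiWordNames.has(name)) return multiWordNames.get(name);
  // Camel-case dashed names: `aria-label` becomes `ariaLabel`.
  return name.replace(/-([a-z])/g, (match, letter) => letter.toUpperCase());
}

console.log(toPropertyName('aria-label')); // ariaLabel
console.log(toPropertyName('class')); // className
console.log(toPropertyName('minlength')); // minLength
console.log(toPropertyName('href')); // href
```

Multi-word names without dashes cannot be derived mechanically, which is why the sketch falls back to a table for them.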
135+
136+
##### Property values
137+
138+
Property values should reflect the data type determined by their
139+
property name. For example, the following HTML `<div hidden></div>`
140+
contains a `hidden` (boolean) attribute, which is reflected as a `hidden`
141+
property name set to `true` (boolean) as value in HAST, and
142+
`<input minlength="5">`, which contains a `minlength` (valid
143+
non-negative integer) attribute, is reflected as a property `minLength`
144+
set to `5` (number) in HAST.
145+
146+
> In JSON, the property value `null` must be treated as if the
147+
> property was not included.
148+
> In JavaScript, both `null` and `undefined` must be similarly
149+
> ignored.
150+
151+
The DOM is strict in reflecting those properties, and HAST is not.
152+
Where the DOM treats `<div hidden=no></div>` as having a `true`
153+
(boolean) value for the `hidden` attribute, and `<img width="yes">`
154+
as having a `0` (number) value for the `width` attribute, these should
155+
be reflected as `"no"` and `"yes"`, respectively, in HAST.
156+
157+
> The reason for this is to allow plug-ins and utilities to inspect
158+
> these values.
159+
160+
The DOM also specifies comma- and space-separated lists as attribute
161+
values. In HAST, these should be treated as ordered lists. For example,
162+
`<div class="alpha bravo"></div>` is represented as
163+
`["alpha", "bravo"]`.
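
As a rough sketch of the value rules in this section, the following converts a raw attribute string into a HAST property value. The helper names (`toPropertyValue`, `isPropertySet`) and the three tables are assumptions made for illustration. Treating only an empty value as `true` for boolean attributes is also an assumption, since the text only covers `<div hidden>` and `hidden=no`.

```javascript
// Illustrative, incomplete tables; a real tool would need the full HTML lists.
const booleanProperties = new Set(['hidden']);
const numberProperties = new Set(['minLength', 'width']);
const spaceSeparatedProperties = new Set(['className']);

// Hypothetical helper: turn a raw attribute value (string) into a HAST value.
function toPropertyValue(name, raw) {
  if (spaceSeparatedProperties.has(name)) {
    // Ordered list of tokens; empty strings from extra whitespace are dropped.
    return raw.split(/\s+/).filter(Boolean);
  }
  // `<div hidden>` has an empty value; anything else (such as `no`) is kept.
  if (booleanProperties.has(name) && raw === '') return true;
  // Only valid non-negative integers become numbers; `yes` stays a string.
  if (numberProperties.has(name) && /^\d+$/.test(raw)) return Number(raw);
  return raw;
}

// Hypothetical helper: `null` and `undefined` count as "not included".
function isPropertySet(properties, name) {
  return properties[name] !== null && properties[name] !== undefined;
}

console.log(toPropertyValue('hidden', '')); // true
console.log(toPropertyValue('hidden', 'no')); // no
console.log(toPropertyValue('minLength', '5')); // 5
console.log(toPropertyValue('width', 'yes')); // yes
console.log(toPropertyValue('className', 'alpha bravo')); // [ 'alpha', 'bravo' ]
console.log(isPropertySet({hidden: null}, 'hidden')); // false
```

Keeping invalid values such as `"no"` as strings is what lets plug-ins and utilities inspect them later.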
164+
165+
> :warning: I’m unsure whether the value of `style` properties
166+
> should be exposed as an `object`: it’s nice for easy access but
167+
> would not work well for transformations, as CSS properties cascade
168+
> by order, which cannot be represented by JavaScript Objects.
169+
170+
### `Directive`
171+
172+
**Directive** ([**Text**][text]) represents an instruction
173+
(declaration or processing instruction).
174+
175+
```idl
176+
interface Directive <: Text {
177+
type: "directive";
178+
name: string;
179+
}
180+
```
181+
182+
For example, the following HTML:
183+
184+
```html
185+
<!doctype html>
186+
```
187+
188+
Yields:
189+
190+
```json
191+
{
192+
"type": "directive",
193+
"name": "!doctype",
194+
"value": "!doctype html"
195+
}
196+
```
197+
198+
### `Comment`
199+
200+
**Comment** ([**Text**][text]) represents embedded information.
201+
202+
```idl
203+
interface Comment <: Text {
204+
type: "comment";
205+
}
206+
```
207+
208+
For example, the following HTML:
209+
210+
```html
211+
<!--Charlie-->
212+
```
213+
214+
Yields:
215+
216+
```json
217+
{
218+
"type": "comment",
219+
"value": "Charlie"
220+
}
221+
```
222+
223+
### `CharacterData`
224+
225+
**CharacterData** ([**Text**][text]) represents character data.
226+
227+
```idl
228+
interface CharacterData <: Text {
229+
type: "characterData";
230+
}
231+
```
232+
233+
For example, the following HTML:
234+
235+
```html
236+
<![CDATA[<delta>Echo</delta>]]>
237+
```
238+
239+
Yields:
240+
241+
```json
242+
{
243+
"type": "characterData",
244+
"value": "<delta>Echo</delta>"
245+
}
246+
```
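
Judging from the three examples above, the childless nodes (directive, comment, character data) could be turned back into HTML roughly as follows. `serializeLeaf` is a hypothetical name, and wrapping a directive's `value` in `<` and `>` is an assumption drawn from the `!doctype` example only.

```javascript
// Hypothetical serializer for the three childless node types shown above.
function serializeLeaf(node) {
  switch (node.type) {
    case 'directive':
      // Assumes `value` holds everything between `<` and `>`.
      return '<' + node.value + '>';
    case 'comment':
      return '<!--' + node.value + '-->';
    case 'characterData':
      return '<![CDATA[' + node.value + ']]>';
    default:
      throw new Error('Unhandled node type: ' + node.type);
  }
}

console.log(serializeLeaf({type: 'directive', name: '!doctype', value: '!doctype html'}));
// <!doctype html>
console.log(serializeLeaf({type: 'comment', value: 'Charlie'}));
// <!--Charlie-->
console.log(serializeLeaf({type: 'characterData', value: '<delta>Echo</delta>'}));
// <![CDATA[<delta>Echo</delta>]]>
```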
247+
248+
### `Text`
249+
250+
**TextNode** ([**Text**][text]) represents everything that is text.
251+
Note that its `type` property is `text`, but it is different
252+
from the abstract **Unist** interface **Text**.
253+
254+
```idl
255+
interface TextNode <: Text {
256+
type: "text";
257+
}
258+
```
259+
260+
For example, the following HTML:
261+
262+
```html
263+
<span>Foxtrot</span>
264+
```
265+
266+
Yields:
267+
268+
```json
269+
{
270+
"type": "element",
271+
"tagName": "span",
272+
"properties": {},
273+
"children": [{
274+
"type": "text",
275+
"value": "Foxtrot"
276+
}]
277+
}
278+
```
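
To illustrate how such a tree might be consumed, here is a minimal sketch that collects the text of a node and its descendants. `textContent` is a hypothetical helper, not part of any HAST utility. It relies only on the `type`, `value`, and `children` fields shown above.

```javascript
// Hypothetical helper: concatenate the values of all `text` nodes in a tree.
function textContent(node) {
  if (node.type === 'text') return node.value;
  // Only parents (root, element) have `children`; other nodes add nothing.
  return (node.children || []).map(textContent).join('');
}

const tree = {
  type: 'element',
  tagName: 'span',
  properties: {},
  children: [{type: 'text', value: 'Foxtrot'}]
};

console.log(textContent(tree)); // Foxtrot
```

Comments, directives, and character data are skipped here because only nodes whose `type` is `text` are read.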
279+
280+
## Related
281+
282+
* [Unist][]
283+
* [vfile][]
284+
* rehype
285+
286+
<!-- Definitions -->
287+
288+
[logo]: https://cdn.rawgit.com/wooorm/hast/master/logo.svg
289+
290+
[vfile]: https://github.com/wooorm/vfile
291+
292+
[html-element]: https://dom.spec.whatwg.org/#interface-element
293+
294+
[unist-utility]: https://github.com/wooorm/unist#list-of-utilities
295+
296+
[unist]: https://github.com/wooorm/unist
297+
298+
[parent]: https://github.com/wooorm/unist#parent
299+
300+
[text]: https://github.com/wooorm/unist#text
301+
302+
[properties]: #properties
