---
layout: default
title: "TJSON: Tagged JSON with rich type annotations"
---
<div class="container">
<div class="row">
<div class="col-lg-12">
<div class="lead text-xs-center">
<p>
<strong>DRAFT:</strong> This format is still in a draft state and subject to change!
</p>
</div>
<div id="introduction">
<p>
<strong>TJSON</strong> (Tagged JSON) is a tagging scheme/microformat for enriching
the types that can be stored in <a href="http://www.json.org">JSON</a> documents.
It augments the existing types present in JSON, codifying ad hoc practices already
commonly used for processing JSON into a schema-free, self-describing format.
</p>
<p>
TJSON documents are amenable to "content-aware hashing" where different encodings of the
same data (including both TJSON and binary formats like Protocol Buffers, MessagePack,
BSON, etc.) can share the same content hash and therefore the same cryptographic signature.
This is possible with content hash algorithms that are aware of the underlying structure
of data, such as
<a href="https://github.com/benlaurie/objecthash">Ben Laurie's objecthash</a>.
</p>
<p>
TJSON supports the following data types:
</p>
<ul>
<li>
<strong>Objects:</strong>
Name/value dictionaries. The member names of objects in TJSON carry a postfix "tag" which acts
as a type annotation for the associated value. See the description of "Strings" below
for more information.
</li>
<li>
<strong>Arrays:</strong>
Lists of values: identical to JSON, but typed by their containing objects. Unlike
JSON, arrays cannot be used as a top-level expression: only objects are allowed.
</li>
<li>
<strong>Sets:</strong>
Lists of unique values: similar to an array, but repeated elements are disallowed.
</li>
<li>
<strong>Strings:</strong>
TJSON strings are Unicode and always serialized as UTF-8. When used as the name of a
member of an object, they carry a mandatory "tag": a self-describing type annotation
for the associated value.
</li>
<li>
<strong>Binary Data:</strong>
First-class support for 8-bit clean binary data, encoded in a variety of formats
including hexadecimal (a.k.a. base16), base32, and base64url.
</li>
<li>
<strong>Numbers:</strong>
<ul>
<li>
<strong>Integers:</strong>
TJSON supports the full ranges of both signed and unsigned 64-bit integers
by serializing them as strings.
</li>
<li>
<strong>Floating points:</strong>
Floating point numbers in TJSON are identical to JSON, but can always be disambiguated
from integers.
</li>
</ul>
</li>
<li>
<strong>Timestamps:</strong>
TJSON has a first-class type for representing date/time timestamp values,
serialized as a subset of RFC 3339 (a profile of ISO 8601).
</li>
<li>
<strong>Boolean Values:</strong>
TJSON supports the <em>true</em> and <em>false</em> values from JSON
(<em>null</em> is expressly disallowed).
</li>
</ul>
</div>
<div class="subsection">
<h2 class="page-header">Objects</h2>
<p>
Objects are the <em>only</em> type allowed at the top level of a TJSON document.
Many ordinary JSON parsers accept arrays or other types as top-level expressions. This
is <em>NOT</em> the case in TJSON: only objects are allowed at the top level.
</p>
<p>
Objects in TJSON use the same syntax as JSON, but each member name contains a "tag"
which annotates the type of the associated value of the member.
</p>
<p>
Below is an example of an object whose only member has a Unicode string as its value:
</p>
<div class="syntax-example">
{"hello-world<span class="tag-prefix">:s</span>": "Hello, world!"}
</div>
<p>
This example consists of an object whose only member is named <em>"hello-world"</em>
and whose corresponding value is the <em>string (:s)</em> encoded in UTF-8 whose
contents are <em>"Hello, world!"</em>.
</p>
<p>
Member names in TJSON must be distinct. The use of the same member name more
than once in the same object is an error, regardless of whether the repeated name
carries the same value, the same type, or a different type. TJSON names are
single-use only.
</p>
<p>
TJSON uses the case of the first letter of the name of a type to distinguish
between scalar (single value) and non-scalar (collection) types. The syntax for
identifying a nested TJSON object is a capital letter "O" (NOT zero):
</p>
<div class="syntax-example">
{"hello-object<span class="tag-prefix">:O</span>": {"hello-string<span class="tag-prefix">:s</span>": "Hello, world!"}}
</div>
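<p>
The rules above (mandatory tags, single-use names, nested objects) can be sketched with
Python's standard <em>json</em> module. This is a minimal illustration, not a reference
implementation: the helper names <em>split_member_name</em> and <em>tagged_pairs_hook</em>
are hypothetical, and splitting the member name at its last colon is an assumption this
draft does not spell out.
</p>

```python
import json


def split_member_name(member_name):
    """Split 'name:tag' into (name, tag).

    Assumption: the tag is everything after the LAST colon in the name.
    """
    name, sep, tag = member_name.rpartition(":")
    if not sep or not tag:
        raise ValueError(f"untagged member name: {member_name!r}")
    return name, tag


def tagged_pairs_hook(pairs):
    """object_pairs_hook that rejects untagged and repeated member names.

    Names are compared without their tags, so "a:s" and "a:i" in the
    same object count as a repeat.
    """
    seen = set()
    result = {}
    for key, value in pairs:
        name, _tag = split_member_name(key)
        if name in seen:
            raise ValueError(f"duplicate member name: {name!r}")
        seen.add(name)
        result[key] = value
    return result


doc = json.loads(
    '{"hello-object:O": {"hello-string:s": "Hello, world!"}}',
    object_pairs_hook=tagged_pairs_hook,
)
name, tag = split_member_name("hello-object:O")
# The first letter's case separates non-scalar tags (upper) from scalar ones (lower).
is_non_scalar = tag[0].isupper()
```

<p>
The hook runs for every object in the document, including nested ones, so the same checks
apply at each level. Without such a hook, <em>json.loads</em> silently keeps the last of
several repeated names.
</p>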
</div>
<div class="subsection">
<h2 class="page-header">Arrays</h2>
<p>
Arrays are not allowed as a top-level expression in TJSON. The following is <em>NOT</em>
a valid TJSON document:
</p>
<div class="syntax-example syntax-invalid">
["No toplevel arrays in TJSON!"]
</div>
<p>
Arrays <i>MUST</i> first be wrapped in an object, from which they inherit their type
information. Arrays are described by an "A" tag (non-scalar types in TJSON are
capitalized); however, this tag alone is not sufficient:
</p>
<div class="syntax-example syntax-invalid">
{"not-quite-valid<span class="tag-prefix">:A</span>": ["Hello, world!"]}
</div>
<p>
To properly tag a TJSON array, you <i>MUST</i> also include the type of its contents in
the tag. The following is valid array syntax:
</p>
<div class="syntax-example">
{"valid-array<span class="tag-prefix">:A&lt;s&gt;</span>": ["Hello, world!"]}
</div>
<p>
The above syntax describes an <em>array</em> of <em>strings</em>. It might remind you
of <em>generic</em> syntax from statically typed programming languages. TJSON contains
a tiny type system it uses to verify type annotations.
</p>
<p>
The syntax may be nested to support multidimensional arrays:
</p>
<div class="syntax-example">
{"nested-array<span class="tag-prefix">:A&lt;A&lt;s&gt;&gt;</span>": [["Nested"], ["Array!"]]}
</div>
<p>
Or objects nested within arrays:
</p>
<div class="syntax-example">
{"nested-object<span class="tag-prefix">:A&lt;O&gt;</span>": [{"nested<span class="tag-prefix">:s</span>": "object"}]}
</div>
<p>
The inner type parameter may be omitted for empty arrays:
</p>
<div class="syntax-example">
{"empty-array<span class="tag-prefix">:A&lt;&gt;</span>": []}
</div>
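<p>
As a rough sketch of how a parser might read these array tags, the hypothetical
<em>parse_tag</em> function below turns a tag into a nested tuple. The set of scalar tags is
taken from the other sections of this document; the tuple representation and the use of
<em>None</em> for an omitted type parameter are assumptions made for illustration only.
</p>

```python
# Scalar tags described elsewhere in this document.
SCALAR_TAGS = {"b", "d", "d16", "d32", "d64", "f", "i", "s", "t", "u"}


def parse_tag(tag):
    """Parse a tag such as 'A<A<s>>' into ('A', ('A', 's')).

    Scalar tags and 'O' are returned as plain strings. An array tag
    becomes a tuple of 'A' and its parsed type parameter, or None when
    the parameter is omitted (the empty-array form 'A<>').
    """
    if tag in SCALAR_TAGS or tag == "O":
        return tag
    if tag[:2] == "A<" and tag.endswith(">"):
        inner = tag[2:-1]
        return ("A", parse_tag(inner) if inner else None)
    # A bare "A" with no type parameter ends up here and is rejected.
    raise ValueError(f"invalid tag: {tag!r}")


strings = parse_tag("A<s>")
nested = parse_tag("A<A<s>>")
```

<p>
A fuller parser would then check each array element against the parsed type parameter, and
would accept the <em>None</em> parameter only when the array is empty.
</p>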
</div>
<div class="subsection">
<h2 class="page-header">Sets</h2>
<p>
Sets use a syntax that's nearly identical to arrays, but require
that the elements within the set be unique:
</p>
<div class="syntax-example">
{"valid-set<span class="tag-prefix">:S&lt;s&gt;</span>": ["One", "Two", "Three"]}
</div>
<p>
Sets containing repeated items are invalid and are rejected by
compliant parsers:
</p>
<div class="syntax-example syntax-invalid">
{"invalid-set<span class="tag-prefix">:S&lt;s&gt;</span>": ["One", "One", "One"]}
</div>
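<p>
The uniqueness rule could be enforced along the following lines. The function name
<em>check_set</em> is hypothetical, and comparing elements with Python's <em>==</em> is an
assumption: this draft does not define how equality of set elements is decided.
</p>

```python
def check_set(items):
    """Return the items unchanged, or raise if any element repeats.

    A list is used for the 'seen' collection so that unhashable
    elements (nested lists or objects) can be compared as well.
    """
    seen = []
    for item in items:
        if item in seen:
            raise ValueError(f"repeated set element: {item!r}")
        seen.append(item)
    return items


valid = check_set(["One", "Two", "Three"])
```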
</div>
<div class="subsection">
<h2 class="page-header">Strings</h2>
<p>
As an element of an array, or as the value of an object member, strings have the same syntax as
they do in JSON. But when used as the name of an object member, strings carry a special
postfix tag which acts as a type annotation/signature for the value:
</p>
<div class="syntax-example">
{"hello-string<span class="tag-prefix">:s</span>": "I'm a string!"}
</div>
<p>
Note that a postfix tag is <em>mandatory</em> for all object member names in TJSON and
prevents any ambiguities between tagged and untagged strings. Parsers which encounter
untagged names for object members should raise an exception.
</p>
<p>
Unlike JSON, TJSON strings <em>MUST</em> be encoded as
<a href="https://en.wikipedia.org/wiki/UTF-8">UTF-8</a>.
Other Unicode encodings (e.g. UCS-2 as seen in JavaScript) are expressly disallowed.
All TJSON documents should be valid UTF-8, and parsers should reject documents that
fail to decode as UTF-8.
</p>
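<p>
One way to enforce the UTF-8 requirement is to decode the raw bytes strictly before handing
them to a JSON parser. The sketch below uses the hypothetical name <em>load_utf8_json</em>.
The explicit decode matters in Python because <em>json.loads</em>, when given bytes, also
accepts UTF-16 and UTF-32 input.
</p>

```python
import json


def load_utf8_json(raw):
    """Decode bytes as strict UTF-8, then parse the result as JSON."""
    try:
        text = raw.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as exc:
        raise ValueError("document is not valid UTF-8") from exc
    return json.loads(text)


parsed = load_utf8_json(b'{"hello-string:s": "I\'m a string!"}')
```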
</div>
<div class="subsection">
<h2 class="page-header">Binary Data</h2>
<p>
TJSON supports multiple formats for encoding 8-bit clean binary data.
Conforming encoders/decoders are required to support them all. The default is
<strong>base64url</strong>;
however, encoders may be configured with alternative, potentially more visually
appealing or well-recognized encodings for specific fields.
</p>
<h3>Hexadecimal Data (a.k.a. Base16)</h3>
<p>
Data tagged as "d16" is encoded in lower-case hexadecimal format:
</p>
<div class="syntax-example">
{"hello-base-sixteen<span class="tag-prefix">:d16</span>": "48656c6c6f2c20776f726c6421"}
</div>
<p>
TJSON parsers should expressly reject the use of any upper case hexadecimal characters
and fail with an exception.
</p>
<h3>Base32</h3>
<p>
Data tagged as "d32" is encoded in "base32" format as specified in
<a href="https://tools.ietf.org/html/rfc4648">RFC 4648</a>:
</p>
<div class="syntax-example">
{"hello-base-thirty-two<span class="tag-prefix">:d32</span>": "jbswy3dpfqqho33snrscc"}
</div>
<p>
The encoded data should <em>NOT</em> be padded with "<b>=</b>" characters: it is stored
within a quote-delimited string, so its length is already known.
</p>
<p>
TJSON parsers should expressly reject the use of any upper case Base32 characters
and fail with an exception.
</p>
<h3>Base64url</h3>
<p>
Data tagged as "d64" is encoded in "base64url" format as specified in
<a href="https://tools.ietf.org/html/rfc4648">RFC 4648</a>:
</p>
<div class="syntax-example">
{"hello-base-sixty-four-url<span class="tag-prefix">:d64</span>": "SGVsbG8sIHdvcmxkIQ"}
</div>
<p>
The encoded data should <em>NOT</em> be padded with "<b>=</b>" characters: it is stored
within a quote-delimited string, so its length is already known.
</p>
<p>
The non-URL-safe variant of Base64 is not supported by TJSON: parsers should reject
any value that contains the "<b>+</b>" or "<b>/</b>" characters.
</p>
<p>
Because "base64url" is the default encoding for TJSON, the shorthand "d" variant
<em>SHOULD</em> be used by default unless another format is specified:
</p>
<div class="syntax-example">
{"base-sixty-four-is-default<span class="tag-prefix">:d</span>": "SGVsbG8sIHdvcmxkIQ"}
</div>
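<p>
The decoding rules for the four binary tags might be sketched as follows. The function name
<em>decode_binary</em> and the regular expressions are illustrative assumptions. Padding is
re-added internally only because Python's <em>base64</em> functions expect it.
</p>

```python
import base64
import re

# Allowed alphabets: lower-case only for hex and base32, URL-safe for base64,
# and no "=" padding characters in any of them.
_PATTERNS = {
    "d16": re.compile(r"(?:[0-9a-f]{2})*"),
    "d32": re.compile(r"[a-z2-7]*"),
    "d64": re.compile(r"[A-Za-z0-9_-]*"),
}


def decode_binary(tag, text):
    """Decode a TJSON binary string according to its tag."""
    if tag == "d":  # shorthand for the default encoding
        tag = "d64"
    pattern = _PATTERNS.get(tag)
    if pattern is None:
        raise ValueError(f"not a binary tag: {tag!r}")
    if not pattern.fullmatch(text):
        raise ValueError(f"invalid characters for {tag}: {text!r}")
    if tag == "d16":
        return bytes.fromhex(text)
    if tag == "d32":
        # b32decode expects upper case and "=" padding to a multiple of 8.
        return base64.b32decode(text.upper() + "=" * (-len(text) % 8))
    # urlsafe_b64decode expects "=" padding to a multiple of 4.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


hex_bytes = decode_binary("d16", "48656c6c6f2c20776f726c6421")
b32_bytes = decode_binary("d32", "jbswy3dpfqqho33snrscc")
b64_bytes = decode_binary("d", "SGVsbG8sIHdvcmxkIQ")
```

<p>
All three calls decode the examples from this section to the same thirteen bytes,
<em>Hello, world!</em>.
</p>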
</div>
<div class="subsection">
<h2 class="page-header">Numbers</h2>
<p>
TJSON supports both integers and floating point numbers in separate formats that can
always be disambiguated.
</p>
<h3>Integers</h3>
<p>
In TJSON, integers are stored as strings, sidestepping integer precision issues with
JSON parsers that do floating point conversions.
</p>
<p>
The following is an example of a <strong>signed integer</strong>, which may be any
value in the range <em>-(2**63)</em> to <em>(2**63)-1</em>:
</p>
<div class="syntax-example">
{"hello-signed-int<span class="tag-prefix">:i</span>": "42"}
</div>
<p>
The following is an example of an <strong>unsigned integer</strong>, which may be
any value in the range <em>0</em> to <em>(2**64)-1</em>:
</p>
<div class="syntax-example">
{"hello-unsigned-int<span class="tag-prefix">:u</span>": "18446744073709551615"}
</div>
<p>
Integers otherwise utilize the <em>int</em> syntax as described in the JSON specification.
</p>
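<p>
A minimal sketch of integer parsing under these rules follows. The name
<em>parse_integer</em> is hypothetical, and the regular expression is one reading of the
JSON <em>int</em> production (an optional minus sign, then either a single zero or a digit
sequence without leading zeros).
</p>

```python
import re

# JSON "int" syntax: optional minus, then 0 or a nonzero-led digit run.
_INT_RE = re.compile(r"-?(?:0|[1-9][0-9]*)")

_RANGES = {
    "i": (-(2**63), 2**63 - 1),  # signed 64-bit
    "u": (0, 2**64 - 1),         # unsigned 64-bit
}


def parse_integer(tag, text):
    """Parse a quoted TJSON integer and check it against its tag's range."""
    if tag not in _RANGES:
        raise ValueError(f"not an integer tag: {tag!r}")
    if not isinstance(text, str):
        raise ValueError("TJSON integers must be serialized as strings")
    if not _INT_RE.fullmatch(text):
        raise ValueError(f"invalid integer syntax: {text!r}")
    value = int(text)
    low, high = _RANGES[tag]
    if not low <= value <= high:
        raise ValueError(f"{value} is out of range for tag {tag!r}")
    return value


signed = parse_integer("i", "42")
unsigned = parse_integer("u", "18446744073709551615")
```

<p>
Because Python integers have arbitrary precision, the range check has to be explicit here.
In a language with native 64-bit integer types it would typically surface as an overflow
error during conversion instead.
</p>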
<h3>Floating Points</h3>
<p>
Floating points use the native number literal syntax provided by JSON. Unlike integers,
TJSON floats must not be quoted:
</p>
<div class="syntax-example">
{"hello-float<span class="tag-prefix">:f</span>": 0.42}
</div>
<p>
The full <a href="https://en.wikipedia.org/wiki/IEEE_floating_point">IEEE 754</a>
64-bit floating point range is supported.
</p>
</div>
<div class="subsection">
<h2 class="page-header">Boolean Values</h2>
<p>
TJSON supports the <span class="inline-syntax">true</span> and
<span class="inline-syntax">false</span> values from JSON:
</p>
<div class="syntax-example">
{"hello-true<span class="tag-prefix">:b</span>": true,
"hello-false<span class="tag-prefix">:b</span>": false}
</div>
<p>
The <span class="inline-syntax">null</span> value is expressly
disallowed anywhere inside of a TJSON document.
</p>
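<p>
Since generic JSON parsers accept <em>null</em>, a TJSON layer built on top of one has to
reject it explicitly. The recursive walk below is a sketch under that assumption; the name
<em>reject_null</em> is hypothetical.
</p>

```python
import json


def reject_null(value):
    """Raise if None (JSON null) appears anywhere in a parsed document."""
    if value is None:
        raise ValueError("null is not allowed in TJSON")
    if isinstance(value, dict):
        for member in value.values():
            reject_null(member)
    elif isinstance(value, list):
        for element in value:
            reject_null(element)
    return value


booleans = reject_null(json.loads('{"hello-true:b": true, "hello-false:b": false}'))
```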
</div>
<div class="subsection">
<h2 class="page-header">Timestamps</h2>
<p>
TJSON adds a literal syntax for timestamp values. The format is based on
<a href="https://www.ietf.org/rfc/rfc3339.txt">RFC 3339</a>; however, the
use of the UTC time zone identifier "<b>Z</b>" is mandatory (i.e. all
timestamps are Z-normalized):
</p>
<div class="syntax-example">
{"hello-timestamp<span class="tag-prefix">:t</span>": "2016-10-02T07:31:51Z"}
</div>
<p>
TJSON parsers should expressly reject the use of other time zone identifiers
and fail with an exception.
</p>
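<p>
Timestamp validation might look like the sketch below. The name <em>parse_timestamp</em> is
hypothetical. The pattern accepts only the whole-second, upper-case "Z" form shown in the
example above. Rejecting fractional seconds and a lower-case "z" is an assumption, since
this draft does not say which other parts of RFC 3339 are allowed.
</p>

```python
import re
from datetime import datetime, timezone

# Assumption: whole seconds only, terminated by an upper-case "Z".
_TIMESTAMP_RE = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z"
)


def parse_timestamp(text):
    """Parse a Z-normalized timestamp into a timezone-aware datetime."""
    if not _TIMESTAMP_RE.fullmatch(text):
        raise ValueError(f"invalid TJSON timestamp: {text!r}")
    # strptime also rejects impossible dates such as month 13.
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


stamp = parse_timestamp("2016-10-02T07:31:51Z")
```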
</div>
</div>
</div>
</div>