author    Luke Shumaker <lukeshu@lukeshu.com>  2023-01-25 21:05:17 -0700
committer Luke Shumaker <lukeshu@lukeshu.com>  2023-01-26 00:45:27 -0700
commit    ffee5c8516f3f55f82ed5bb8f0a4f340d485fa92
tree      0c10526b1ea57b043230402e9378b341c6966965
parent    4148776399cb7ea5e10c74dc465e4e1e682cb399

Write documentation (v0.2.0)
-rw-r--r-- README.md             170
-rw-r--r-- compat/json/README.md  60
-rw-r--r-- decode.go             115
-rw-r--r-- encode.go              30
-rw-r--r-- errors.go              68
-rw-r--r-- internal/parse.go      13
-rw-r--r-- misc.go                47
-rw-r--r-- reencode.go            30
8 files changed, 486 insertions, 47 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..c8e05ab
--- /dev/null
+++ b/README.md
@@ -0,0 +1,170 @@
+<!--
+Copyright (C) 2023 Luke Shumaker <lukeshu@lukeshu.com>
+
+SPDX-License-Identifier: GPL-2.0-or-later
+-->
+
+# lowmemjson
+
+`lowmemjson` is a mostly-compatible alternative to the standard
+library's [`encoding/json`][] that has dramatically lower memory
+requirements for large data structures.
+
+`lowmemjson` is not targeting extremely resource-constrained
+environments, but rather targets being able to efficiently stream
+gigabytes of JSON without requiring gigabytes of memory overhead.
+
+## Compatibility
+
+`encoding/json`'s APIs are designed around the idea that it can buffer
+the entire JSON document as a `[]byte`, and as intermediate steps it
+may have a fragment buffered multiple times while encoding; encoding a
+gigabyte of data may consume several gigabytes of memory. In
+contrast, `lowmemjson`'s APIs are designed around streaming
+(`io.Writer` and `io.RuneScanner`), trying to have the memory overhead
+of encode and decode operations be as close to O(1) as possible.
+
+`lowmemjson` offers a high level of compatibility with the
+`encoding/json` APIs, but for best memory usage (avoiding storing
+large byte arrays inherent in `encoding/json`'s API), it is
+recommended to migrate to `lowmemjson`'s own APIs.
+
+### Callee API (objects to be encoded-to/decoded-from JSON)
+
+`lowmemjson` supports `encoding/json`'s `json:` struct field tags, as
+well as the `encoding/json.Marshaler` and `encoding/json.Unmarshaler`
+interfaces; you do not need to adjust your types to successfully
+migrate from `encoding/json` to `lowmemjson`.
+
+That is: Given types that decode as desired with `encoding/json`,
+those types should decode identically with `lowmemjson`. Given types
+that encode as desired with `encoding/json`, those types should encode
+identically with `lowmemjson` (assuming an appropriately configured
+`ReEncoder` to match the whitespace-handling and special-character
+escaping; a `ReEncoder` with `Compact=true` and all other settings
+left as zero will match the behavior of `json.Marshal`).
+
+For better memory usage:
+ - Instead of implementing [`json.Marshaler`][], consider implementing
+ [`lowmemjson.Encodable`][] (or implementing both).
+ - Instead of implementing [`json.Unmarshaler`][], consider
+ implementing [`lowmemjson.Decodable`][] (or implementing both).
+
+### Caller API
+
+`lowmemjson` offers a [`lowmemjson/compat/json`][] package that is a
+(mostly) drop-in replacement for `encoding/json` (see the package's
+documentation for the small incompatibilities).
+
+For better memory usage, avoid using `lowmemjson/compat/json` and
+instead use `lowmemjson` directly:
+ - Instead of using <code>[json.Marshal][`json.Marshal`](val)</code>,
+ consider using
+ <code>[lowmemjson.NewEncoder][`lowmemjson.NewEncoder`](w).[Encode][`lowmemjson.Encoder.Encode`](val)</code>.
+ - Instead of using
+ <code>[json.Unmarshal][`json.Unmarshal`](dat, &val)</code>, consider
+ using
+ <code>[lowmemjson.NewDecoder][`lowmemjson.NewDecoder`](r).[DecodeThenEOF][`lowmemjson.Decoder.DecodeThenEOF`](&val)</code>.
+ - Instead of using [`json.Compact`][], [`json.HTMLEscape`][], or
+ [`json.Indent`][], consider using a [`lowmemjson.ReEncoder`][].
+ - Instead of using [`json.Valid`][], consider using a
+ [`lowmemjson.ReEncoder`][] with `io.Discard` as the output.
+
+The error types returned from `lowmemjson` are different from the
+error types returned by `encoding/json`, but `lowmemjson/compat/json`
+translates them back to the types returned by `encoding/json`.
+
+## Overview
+
+### Caller API
+
+There are 3 main types that make up the caller API for producing and
+handling streams of JSON, and each of those types has some associated
+types that go with it:
+
+ 1. `type Decoder`
+ + `type DecodeArgumentError`
+ + `type DecodeError`
+ * `type DecodeReadError`
+ * `type DecodeSyntaxError`
+ * `type DecodeTypeError`
+
+ 2. `type Encoder`
+ + `type EncodeTypeError`
+ + `type EncodeValueError`
+ + `type EncodeMethodError`
+
+ 3. `type ReEncoder`
+ + `type ReEncodeSyntaxError`
+ + `type BackslashEscaper`
+ * `type BackslashEscapeMode`
+
+A `*Decoder` handles decoding a JSON stream into Go values; the most
+common use of it will be
+`lowmemjson.NewDecoder(r).DecodeThenEOF(&val)` or
+`lowmemjson.NewDecoder(bufio.NewReader(r)).DecodeThenEOF(&val)`.
+
+A `*ReEncoder` handles transforming a JSON stream; this is useful for
+prettifying, minifying, sanitizing, and/or validating JSON. A
+`*ReEncoder` wraps an `io.Writer`, itself implementing `io.Writer`.
+The most common use of it will be something along the lines of
+
+```go
+out = &lowmemjson.ReEncoder{
+ Out: out,
+ // settings here
+}
+```
+
+An `*Encoder` handles encoding Go values into a JSON stream.
+`*Encoder` doesn't take much care to make its output nice, so it is
+usually desirable to have the output stream of an `*Encoder` be a
+`*ReEncoder`; the most common use of it will be
+
+```go
+lowmemjson.NewEncoder(&lowmemjson.ReEncoder{
+ Out: out,
+ // settings here
+}).Encode(val)
+```
+
+### Callee API
+
+For defining Go types with custom JSON representations, `lowmemjson`
+respects all of the `json:` struct field tags of `encoding/json`, as
+well as respecting the same "marshaler" and "unmarshaler" interfaces
+as `encoding/json`. In addition to those interfaces, `lowmemjson`
+adds two of its own interfaces, and some helper functions to help with
+implementing those interfaces:
+
+ 1. `type Decodable`
+ + `func DecodeArray`
+ + `func DecodeObject`
+ 2. `type Encodable`
+
+These are streaming variants of the standard `json.Unmarshaler` and
+`json.Marshaler` interfaces.
+
+<!-- packages -->
+[`lowmemjson`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson
+[`lowmemjson/compat/json`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson/compat/json
+[`encoding/json`]: https://pkg.go.dev/encoding/json@go1.18
+
+<!-- encoding/json symbols -->
+[`json.Marshaler`]: https://pkg.go.dev/encoding/json@go1.18#Marshaler
+[`json.Unmarshaler`]: https://pkg.go.dev/encoding/json@go1.18#Unmarshaler
+[`json.Marshal`]: https://pkg.go.dev/encoding/json@go1.18#Marshal
+[`json.Unmarshal`]: https://pkg.go.dev/encoding/json@go1.18#Unmarshal
+[`json.Compact`]: https://pkg.go.dev/encoding/json@go1.18#Compact
+[`json.HTMLEscape`]: https://pkg.go.dev/encoding/json@go1.18#HTMLEscape
+[`json.Indent`]: https://pkg.go.dev/encoding/json@go1.18#Indent
+[`json.Valid`]: https://pkg.go.dev/encoding/json@go1.18#Valid
+
+<!-- lowmemjson symbols -->
+[`lowmemjson.Encodable`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#Encodable
+[`lowmemjson.Decodable`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#Decodable
+[`lowmemjson.NewEncoder`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#NewEncoder
+[`lowmemjson.Encoder.Encode`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#Encoder.Encode
+[`lowmemjson.NewDecoder`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#NewDecoder
+[`lowmemjson.Decoder.DecodeThenEOF`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#Decoder.DecodeThenEOF
+[`lowmemjson.ReEncoder`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson#ReEncoder
diff --git a/compat/json/README.md b/compat/json/README.md
new file mode 100644
index 0000000..ec8dbed
--- /dev/null
+++ b/compat/json/README.md
@@ -0,0 +1,60 @@
+<!--
+Copyright (C) 2023 Luke Shumaker <lukeshu@lukeshu.com>
+
+SPDX-License-Identifier: GPL-2.0-or-later
+-->
+
+# lowmemjson/compat/json
+
+`lowmemjson/compat/json` is a wrapper around [`lowmemjson`][] that is
+a (mostly) drop-in replacement for the standard library's
+[`encoding/json`][].
+
+This package does not bother to duplicate `encoding/json`'s
+documentation; you should instead refer to [`encoding/json`'s own
+documentation][`encoding/json`].
+
+## Incompatibilities
+
+### Tokens
+
+Because the `lowmemjson` parser is fundamentally different than the
+`encoding/json` parser and does not have any notion of tokens, the
+token API is not included in `lowmemjson/compat/json`:
+
+ - There is no [`Delim`][] type.
+ - There is no [`Token`][] type.
+ - There is no [`Decoder.Token`][] method.
+
+### Types
+
+When possible, `lowmemjson/compat/json` uses type aliases for the
+`encoding/json` types, but in several cases that is not possible
+(`Encoder`, `Decoder`, `SyntaxError`, `MarshalError`). This means
+that while `lowmemjson/compat/json` is source-compatible with
+`encoding/json`, it may not interoperate with code that also uses
+`encoding/json` and relies on those type identities.
+
+The errors returned by the various functions *are* the same errors as
+returned by `encoding/json` (with the exception that `SyntaxError` and
+`MarshalError` are not type aliases).
+
+### Deprecations
+
+Types that are deprecated in `encoding/json` are not mimicked here:
+
+ - There is no [`InvalidUTF8Error`][] type, as it has been deprecated
+ since Go 1.2.
+ - There is no [`UnmarshalFieldError`][] type, as it has been
+ deprecated since Go 1.1.
+
+<!-- packages -->
+[`lowmemjson`]: https://pkg.go.dev/git.lukeshu.com/go/lowmemjson
+[`encoding/json`]: https://pkg.go.dev/encoding/json@go1.18
+
+<!-- symbols -->
+[`Delim`]: https://pkg.go.dev/encoding/json@go1.18#Delim
+[`Token`]: https://pkg.go.dev/encoding/json@go1.18#Token
+[`Decoder.Token`]: https://pkg.go.dev/encoding/json@go1.18#Decoder.Token
+[`InvalidUTF8Error`]: https://pkg.go.dev/encoding/json@go1.18#InvalidUTF8Error
+[`UnmarshalFieldError`]: https://pkg.go.dev/encoding/json@go1.18#UnmarshalFieldError
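For comparison, this is the sort of standard-library code that relies on the token API and therefore will not build against `lowmemjson/compat/json`; such code would need to be restructured, perhaps around `Decodable` and the `DecodeArray`/`DecodeObject` helpers. It uses only `encoding/json`, and `countTokens` is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// countTokens walks a JSON document using encoding/json's token API,
// which lowmemjson/compat/json does not provide.
func countTokens(doc string) (int, error) {
	dec := json.NewDecoder(strings.NewReader(doc))
	n := 0
	for {
		// Each token is a json.Delim, string, float64, bool, or nil;
		// colons and commas are not reported as tokens.
		_, err := dec.Token()
		if errors.Is(err, io.EOF) {
			return n, nil
		}
		if err != nil {
			return n, err
		}
		n++
	}
}

func main() {
	n, err := countTokens(`{"a":[1,2]}`)
	fmt.Println(n, err) // 7 <nil>
}
```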
diff --git a/decode.go b/decode.go
index 51c1ed5..f911ac3 100644
--- a/decode.go
+++ b/decode.go
@@ -1,6 +1,13 @@
// Copyright (C) 2022-2023 Luke Shumaker <lukeshu@lukeshu.com>
//
// SPDX-License-Identifier: GPL-2.0-or-later
+//
+// Some doc comments are
+// copyright 2010 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+//
+// SPDX-License-Identifier: BSD-3-Clause
package lowmemjson
@@ -19,6 +26,26 @@ import (
"git.lukeshu.com/go/lowmemjson/internal"
)
+// Decodable is the interface implemented by types that can decode a
+// JSON representation of themselves. Decodable is a
+// low-memory-overhead replacement for the json.Unmarshaler interface.
+//
+// The io.RuneScanner passed to DecodeJSON...
+//
+// - ...will return ErrInvalidUnreadRune from .UnreadRune() if the last
+// operation was not a successful .ReadRune() call.
+//
+// - ...will return EOF at the end of the JSON value; it is not
+// possible for DecodeJSON to read past the end of the value into
+// another value.
+//
+// - ...if invalid JSON is encountered, will return the invalid rune
+// with err!=nil. Implementations are encouraged to simply
+// `return err` if .ReadRune returns an error.
+//
+// DecodeJSON is expected to consume the entire scanner until io.EOF
+// or another error is encountered; if it does not, then the parent Decode
+// call will return a *DecodeTypeError.
type Decodable interface {
DecodeJSON(io.RuneScanner) error
}
@@ -28,6 +55,26 @@ type decodeStackItem struct {
idx any
}
+// A Decoder reads and decodes values from an input stream of JSON
+// elements.
+//
+// Decoder is analogous to, and has a similar API to, the standard
+// library's encoding/json.Decoder. Differences are:
+//
+// - lowmemjson.NewDecoder takes an io.RuneScanner, while
+// json.NewDecoder takes an io.Reader.
+//
+// - lowmemjson.Decoder does not have a .Buffered() method, while
+// json.Decoder does.
+//
+// - lowmemjson.Decoder does not have a .Token() method, while
+// json.Decoder does.
+//
+// If something more similar to a json.Decoder is desired,
+// lowmemjson/compat/json.NewDecoder takes an io.Reader (and turns it
+// into an io.RuneScanner by wrapping it in a bufio.Reader), and
+// lowmemjson/compat/json.Decoder has a .Buffered() method; though
+// lowmemjson/compat/json.Decoder also lacks the .Token() method.
type Decoder struct {
io runeTypeScanner
@@ -42,6 +89,11 @@ type Decoder struct {
const maxNestingDepth = 10000
+// NewDecoder returns a new Decoder that reads from r.
+//
+// NewDecoder is analogous to the standard library's
+// encoding/json.NewDecoder, but takes an io.RuneScanner rather than
+// an io.Reader.
func NewDecoder(r io.RuneScanner) *Decoder {
return &Decoder{
io: &noWSRuneTypeScanner{
@@ -55,10 +107,35 @@ func NewDecoder(r io.RuneScanner) *Decoder {
}
}
+// DisallowUnknownFields causes the Decoder to return an error when
+// the destination is a struct and the input contains object keys
+// which do not match any non-ignored, exported fields in the
+// destination.
+//
+// This is identical to the standard library's
+// encoding/json.Decoder.DisallowUnknownFields.
func (dec *Decoder) DisallowUnknownFields() { dec.disallowUnknownFields = true }
-func (dec *Decoder) UseNumber() { dec.useNumber = true }
-func (dec *Decoder) InputOffset() int64 { return dec.io.InputOffset() }
+// UseNumber causes the Decoder to unmarshal a number into an
+// interface{} as a Number instead of as a float64.
+//
+// This is identical to the standard library's
+// encoding/json.Decoder.UseNumber.
+func (dec *Decoder) UseNumber() { dec.useNumber = true }
+
+// InputOffset returns the input stream byte offset of the current
+// decoder position. The offset gives the location of the rune that
+// will be returned from the next call to .ReadRune().
+//
+// This is identical to the standard library's
+// encoding/json.Decoder.InputOffset.
+func (dec *Decoder) InputOffset() int64 { return dec.io.InputOffset() }
+
+// More reports whether there is more to the stream of JSON elements,
+// or if the Decoder has reached EOF or an error.
+//
+// More is identical to the standard library's
+// encoding/json.Decoder.More.
func (dec *Decoder) More() bool {
dec.io.Reset()
_, _, t, e := dec.io.ReadRuneType()
@@ -105,8 +182,10 @@ func (dec *Decoder) stackName() string {
return strings.Join(fields, ".")
}
-// DecodeThenEOF is like decode, but emits an error if there is extra
-// data after the JSON.
+// DecodeThenEOF is like Decode, but emits an error if there is extra
+// data after the JSON. A JSON document is specified to be a single
+// JSON element; repeated calls to Decoder.Decode will happily decode
+// a stream of multiple JSON elements.
func (dec *Decoder) DecodeThenEOF(ptr any) (err error) {
if err := dec.Decode(ptr); err != nil {
return err
@@ -126,6 +205,16 @@ func (dec *Decoder) DecodeThenEOF(ptr any) (err error) {
return nil
}
+// Decode reads the next JSON element from the Decoder's input stream
+// and stores it in the value pointed to by ptr.
+//
+// See the [documentation for encoding/json.Unmarshal] for details
+// about the conversion of JSON into a Go value; Decode behaves
+// identically to that, with the exception that in addition to the
+// json.Unmarshaler interface it also checks for the Decodable
+// interface.
+//
+// [documentation for encoding/json.Unmarshal]: https://pkg.go.dev/encoding/json@go1.18#Unmarshal
func (dec *Decoder) Decode(ptr any) (err error) {
ptrVal := reflect.ValueOf(ptr)
if ptrVal.Kind() != reflect.Pointer || ptrVal.IsNil() || !ptrVal.Elem().CanSet() {
@@ -721,7 +810,14 @@ func (dec *Decoder) decodeAny() any {
}
}
-// DecodeObject is a helper function for implementing the Decoder interface.
+// DecodeObject is a helper function to ease implementing the
+// Decodable interface, allowing the lowmemjson package to handle
+// decoding the object syntax, while the Decodable only needs to
+// handle decoding the keys and values within the object.
+//
+// Outside of implementing Decodable.DecodeJSON methods, callers
+// should instead simply use NewDecoder(r).Decode(&val) rather than
+// attempting to call DecodeObject directly.
func DecodeObject(r io.RuneScanner, decodeKey, decodeVal func(io.RuneScanner) error) (err error) {
defer func() {
if r := recover(); r != nil {
@@ -784,7 +880,14 @@ func (dec *Decoder) decodeObject(gTyp reflect.Type, decodeKey, decodeVal func())
}
}
-// DecodeArray is a helper function for implementing the Decoder interface.
+// DecodeArray is a helper function to ease implementing the
+// Decodable interface, allowing the lowmemjson package to handle
+// decoding the array syntax, while the Decodable only needs to handle
+// decoding members within the array.
+//
+// Outside of implementing Decodable.DecodeJSON methods, callers
+// should instead simply use NewDecoder(r).Decode(&val) rather than
+// attempting to call DecodeArray directly.
func DecodeArray(r io.RuneScanner, decodeMember func(r io.RuneScanner) error) (err error) {
defer func() {
if r := recover(); r != nil {
diff --git a/encode.go b/encode.go
index 6963e3c..d31f36e 100644
--- a/encode.go
+++ b/encode.go
@@ -21,6 +21,12 @@ import (
"unsafe"
)
+// Encodable is the interface implemented by types that can encode
+// themselves to JSON. Encodable is a low-memory-overhead replacement
+// for the json.Marshaler interface.
+//
+// The io.Writer passed to EncodeJSON returns an error if invalid JSON
+// is written to it.
type Encodable interface {
EncodeJSON(w io.Writer) error
}
@@ -41,6 +47,15 @@ func encodeWriteString(w io.Writer, str string) {
}
}
+// An Encoder encodes and writes values to a stream of JSON elements.
+//
+// Encoder is analogous to, and has a similar API to, the standard
+// library's encoding/json.Encoder. Differences are that rather than
+// having .SetEscapeHTML and .SetIndent methods, the io.Writer passed
+// to it may be a *ReEncoder that has these settings (and more). If
+// something more similar to a json.Encoder is desired,
+// lowmemjson/compat/json.Encoder offers those .SetEscapeHTML and
+// .SetIndent methods.
type Encoder struct {
w *ReEncoder
closeAfterEncode bool
@@ -65,6 +80,15 @@ func NewEncoder(w io.Writer) *Encoder {
}
}
+// Encode encodes obj to JSON and writes that JSON to the Encoder's
+// output stream.
+//
+// See the [documentation for encoding/json.Marshal] for details about
+// the conversion of Go values to JSON; Encode behaves identically to
+// that, with the exception that in addition to the json.Marshaler
+// interface it also checks for the Encodable interface.
+//
+// [documentation for encoding/json.Marshal]: https://pkg.go.dev/encoding/json@go1.18#Marshal
func (enc *Encoder) Encode(obj any) (err error) {
defer func() {
if r := recover(); r != nil {
@@ -115,8 +139,8 @@ func encode(w io.Writer, val reflect.Value, escaper BackslashEscaper, quote bool
if err := obj.EncodeJSON(validator); err != nil {
panic(encodeError{&EncodeMethodError{
Type: val.Type(),
- Err: err,
SourceFunc: "EncodeJSON",
+ Err: err,
}})
}
if err := validator.Close(); err != nil && !errors.Is(err, iofs.ErrClosed) {
@@ -140,8 +164,8 @@ func encode(w io.Writer, val reflect.Value, escaper BackslashEscaper, quote bool
if err != nil {
panic(encodeError{&EncodeMethodError{
Type: val.Type(),
- Err: err,
SourceFunc: "MarshalJSON",
+ Err: err,
}})
}
// Use a sub-ReEncoder to check that it's a full element.
@@ -170,8 +194,8 @@ func encode(w io.Writer, val reflect.Value, escaper BackslashEscaper, quote bool
if err != nil {
panic(encodeError{&EncodeMethodError{
Type: val.Type(),
- Err: err,
SourceFunc: "MarshalText",
+ Err: err,
}})
}
encodeStringFromBytes(w, escaper, text)
diff --git a/errors.go b/errors.go
index 67fe6c9..5669d36 100644
--- a/errors.go
+++ b/errors.go
@@ -1,4 +1,4 @@
-// Copyright (C) 2022 Luke Shumaker <lukeshu@lukeshu.com>
+// Copyright (C) 2022-2023 Luke Shumaker <lukeshu@lukeshu.com>
//
// SPDX-License-Identifier: GPL-2.0-or-later
@@ -14,21 +14,23 @@ import (
"git.lukeshu.com/go/lowmemjson/internal"
)
-var (
- ErrInvalidUnreadRune = errors.New("lowmemjson: invalid use of UnreadRune")
-)
+// ErrInvalidUnreadRune is returned to Decodable.DecodeJSON(scanner)
+// implementations from scanner.UnreadRune() if the last operation was
+// not a successful .ReadRune() call.
+var ErrInvalidUnreadRune = errors.New("lowmemjson: invalid use of UnreadRune")
// parser errors ///////////////////////////////////////////////////////////////////////////////////
-var (
- ErrParserExceededMaxDepth = internal.ErrParserExceededMaxDepth
-)
+// ErrParserExceededMaxDepth is the base error that a
+// *DecodeSyntaxError wraps when the depth of the JSON document
+// exceeds 10000.
+var ErrParserExceededMaxDepth = internal.ErrParserExceededMaxDepth
// low-level decode errors /////////////////////////////////////////////////////////////////////////
// These will be wrapped in a *DecodeError.
-// A *DecodeReadError is returned from Decode if there is an I/O error
-// reading the input.
+// A *DecodeReadError is returned from Decode (wrapped in a
+// *DecodeError) if there is an I/O error reading the input.
type DecodeReadError struct {
Err error
Offset int64
@@ -39,8 +41,8 @@ func (e *DecodeReadError) Error() string {
}
func (e *DecodeReadError) Unwrap() error { return e.Err }
-// A *DecodeSyntaxError is returned from Decode if there is a syntax
-// error in the input.
+// A *DecodeSyntaxError is returned from Decode (wrapped in a
+// *DecodeError) if there is a syntax error in the input.
type DecodeSyntaxError struct {
Err error
Offset int64
@@ -51,8 +53,9 @@ func (e *DecodeSyntaxError) Error() string {
}
func (e *DecodeSyntaxError) Unwrap() error { return e.Err }
-// A *DecodeTypeError is returned from Decode if the JSON input is not
-// appropriate for the given Go type.
+// A *DecodeTypeError is returned from Decode (wrapped in a
+// *DecodeError) if the JSON input is not appropriate for the given Go
+// type.
//
// If a .DecodeJSON, .UnmarshalJSON, or .UnmarshalText method returns
// an error, it is wrapped in a *DecodeTypeError.
@@ -69,7 +72,7 @@ func (e *DecodeTypeError) Error() string {
if e.JSONType != "" {
fmt.Fprintf(&buf, "JSON %s ", e.JSONType)
}
- fmt.Fprintf(&buf, "at input byte %v in to Go %v", e.Offset, e.GoType)
+ fmt.Fprintf(&buf, "at input byte %v into Go %v", e.Offset, e.GoType)
if e.Err != nil {
fmt.Fprintf(&buf, ": %v", strings.TrimPrefix(e.Err.Error(), "json: "))
}
@@ -78,9 +81,10 @@ func (e *DecodeTypeError) Error() string {
func (e *DecodeTypeError) Unwrap() error { return e.Err }
-var (
- ErrDecodeNonEmptyInterface = errors.New("cannot decode in to non-empty interface")
-)
+// ErrDecodeNonEmptyInterface is the base error that a
+// *DecodeTypeError wraps when Decode is asked to unmarshal into an
+// `interface` type that has one or more methods.
+var ErrDecodeNonEmptyInterface = errors.New("cannot decode into non-empty interface")
// high-level decode errors ////////////////////////////////////////////////////////////////////////
@@ -88,21 +92,26 @@ var (
// not a non-nil pointer or is not settable.
//
// Alternatively, a *DecodeArgumentError may be found inside of a
-// *DecodeTypeError if the type being decoded in to is not a type that
-// can be decoded in to (such as map with non-stringable type as
-// keys).
+// *DecodeTypeError if the type being decoded into is not a type that
+// can be decoded into (such as map with non-stringable type as keys).
//
// type DecodeArgumentError struct {
// Type reflect.Type
// }
type DecodeArgumentError = json.InvalidUnmarshalError
+// A *DecodeError is returned from Decode for all errors except for
+// *DecodeArgumentError.
+//
+// A *DecodeError wraps *DecodeSyntaxError for malformed or illegal
+// input, *DecodeTypeError for Go type issues, or *DecodeReadError for
+// I/O errors.
type DecodeError struct {
- Field string
- Err error
+ Field string // Where in the JSON the error was, in the form "v[idx][idx][idx]".
+ Err error // What the error was.
- FieldParent string // for compat
- FieldName string // for compat
+ FieldParent string // for compat; the same as encoding/json.UnmarshalTypeError.Struct
+ FieldName string // for compat; the same as encoding/json.UnmarshalTypeError.Field
}
func (e *DecodeError) Error() string {
@@ -129,19 +138,18 @@ type EncodeTypeError = json.UnsupportedTypeError
// }
type EncodeValueError = json.UnsupportedValueError
-// An *EncodeTypeError is returned by Encode when attempting to encode
-// an unsupported value type.
+// An *EncodeMethodError wraps an error that is returned from an
+// object's method when encoding that object to JSON.
type EncodeMethodError struct {
- Type reflect.Type
- Err error
- SourceFunc string
+ Type reflect.Type // The Go type that the method is on
+ SourceFunc string // The method: "EncodeJSON", "MarshalJSON", or "MarshalText"
+ Err error // The error that the method returned
}
func (e *EncodeMethodError) Error() string {
return fmt.Sprintf("json: error calling %v for type %v: %v",
e.SourceFunc, e.Type, strings.TrimPrefix(e.Err.Error(), "json: "))
}
-
func (e *EncodeMethodError) Unwrap() error { return e.Err }
// reencode errors /////////////////////////////////////////////////////////////////////////////////
diff --git a/internal/parse.go b/internal/parse.go
index 12d7600..895c930 100644
--- a/internal/parse.go
+++ b/internal/parse.go
@@ -14,10 +14,13 @@ import (
var ErrParserExceededMaxDepth = errors.New("exceeded max depth")
+// RuneType is the classification of a rune when parsing JSON input.
+// A Parser, rather than grouping runes into tokens and classifying
+// tokens, classifies runes directly.
type RuneType uint8
const (
- RuneTypeError = RuneType(iota)
+ RuneTypeError RuneType = iota
RuneTypeSpace // whitespace
@@ -42,7 +45,7 @@ const (
RuneTypeStringEnd // closing '"'
RuneTypeNumberIntNeg
- RuneTypeNumberIntZero
+ RuneTypeNumberIntZero // leading zero only; non-leading zeros are IntDig, not IntZero
RuneTypeNumberIntDig
RuneTypeNumberFracDot
RuneTypeNumberFracDig
@@ -69,6 +72,7 @@ const (
RuneTypeEOF
)
+// GoString implements fmt.GoStringer.
func (t RuneType) GoString() string {
str, ok := map[RuneType]string{
RuneTypeError: "RuneTypeError",
@@ -128,6 +132,7 @@ func (t RuneType) GoString() string {
return fmt.Sprintf("RuneType(%d)", t)
}
+// String implements fmt.Stringer.
func (t RuneType) String() string {
str, ok := map[RuneType]string{
RuneTypeError: "x",
@@ -202,10 +207,14 @@ func (t RuneType) JSONType() string {
}[t]
}
+// IsNumber returns whether the RuneType is one of the
+// RuneTypeNumberXXX values.
func (t RuneType) IsNumber() bool {
return RuneTypeNumberIntNeg <= t && t <= RuneTypeNumberExpDig
}
+// Parser is the low-level JSON parser that powers both *Decoder and
+// *ReEncoder.
type Parser struct {
// Setting MaxError to a value greater than 0 causes
// HandleRune to return ErrParserExceededMaxDepth if
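The comment on `RuneTypeNumberIntZero` reflects JSON's integer grammar, `-?(0|[1-9][0-9]*)`: a zero may only appear on its own as the integer part, so a parser that classifies runes directly has to tag a leading zero differently from later digits. The toy classifier below handles only the integer part of a number; it reuses constant names from internal/parse.go but declares only a subset of them, in the same relative order so that a range check in the style of `IsNumber` still works. It is a hypothetical illustration, not the internal `Parser`.

```go
package main

import (
	"errors"
	"fmt"
)

type RuneType uint8

// A subset of the RuneType constants, kept in the same relative order
// as in internal/parse.go so that range checks work.
const (
	RuneTypeError RuneType = iota
	RuneTypeNumberIntNeg
	RuneTypeNumberIntZero // leading zero only
	RuneTypeNumberIntDig
)

// isInt mirrors the contiguous-range check used by RuneType.IsNumber.
func (t RuneType) isInt() bool {
	return RuneTypeNumberIntNeg <= t && t <= RuneTypeNumberIntDig
}

// classifyInt classifies each rune of a JSON integer literal,
// returning an error for input that does not match -?(0|[1-9][0-9]*).
func classifyInt(s string) ([]RuneType, error) {
	var out []RuneType
	for i, c := range s {
		var prev RuneType // RuneTypeError (zero) means "no previous rune"
		if len(out) > 0 {
			prev = out[len(out)-1]
		}
		switch {
		case c == '-' && len(out) == 0:
			out = append(out, RuneTypeNumberIntNeg)
		case c == '0' && (len(out) == 0 || prev == RuneTypeNumberIntNeg):
			out = append(out, RuneTypeNumberIntZero)
		case c >= '0' && c <= '9' && prev != RuneTypeNumberIntZero:
			out = append(out, RuneTypeNumberIntDig)
		default:
			return out, fmt.Errorf("invalid character %q at byte %d", c, i)
		}
	}
	if len(out) == 0 || out[len(out)-1] == RuneTypeNumberIntNeg {
		return out, errors.New("incomplete integer")
	}
	return out, nil
}

func main() {
	types, err := classifyInt("-105")
	fmt.Println(types, err) // [1 3 3 3] <nil>
	_, err = classifyInt("01")
	fmt.Println(err) // invalid character '1' at byte 1
	fmt.Println(RuneTypeNumberIntZero.isInt(), RuneTypeError.isInt()) // true false
}
```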
diff --git a/misc.go b/misc.go
index 4f8e55e..92757f4 100644
--- a/misc.go
+++ b/misc.go
@@ -44,25 +44,43 @@ func writeRune(w io.Writer, c rune) (int, error) {
// JSON string encoding ////////////////////////////////////////////////////////
+// BackslashEscapeMode identifies one of the three ways that a
+// character may be represented in a JSON string:
+//
+// - literally (no backslash escaping)
+//
+// - as a short "well-known" `\X` backslash sequence (where `X` is a
+// single character)
+//
+// - as a long Unicode `\uXXXX` backslash sequence
type BackslashEscapeMode uint8
const (
- BackslashEscapeNone = BackslashEscapeMode(iota)
+ BackslashEscapeNone BackslashEscapeMode = iota
BackslashEscapeShort
BackslashEscapeUnicode
)
+// A BackslashEscaper controls how a ReEncoder emits a character in a
+// JSON string. The `rune` argument is the character being
+// considered, and the `BackslashEscapeMode` argument is how it was
+// originally encoded in the input.
type BackslashEscaper = func(rune, BackslashEscapeMode) BackslashEscapeMode
+// EscapePreserve is a BackslashEscaper that preserves the original
+// input escaping.
func EscapePreserve(_ rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
return wasEscaped
}
+// EscapeJSSafe is a BackslashEscaper that escapes strings such that
+// the JSON is safe to embed in JS; it otherwise preserves the original
+// input escaping.
+//
+// JSON is notionally a JS subset, but that's not actually true; so
+// more conservative backslash-escaping is necessary to safely embed
+// it in JS. http://timelessrepo.com/json-isnt-a-javascript-subset
func EscapeJSSafe(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
- // JSON is notionally a JS subset, but that's not actually
- // true.
- //
- // http://timelessrepo.com/json-isnt-a-javascript-subset
switch c {
case '\u2028', '\u2029':
return BackslashEscapeUnicode
@@ -71,6 +89,9 @@ func EscapeJSSafe(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
}
}
+// EscapeHTMLSafe is a BackslashEscaper that escapes strings such that
+// the JSON is safe to embed in HTML; it otherwise preserves the
+// original input escaping.
func EscapeHTMLSafe(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
switch c {
case '&', '<', '>':
@@ -80,6 +101,15 @@ func EscapeHTMLSafe(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode
}
}
+// EscapeDefault is a BackslashEscaper that mimics the default
+// behavior of encoding/json.
+//
+// It is like EscapeHTMLSafe, but also uses long Unicode `\uXXXX`
+// sequences for `\b`, `\f`, and the `\uFFFD` Unicode replacement
+// character.
+//
+// A ReEncoder uses EscapeDefault if a BackslashEscaper is not
+// specified.
func EscapeDefault(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
switch c {
case '\b', '\f', utf8.RuneError:
@@ -89,6 +119,13 @@ func EscapeDefault(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
}
}
+// EscapeDefaultNonHTMLSafe is a BackslashEscaper that mimics the default
+// behavior of an encoding/json.Encoder that has had
+// SetEscapeHTML(false) called on it.
+//
+// It is like EscapeJSSafe, but also uses long Unicode `\uXXXX`
+// sequences for `\b`, `\f`, and the `\uFFFD` Unicode replacement
+// character.
func EscapeDefaultNonHTMLSafe(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
switch c {
case '\b', '\f', utf8.RuneError:
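Since a `BackslashEscaper` is just a function, a custom escaping policy is short to write. This sketch redeclares `BackslashEscapeMode`, its constants, and the `BackslashEscaper` alias locally, then defines a hypothetical `escapeNonASCII` policy that forces `\uXXXX` for every non-ASCII character while otherwise preserving the input's escaping. The `emit` helper is also hypothetical (it is not part of `lowmemjson`); it shows what each mode looks like in the output, writing characters outside the Basic Multilingual Plane as UTF-16 surrogate pairs as JSON requires.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf16"
)

// Local copies of the misc.go declarations, so the sketch is self-contained.
type BackslashEscapeMode uint8

const (
	BackslashEscapeNone BackslashEscapeMode = iota
	BackslashEscapeShort
	BackslashEscapeUnicode
)

type BackslashEscaper = func(rune, BackslashEscapeMode) BackslashEscapeMode

// escapeNonASCII is a hypothetical policy: force \uXXXX for every
// non-ASCII character, otherwise keep the input's original escaping.
func escapeNonASCII(c rune, wasEscaped BackslashEscapeMode) BackslashEscapeMode {
	if c > 0x7F {
		return BackslashEscapeUnicode
	}
	return wasEscaped
}

var _ BackslashEscaper = escapeNonASCII

// shortEscapes lists the characters that have a two-character escape.
var shortEscapes = map[rune]string{
	'"': `\"`, '\\': `\\`, '\b': `\b`, '\f': `\f`, '\n': `\n`, '\r': `\r`, '\t': `\t`,
}

// emit renders c inside a JSON string according to mode.
func emit(c rune, mode BackslashEscapeMode) string {
	// JSON forbids literal quotes, backslashes, and control
	// characters in strings, so "None" is upgraded for those.
	if mode == BackslashEscapeNone && (c == '"' || c == '\\' || c < 0x20) {
		mode = BackslashEscapeShort
	}
	if mode == BackslashEscapeShort {
		if s, ok := shortEscapes[c]; ok {
			return s
		}
		mode = BackslashEscapeUnicode // no short form exists for c
	}
	if mode == BackslashEscapeUnicode {
		if c > 0xFFFF {
			// Outside the BMP: write a UTF-16 surrogate pair.
			hi, lo := utf16.EncodeRune(c)
			return fmt.Sprintf(`\u%04x\u%04x`, hi, lo)
		}
		return fmt.Sprintf(`\u%04x`, c)
	}
	return string(c)
}

func main() {
	var out strings.Builder
	for _, c := range "héllo 😀" {
		out.WriteString(emit(c, escapeNonASCII(c, BackslashEscapeNone)))
	}
	fmt.Println(out.String()) // h\u00e9llo \ud83d\ude00
}
```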
diff --git a/reencode.go b/reencode.go
index 34c3851..b20a503 100644
--- a/reencode.go
+++ b/reencode.go
@@ -20,10 +20,21 @@ type speculation struct {
indentBuf bytes.Buffer
}
+// A ReEncoder takes a stream of JSON elements (by way of implementing
+// io.Writer and WriteRune), and re-encodes the JSON, writing it to
+// the .Out member.
+//
+// This is useful for prettifying, minifying, sanitizing, and/or
+// validating JSON.
+//
// The memory use of a ReEncoder is O( (CompactIfUnder+1)^2 + depth).
type ReEncoder struct {
+ // The output stream to write the re-encoded JSON to.
Out io.Writer
+ // A JSON document is specified to be a single JSON element,
+ // but it is often desirable to handle streams of multiple
+ // JSON elements; setting AllowMultipleValues permits that.
AllowMultipleValues bool
// Whether to minify the JSON.
@@ -88,6 +99,14 @@ type ReEncoder struct {
// public API //////////////////////////////////////////////////////////////////
+// Write implements io.Writer; it does what you'd expect, mostly.
+//
+// Rather than returning the number of bytes written to the output
+// stream, it returns the number of bytes from p that it successfully
+// handled. This distinction is because *ReEncoder transforms the
+// data written to it, and the number of bytes written may be wildly
+// different than the number of bytes handled; and that would break
+// virtually all users of io.Writer.
func (enc *ReEncoder) Write(p []byte) (int, error) {
if len(p) == 0 {
return 0, nil
@@ -113,7 +132,7 @@ func (enc *ReEncoder) Write(p []byte) (int, error) {
return len(p), nil
}
-// Close does what you'd expect, mostly.
+// Close implements io.Closer; it does what you'd expect, mostly.
//
// The *ReEncoder may continue to be written to with new JSON values
// if enc.AllowMultipleValues is set.
@@ -144,6 +163,15 @@ func (enc *ReEncoder) Close() error {
return nil
}
+// WriteRune writes a single Unicode code point, returning the number
+// of bytes written to the output stream and any error.
+//
+// Even when there is no error, the number of bytes written may be
+// zero (for example, when the rune is whitespace and the ReEncoder is
+// minifying the JSON), or it may be substantially longer than one
+// code point's worth (for example, when `\uXXXX` escaping a character
+// in a string, or when outputting extra whitespace when the ReEncoder
+// is prettifying the JSON).
func (enc *ReEncoder) WriteRune(c rune) (n int, err error) {
if enc.err != nil {
return 0, enc.err