Reduce how much code is generated #745

Marwes · 2021-01-12T09:36:22Z

An attempt to replicate the wins in #687 without using unsafe or losing any performance. To achieve this, all commonly duplicated methods have been extracted to less generic methods (generic only on R) which parse up to the visit method and return an enum indicating how to proceed. Specifically, these enums hold the Error themselves instead of being wrapped in a `Result`, since that helps codegen slightly.
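
A minimal sketch of the pattern described above, not code taken from this PR. The names `Read`, `SliceRead`, `ElementPrefix`, `element_prefix`, `next_element` and `parse_digits` are hypothetical stand-ins chosen for illustration. The prefix function is generic only on the reader `R` and reports what it found through an enum that carries the error as its own variant. The thin wrapper that is also generic on the element type only dispatches on that enum.

```rust
// Simplified, hypothetical stand-ins; none of these are serde_json's real types.
#[derive(Debug, PartialEq)]
enum Error {
    Eof,
    Unexpected(u8),
}

trait Read {
    fn peek(&self) -> Option<u8>;
    fn discard(&mut self);
}

struct SliceRead<'a> {
    bytes: &'a [u8],
    pos: usize,
}

impl<'a> Read for SliceRead<'a> {
    fn peek(&self) -> Option<u8> {
        self.bytes.get(self.pos).copied()
    }
    fn discard(&mut self) {
        self.pos += 1;
    }
}

// Result of the non-visitor prefix; the error is a variant, not a wrapping `Result`.
enum ElementPrefix {
    HasElement,
    End,
    Err(Error),
}

fn skip_whitespace<R: Read>(read: &mut R) {
    while let Some(b' ' | b'\n' | b'\t' | b'\r') = read.peek() {
        read.discard();
    }
}

// Generic only on `R`: instantiated once per reader, not once per element type.
fn element_prefix<R: Read>(read: &mut R, first: &mut bool) -> ElementPrefix {
    skip_whitespace(read);
    match read.peek() {
        None => ElementPrefix::Err(Error::Eof),
        Some(b']') => {
            read.discard();
            ElementPrefix::End
        }
        Some(b',') if !*first => {
            read.discard();
            skip_whitespace(read);
            ElementPrefix::HasElement
        }
        Some(_) if *first => {
            *first = false;
            ElementPrefix::HasElement
        }
        Some(b) => ElementPrefix::Err(Error::Unexpected(b)),
    }
}

// Also generic on `T`, so it is duplicated per element type; kept as thin as possible.
fn next_element<R: Read, T>(
    read: &mut R,
    first: &mut bool,
    visit: impl FnOnce(&mut R) -> Result<T, Error>,
) -> Result<Option<T>, Error> {
    match element_prefix(read, first) {
        ElementPrefix::HasElement => match visit(read) {
            Ok(value) => Ok(Some(value)),
            Err(err) => Err(err),
        },
        ElementPrefix::End => Ok(None),
        ElementPrefix::Err(err) => Err(err),
    }
}

fn parse_digit<R: Read>(read: &mut R) -> Result<u8, Error> {
    match read.peek() {
        Some(b @ b'0'..=b'9') => {
            read.discard();
            Ok(b - b'0')
        }
        Some(b) => Err(Error::Unexpected(b)),
        None => Err(Error::Eof),
    }
}

fn parse_digits(input: &str) -> Result<Vec<u8>, Error> {
    let mut read = SliceRead { bytes: input.as_bytes(), pos: 0 };
    if read.peek() != Some(b'[') {
        return Err(read.peek().map_or(Error::Eof, Error::Unexpected));
    }
    read.discard();
    let (mut first, mut out) = (true, Vec::new());
    while let Some(digit) = next_element(&mut read, &mut first, |r| parse_digit(r))? {
        out.push(digit);
    }
    Ok(out)
}

fn main() {
    println!("{:?}", parse_digits("[1, 2, 3]"));
}
```

In this sketch, each additional element type only adds another copy of the small `next_element` match, while the whitespace and separator handling in `element_prefix` is compiled once per reader type.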

I unfortunately only have access to a laptop prone to throttling during benchmarking atm, so I don't have reliable measurements, but this seems to give either no difference in performance or a slowdown of a couple of percent, so I'd appreciate it if someone could run this independently. (To get some more precise measurements in json-benchmark I hacked in criterion, which can be run with this command: `cargo run --release --features lib-serde,all-files,parse-struct,parse-dom,serde_json,criterion --no-default-features -- --bench`)

cargo llvm-lines --bin json-benchmark --no-default-features --features lib-serde,file-twitter,performance | head -30

Before

  Lines          Copies       Function name
  -----          ------       -------------
  111368 (100%)  1640 (100%)  (TOTAL)
   13186 (11.8%)   43 (2.6%)  <serde_json::de::SeqAccess<R> as serde::de::SeqAccess>::next_element_seed
    9397 (8.4%)    15 (0.9%)  <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_struct
    5430 (4.9%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::User>::deserialize::__Visitor as serde::de::Visitor>::visit_map
    4267 (3.8%)    15 (0.9%)  <serde_json::de::MapAccess<R> as serde::de::MapAccess>::next_key_seed
    3939 (3.5%)    39 (2.4%)  <serde_json::ser::Compound<W,F> as serde::ser::SerializeMap>::serialize_value
    3262 (2.9%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::Status>::deserialize::__Visitor as serde::de::Visitor>::visit_map
    3151 (2.8%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::User>::deserialize::__Visitor as serde::de::Visitor>::visit_seq
    2722 (2.4%)     7 (0.4%)  <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_seq
    2353 (2.1%)    38 (2.3%)  <serde_json::de::MapAccess<R> as serde::de::MapAccess>::next_value_seed
    2162 (1.9%)   119 (7.3%)  core::ptr::drop_in_place
    1947 (1.7%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::Status>::deserialize::__Visitor as serde::de::Visitor>::visit_seq
    1888 (1.7%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::Media>::deserialize::__Visitor as serde::de::Visitor>::visit_map
    1864 (1.7%)    15 (0.9%)  <serde_json::de::MapKey<R> as serde::de::Deserializer>::deserialize_any
    1789 (1.6%)     7 (0.4%)  <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_str
    1626 (1.5%)     6 (0.4%)  serde_json::de::Deserializer<R>::deserialize_number
    1557 (1.4%)    39 (2.4%)  serde::ser::SerializeMap::serialize_entry
    1422 (1.3%)     6 (0.4%)  serde::ser::Serializer::collect_seq
    1330 (1.2%)    25 (1.5%)  core::result::Result<T,E>::map
    1321 (1.2%)    10 (0.6%)  <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_option
    1291 (1.2%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::SearchMetadata>::deserialize::__Visitor as serde::de::Visitor>::visit_map
    1013 (0.9%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::Media>::deserialize::__Visitor as serde::de::Visitor>::visit_seq
     988 (0.9%)    28 (1.7%)  <serde::private::de::missing_field::MissingFieldDeserializer<E> as serde::de::Deserializer>::deserialize_any
     946 (0.8%)    38 (2.3%)  serde::private::de::missing_field
     910 (0.8%)     5 (0.3%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
     867 (0.8%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::StatusEntities>::deserialize::__Visitor as serde::de::Visitor>::visit_map
     854 (0.8%)     1 (0.1%)  json_benchmark::copy::twitter::_::<impl serde::ser::Serialize for json_benchmark::copy::twitter::User>::serialize
     817 (0.7%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::UserMention>::deserialize::__Visitor as serde::de::Visitor>::visit_map

After

  Lines         Copies       Function name
  -----         ------       -------------
  90777 (100%)  1617 (100%)  (TOTAL)
   5430 (6.0%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::User>::deserialize::__Visitor as serde::de::Visitor>::visit_map
   5118 (5.6%)    43 (2.7%)  <serde_json::de::SeqAccess<R> as serde::de::SeqAccess>::next_element_seed
   4885 (5.4%)    15 (0.9%)  <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_struct
   3262 (3.6%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::Status>::deserialize::__Visitor as serde::de::Visitor>::visit_map
   3151 (3.5%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::User>::deserialize::__Visitor as serde::de::Visitor>::visit_seq
   2353 (2.6%)    38 (2.4%)  <serde_json::de::MapAccess<R> as serde::de::MapAccess>::next_value_seed
   2301 (2.5%)    39 (2.4%)  <serde_json::ser::Compound<W,F> as serde::ser::SerializeMap>::serialize_value
   2183 (2.4%)     7 (0.4%)  <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_seq
   2162 (2.4%)   119 (7.4%)  core::ptr::drop_in_place
   1947 (2.1%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::Status>::deserialize::__Visitor as serde::de::Visitor>::visit_seq
   1888 (2.1%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::Media>::deserialize::__Visitor as serde::de::Visitor>::visit_map
   1785 (2.0%)    15 (0.9%)  <serde_json::de::MapAccess<R> as serde::de::MapAccess>::next_key_seed
   1744 (1.9%)    15 (0.9%)  <serde_json::de::MapKey<R> as serde::de::Deserializer>::deserialize_any
   1557 (1.7%)    39 (2.4%)  serde::ser::SerializeMap::serialize_entry
   1422 (1.6%)     6 (0.4%)  serde::ser::Serializer::collect_seq
   1299 (1.4%)     7 (0.4%)  <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_str
   1291 (1.4%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::SearchMetadata>::deserialize::__Visitor as serde::de::Visitor>::visit_map
   1013 (1.1%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::Media>::deserialize::__Visitor as serde::de::Visitor>::visit_seq
    988 (1.1%)    28 (1.7%)  <serde::private::de::missing_field::MissingFieldDeserializer<E> as serde::de::Deserializer>::deserialize_any
    946 (1.0%)    38 (2.4%)  serde::private::de::missing_field
    910 (1.0%)     5 (0.3%)  alloc::raw_vec::RawVec<T,A>::grow_amortized
    894 (1.0%)     6 (0.4%)  serde_json::de::Deserializer<R>::deserialize_number
    867 (1.0%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::StatusEntities>::deserialize::__Visitor as serde::de::Visitor>::visit_map
    854 (0.9%)     1 (0.1%)  json_benchmark::copy::twitter::_::<impl serde::ser::Serialize for json_benchmark::copy::twitter::User>::serialize
    817 (0.9%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::UserMention>::deserialize::__Visitor as serde::de::Visitor>::visit_map
    769 (0.8%)     1 (0.1%)  <json_benchmark::copy::twitter::_::<impl serde::de::Deserialize for json_benchmark::copy::twitter::Url>::deserialize::__Visitor as serde::de::Visitor>::visit_map
    741 (0.8%)    10 (0.6%)  <&mut serde_json::de::Deserializer<R> as serde::de::Deserializer>::deserialize_option

Marwes · 2022-04-07T14:10:49Z

@dtolnay Rebased and re-ran my criterion hack for "json-benchmark". There is some variance from run to run, but it seems like this may actually improve performance (at least, improvements seem more common and larger than any regressions). Any chance this can get merged?

Gnuplot not found, using plotters backend
parse-dom/data/canada.json
                        time:   [7.1562 ms 7.1770 ms 7.1991 ms]
                        thrpt:  [298.20 MiB/s 299.12 MiB/s 299.99 MiB/s]
                 change:
                        time:   [-0.0684% +0.3025% +0.7077%] (p = 0.11 > 0.05)
                        thrpt:  [-0.7027% -0.3016% +0.0685%]
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

parse-struct/data/canada.json
                        time:   [3.3351 ms 3.3442 ms 3.3533 ms]
                        thrpt:  [640.19 MiB/s 641.94 MiB/s 643.69 MiB/s]
                 change:
                        time:   [-15.564% -15.263% -14.974%] (p = 0.00 < 0.05)
                        thrpt:  [+17.611% +18.012% +18.432%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

parse-dom/data/citm_catalog.json
                        time:   [3.9768 ms 3.9968 ms 4.0209 ms]
                        thrpt:  [409.65 MiB/s 412.12 MiB/s 414.20 MiB/s]
                 change:
                        time:   [+4.1500% +4.8664% +5.7135%] (p = 0.00 < 0.05)
                        thrpt:  [-5.4047% -4.6406% -3.9847%]
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.3s, enable flat sampling, or reduce sample count to 50.
parse-struct/data/citm_catalog.json
                        time:   [1.5731 ms 1.5777 ms 1.5826 ms]
                        thrpt:  [1.0164 GiB/s 1.0196 GiB/s 1.0226 GiB/s]
                 change:
                        time:   [-2.3439% -1.4440% -0.8393%] (p = 0.00 < 0.05)
                        thrpt:  [+0.8464% +1.4651% +2.4002%]
                        Change within noise threshold.

parse-dom/data/twitter.json
                        time:   [2.0885 ms 2.0933 ms 2.0986 ms]
                        thrpt:  [286.97 MiB/s 287.70 MiB/s 288.37 MiB/s]
                 change:
                        time:   [+0.9558% +1.2591% +1.5778%] (p = 0.00 < 0.05)
                        thrpt:  [-1.5533% -1.2435% -0.9467%]
                        Change within noise threshold.
Found 18 outliers among 100 measurements (18.00%)
  4 (4.00%) low mild
  6 (6.00%) high mild
  8 (8.00%) high severe

parse-struct/data/twitter.json
                        time:   [846.45 us 848.43 us 850.53 us]
                        thrpt:  [708.10 MiB/s 709.85 MiB/s 711.52 MiB/s]
                 change:
                        time:   [-10.075% -8.8594% -7.6386%] (p = 0.00 < 0.05)
                        thrpt:  [+8.2703% +9.7205% +11.204%]
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  2 (2.00%) low severe
  5 (5.00%) low mild
  5 (5.00%) high mild
  7 (7.00%) high severe

Noah-Kennedy · 2022-05-06T21:05:20Z

@dtolnay are you able to take a look at this?

indiv0 · 2022-05-15T02:24:20Z

👍 this would be great to have, especially in crates that depend on crates with lots and lots of Deserialize types.


Marwes · 2022-07-04T14:38:44Z

@dtolnay Are you able to take a look at this? Is it something that might be merged at some point?

Elabajaba · 2022-08-04T02:28:53Z

This reduces the overall compile time of the gltf crate by ~50-55% in my testing (and the compile time of the extremely heavy gltf-json part, where all the serde stuff lives, by ~2/3).

Timings with this PR

Timings with the current release version of serde_json

Elabajaba · 2022-08-29T19:01:51Z

I did 5 runs of json-benchmark for both the current master branch and this PR on my home Linux server with everything I normally run on it disabled (0.00 average load, 3950X CPU, stayed below 60C the entire time so no thermal throttling, Rust 1.63, sccache disabled). I used `rm -rf ./target/release/ && RUSTFLAGS='-C codegen-units=1' cargo run --release --no-default-features --features parse-struct,lib-serde,all-files --timings` instead of `cargo clean` to preserve the cargo-timings.

For this PR, canada.json was ~8% faster, citm_catalog.json was ~6% slower, and twitter.json was basically the same. Build times were slightly faster as well, though they are dominated by the jemalloc-sys build script. The final json-benchmark bin compile time was reduced by about 1s (from ~8s to ~7s).

Averages

Current 44d9c53

data/canada.json:       538 MB/s
data/citm_catalog.json: 1036 MB/s
data/twitter.json:      754 MB/s
build times:            24.264s

This PR 60e4ac2

data/canada.json:       580 MB/s
data/citm_catalog.json: 974 MB/s
data/twitter.json:      748 MB/s
build times:            23.11s

Differences (PR / Current)

data/canada.json:       107.8067%
data/citm_catalog.json: 94.0154%
data/twitter.json:      99.2042%
build times:            95.2440%
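
For reference, the percentages above are the PR figure divided by the current figure. A minimal sketch that recomputes them from the averages listed in this comment (the helper name `percent_of_baseline` is made up for illustration):

```rust
/// PR measurement as a percentage of the baseline ("Current") measurement.
fn percent_of_baseline(pr: f64, current: f64) -> f64 {
    pr / current * 100.0
}

fn main() {
    // (label, current 44d9c53, this PR 60e4ac2), copied from the averages above.
    let rows = [
        ("data/canada.json", 538.0, 580.0),
        ("data/citm_catalog.json", 1036.0, 974.0),
        ("data/twitter.json", 754.0, 748.0),
        ("build times", 24.264, 23.11),
    ];
    for (label, current, pr) in rows {
        println!("{label}: {:.4}%", percent_of_baseline(pr, current));
    }
}
```

For the three throughput rows a value above 100% means the PR is faster, while for build times a value below 100% means the PR builds faster.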

Walther · 2023-04-12T09:14:02Z

Kindest little bump - what is the status of this PR?
Is there anything where help would be needed?

Marwes · 2023-04-18T10:41:41Z

The PR has some conflicts now, but nothing that would be difficult to fix. This is still something I'd like to see merged but in the end it is up to the time and interest of the owner(s).

Marwes force-pushed the min_ser branch from e769d7a to 8f622cd on April 7, 2022 13:54

indiv0 mentioned this pull request May 15, 2022

Improving Compile Times And/Or Add Pluggable Deserializable Types awslabs/aws-lambda-rust-runtime#481

Closed

Markus Westerlind added 25 commits July 4, 2022 16:32

refactor: Factor out whitespace skipping into helpers

1a739c1

Should hopefully reduce the amount of code being generated

refactor: Add next_char_or_error

b01ff91

Move more non-generic code out of the generic path

fcd4dc1

Move eat_char into the _until function

57dc798

Move even more code out of the generic deserialize_struct

b9540a7

Move recursion checking out of the generic path

3e0d096

Factor out non-generic parts of next_element_seed

b63097f

Factor out non-generic parts of next_key_seed

3c327b0

Use try_with!

4e98512

Move non-generic code out of deserialize_any

e6216e6

s/field/element/

1ca5734

Avoid using map in generic functions

061247b

refactor: Add helpers for serializing begin/end key/object

52259b9

Avoids generating duplicates of the map_err per key/value type

Move out code from the generic string deserialization methods

87d36a5

Shrink deserialize_struct a bit more

322f8c4

refactor: Factor out the duplicated variant serialization

5d5ad07

refactor: add begin/end_string helpers

33ef802

refactor: Avoid a tri! in commonly instantiated code

5cf7a6f

refactor: Various shortenings

cacd1b7

Further extract code out of deserialize_any

6fd698c

refactor: Extract parse_option

9f16c61

Shorten unit deserializing

d28a020

Extract helpers to reduce map_err calls

f77905a

refactor: Remove unnecessary tri! calls

760ccb2

refactor: Extract a less generic part of deserialize_number

9956638

Markus Westerlind added 9 commits July 4, 2022 16:34

refactor: Extract code from deserialize_enum

0f15688

refactor: De-duplicate the recursion checking

60cdfbd

Add an OptionResult enum

4de41fd

inline

9e56187

Extract prefixes for deserialize_seq and map

010eb7c

Reuse Deserialzer::parse_str

2067058

Remove end_map, end_seq

0fa7a8f

Fix check_recursion_prefix

f595267

Fix clippy

60e4ac2

Marwes force-pushed the min_ser branch from 8f622cd to 60e4ac2 on July 4, 2022 14:34

Elabajaba mentioned this pull request Aug 4, 2022

Including crate increases build times by an order of magnitude gltf-rs/gltf#342

Open

matklad mentioned this pull request Aug 21, 2023

fix: avoid problematic serde release rust-lang/rust-analyzer#15482

Merged
