So many formats, so little bandwidth. See wesm comment before making up your mind.
Serialization suitible for API RPC networked services.
- CSV - Comma Separated Values. Textual.
- JSON - Lightweight document data-interchange format. Textual.
- JSONL - Schemeless "multiple JSON documents in 1 file" container data format. Textual.
- Thrift - Scalable code generation, schema evolution binary format. Binary.
- Protocol Buffers - Google's data interchange format. Binary.
- Message Pack - Efficient JSON-like binary serialization format. Binary.
- bson - Binary schemeless JSON encoding. Binary.
- XML - Extensible Markup Language. Genuinely Horrible. Textual.
- Plist - Property List representation. Apple. Textual.
- YAML - Identation based data serialization standard. Textual.
- TOML - Tom's Obvious, Minimal Language. Textual.
- gRPC - A high-performance, open source universal RPC framework. Binary, ISO Layer 7.
- RSocket - Application protocol providing Reactive Streams semantics. Binary, ISO Layer 5 (or 6).
Serialization suitble for big data at rest systems, from Hadoop family of solutions.
- Parquet - Columnar storage for Hadoop workloads. Binary.
- FlatBuffers - Protocol Buffers suitible for larger datasets. Binary.
- ORC - Columnar storage for Hadoop workloads. Binary.
- Avro - Scheme embedded, dynamic rich data structures. Textual/Binary.
- Ion - Row storage with skip scan parsing. Structed, schema embedded. Amazon. Textual/Binary.
Large scale sparse arrays used in physical, mathematics and statistics research.
- HDF5® - n-dimensional datasets, complex objects, with schema. Efficient I/O. Binary.
- npy - Numpy arrays, cell sparse metadata. Binary.
Serialization of deep learning networks and weights.
- SavedModel - TensorFlow package, weights, graph, executable code. Binary.
- GraphDef - TensorFlow graphs. Binary. Deprecated.
- ONNX - The open standard for machine learning interoperability
- safetensors - Simple, safe way to store and distribute tensors
Serialization formats for representing graph oriented data structures.
- common-workflow-language - Specification for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments.
- Relational Algebra and Datalog for Graphs - Coursera course on graph data manipulation.
Language native serialization formats for in transit data (aka memory based "live" objects)
- Dart Object Serialization - Ram to Disk serialization. Dart specific. Binary.
- pickle - Ram to Disk serialization. Binary.
- msgpack-python - MessagePack serializer implementation for Python.
- srsly - Modern high-performance serialization utilities for Python.
- MessagePack.swift - Swift MessagePack Serializer.
- Java Object Serialization - Ram to Disk serialization. Binary.
Research papers discussing types, category theory, benchamrks and co.
- Type theory - studies types, which informally are attributes that objects can possess.
- Category theory - General theory of functions. Axiomatic foundation for mathematics, as an alternative to set theory.
Contributions welcome! Read the contribution guidelines first.
To the extent possible under law, Maxim Veksler has waived all copyright and related or neighboring rights to this work.