Data Serialization

    Data serialization is the process of converting structured data to a format that allows the data to be shared or stored and later recovered with its original structure. In some cases, a secondary goal of data serialization is to minimize the data’s size, which reduces disk space or bandwidth requirements.

    Before beginning to serialize data, it is important to decide how the data should be structured during serialization: flat or nested. The differences between the two styles are shown in the examples below.

    Flat style:

    { "Type": "A", "field1": "value1", "field2": "value2", "field3": "value3" }

    Nested style:

    { "A":
        { "field1": "value1", "field2": "value2", "field3": "value3" } }

    For more reading on the two styles, please see the discussions on the Python mailing list and on Stack Exchange.
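    To make the difference concrete, here is a minimal sketch that converts a flat record into the nested form and back. The helper names to_nested and to_flat are hypothetical and exist only for illustration, and the sketch assumes that the "Type" key carries the record type.

```python
# Hypothetical helpers, for illustration only.

def to_nested(flat, type_key="Type"):
    """Move every field except the type under a key named after the type."""
    fields = {k: v for k, v in flat.items() if k != type_key}
    return {flat[type_key]: fields}


def to_flat(nested, type_key="Type"):
    """Inverse of to_nested for a nested dict holding a single record."""
    record_type, fields = next(iter(nested.items()))
    return {type_key: record_type, **fields}


flat = {"Type": "A", "field1": "value1", "field2": "value2", "field3": "value3"}
nested = to_nested(flat)

print(nested)
# {'A': {'field1': 'value1', 'field2': 'value2', 'field3': 'value3'}}
print(to_flat(nested) == flat)
# True
```

    The flat form maps naturally onto one row of a table, while the nested form groups fields under their record type and suits formats such as JSON, YAML, and XML.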

    If the data to be serialized is located in a file and contains flat data, Python offers two methods to serialize data.

    repr

    The built-in repr function takes a single object as its argument and returns a printable representation of that object:

    # input as flat text
    a = { "Type": "A", "field1": "value1", "field2": "value2", "field3": "value3" }

    # the same input can also be read back from a file
    # (see ast.literal_eval below)

    # returns a printable representation of the input;
    # the output can be written to a file as well
    print(repr(a))

    # write content to a file using repr (the file is opened in write mode)
    with open('/tmp/file.py', 'w') as f:
        f.write(repr(a))

    ast.literal_eval

    The literal_eval function in the ast module safely parses and evaluates a string containing a Python literal. Supported data types are: strings, numbers, tuples, lists, dicts, booleans, and None.

    import ast
    with open('/tmp/file.py', 'r') as f:
        inp = ast.literal_eval(f.read())
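    The two functions are complementary: repr produces the text and literal_eval recovers the object. The following is a minimal sketch of that round trip. The sample record and the file name record.py are made up for illustration, and a temporary directory is used so that the sketch does not touch existing files.

```python
import ast
import os
import tempfile

# Sample data, made up for illustration
record = {"Type": "A", "field1": "value1", "values": [1, 2.5, None, True]}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "record.py")  # hypothetical file name

    # Serialize: write the printable representation to the file
    with open(path, "w") as f:
        f.write(repr(record))

    # Deserialize: read the text back and evaluate it as a literal
    with open(path, "r") as f:
        restored = ast.literal_eval(f.read())

print(restored == record)  # True

# literal_eval accepts only literals; a function call is rejected
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", exc)
```

    This is what makes literal_eval a safer choice than eval for reading data back: input that is not a plain literal raises an exception instead of being executed.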

    CSV file (flat data)

    Simple example for reading:
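    A minimal sketch of reading with csv.reader follows. So that it runs on its own, the sketch first creates a small sample file in a temporary directory; the file name and its contents are made up for illustration.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "file.csv")  # hypothetical file name

    # Create a small sample file to read (illustrative data only)
    with open(path, "w", newline="") as f:
        f.write("name,grade\r\nAlice,89\r\nBob,72\r\n")

    # Reading CSV content from a file
    with open(path, newline="") as f:
        reader = csv.reader(f)
        rows = [row for row in reader]

for row in rows:
    print(row)
```

    csv.reader yields each row as a list of strings, so numeric columns such as the grade need an explicit int() or float() conversion after reading.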

    Simple example for writing:

    # Writing CSV content to a file
    import csv

    # example rows; any iterable of rows works
    iterable = [['name', 'grade'], ['Alice', 89], ['Bob', 72]]

    with open('/tmp/file.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerows(iterable)

    The module’s contents, functions, and examples can be found in the Python documentation.

    YAML (nested data)

    There are many third party modules to parse and read/write YAML file structures in Python. One such example is below.

    # Reading YAML content from a file using the safe_load method
    import yaml
    with open('/tmp/file.yaml', 'r', newline='') as f:
        try:
            print(yaml.safe_load(f))
        except yaml.YAMLError as ymlexcp:
            print(ymlexcp)

    Documentation on the third party module can be found in the PyYAML documentation.

    JSON file (nested data)

    Python’s json module can be used to read and write JSON files. Example code is below.

    Reading:

    # Reading JSON content from a file
    import json
    with open('/tmp/file.json', 'r') as f:
        data = json.load(f)

    Writing:
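    A minimal sketch of writing with json.dump follows. The sample data and the file name are made up for illustration, and a temporary directory is used so that the sketch runs without touching existing files. The file is read back at the end only to show that the nested structure survives.

```python
import json
import os
import tempfile

# Sample nested data, made up for illustration
data = {"A": {"field1": "value1", "field2": "value2", "field3": "value3"}}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "file.json")  # hypothetical file name

    # Writing JSON content to a file using the dump method
    with open(path, "w") as f:
        json.dump(data, f, sort_keys=True, indent=4)

    # Read the file back to check the round trip
    with open(path, "r") as f:
        restored = json.load(f)

print(restored == data)  # True
```

    The sort_keys and indent arguments are optional; they make the output stable and easier to read at the cost of a slightly larger file.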

    XML (nested data)

    Python’s xml package can be used to parse XML.

    Example:

    # reading XML content from a file
    import xml.etree.ElementTree as ET
    tree = ET.parse('country_data.xml')
    root = tree.getroot()

    More documentation on using the xml.dom and xml.sax packages can be found in the Python standard library documentation.
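    Once the root element is available, the nested data can be walked with the ElementTree API. The sketch below parses an inline string with ET.fromstring instead of a file so that it is self-contained; the element names and values are made up for illustration and are not the contents of any particular country_data.xml.

```python
import xml.etree.ElementTree as ET

# Illustrative XML, made up for this sketch
xml_text = """
<data>
    <country name="Liechtenstein"><rank>1</rank></country>
    <country name="Singapore"><rank>4</rank></country>
</data>
""".strip()

root = ET.fromstring(xml_text)

# Collect each country's rank from its attribute and child element
ranks = {}
for country in root.findall("country"):
    ranks[country.get("name")] = int(country.find("rank").text)

print(root.tag)  # data
print(ranks)     # {'Liechtenstein': 1, 'Singapore': 4}
```

    Element text and attribute values are always strings, which is why the rank is converted with int() here.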

    NumPy Array (flat data)

    NumPy arrays can be serialized to and deserialized from a byte representation.

    Example:

    import numpy as np

    # Converting NumPy array to byte format
    array = np.array([ [1, 2, 3], [4, 5, 6], [7, 8, 9] ])
    byte_output = array.tobytes()

    # Converting byte format back to NumPy array; the bytes do not store
    # the dtype or shape, so both are supplied again
    array_format = np.frombuffer(byte_output, dtype=array.dtype).reshape(array.shape)

    Pickle (nested data)

    The native data serialization module for Python is called pickle.

    Here’s an example:

    import pickle

    # Here's an example dict
    grades = { 'Alice': 89, 'Bob': 72, 'Charles': 87 }

    # Use dumps to convert the object to a serialized bytes object
    serial_grades = pickle.dumps( grades )

    # Use loads to de-serialize an object
    received_grades = pickle.loads( serial_grades )
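    The pickle module also provides dump and load, which work on file objects instead of in-memory bytes. The sketch below is a minimal example under that assumption; the file name grades.pkl is hypothetical, and a temporary directory is used so the sketch is self-contained.

```python
import os
import pickle
import tempfile

grades = {"Alice": 89, "Bob": 72, "Charles": 87}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "grades.pkl")  # hypothetical file name

    # Pickle data is binary, so the file is opened in 'wb' mode
    with open(path, "wb") as f:
        pickle.dump(grades, f)

    # ... and read back in 'rb' mode
    with open(path, "rb") as f:
        received_grades = pickle.load(f)

print(received_grades == grades)  # True
```

    Unpickling can execute arbitrary code, so pickle.load and pickle.loads should only be used on data from a trusted source.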

    If you’re looking for a serialization module that has support in multiple languages, Google’s Protobuf library is an option.