
System Architecture


Overview

The capability to store and retrieve Java objects is essential to building all but the most transient applications. The key to storing and retrieving objects is representing the state of objects in a serialized form sufficient to reconstruct the objects. Objects to be saved in the stream may support either the Serializable or the Externalizable interface. For Java objects, the serialized form must be able to identify and verify the Java class from which the object's contents were saved and to restore the contents to a new instance. For Serializable objects, the stream includes sufficient information to restore the fields in the stream to a compatible version of the class. For Externalizable objects, the class is solely responsible for the external format of its contents.

Objects to be stored and retrieved frequently refer to other objects. Those other objects must be stored and retrieved at the same time to maintain the relationships between the objects. When an object is stored, all of the objects that are reachable from that object are stored as well.

The goals for serializing Java objects are to:

Writing to an Object Stream

Writing objects and primitives to a stream is a straightforward process. For example:

    import java.io.FileOutputStream;
    import java.io.ObjectOutputStream;
    import java.util.Date;

    // Serialize today's date to a file.
    FileOutputStream f = new FileOutputStream("tmp");
    ObjectOutputStream s = new ObjectOutputStream(f);
    s.writeObject("Today");
    s.writeObject(new Date());
    s.flush();

First an OutputStream, in this case a FileOutputStream, is needed to receive the bytes. Then an ObjectOutputStream is created that writes to the OutputStream. Next, the string "Today" and a Date object are written to the stream. More generally, objects are written with the writeObject method and primitives are written to the stream with the methods of DataOutput.

The writeObject method serializes the specified object and traverses its references to other objects in the object graph recursively to create a complete serialized representation of the graph. Within a stream, the first reference to any object results in the object being serialized or externalized and the assignment of a handle for that object. Subsequent references to that object are encoded as the handle. Using object handles preserves the sharing and circular references that occur naturally in object graphs, and because a repeated reference costs only a handle, the representation is very compact.
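The effect of handles can be observed with a small round trip. The following is a minimal sketch, not part of the specification: the HandleDemo class, its Node type, and the roundTrip helper are hypothetical names introduced for illustration, and an in-memory byte array stands in for a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class HandleDemo {
    // Hypothetical node type, introduced only for this illustration.
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Node next;

        Node(String name) {
            this.name = name;
        }
    }

    // Writes the array to an in-memory stream and reads it back.
    static Node[] roundTrip(Node[] nodes) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(nodes);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Node[]) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a; // circular reference

        // The array refers to "a" twice; the second reference is written as a handle.
        Node[] copy = roundTrip(new Node[] { a, a, b });

        System.out.println(copy[0] == copy[1]);      // true: sharing is preserved
        System.out.println(copy[0].next == copy[2]); // true: a still points to b
        System.out.println(copy[2].next == copy[0]); // true: the cycle is preserved
    }
}
```

The restored objects are new instances, but the relationships among them match those of the original graph.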

Special handling is required for objects of type Class, ObjectStreamClass, strings, and arrays. Other objects must implement either the Serializable or the Externalizable interface to be saved in or restored from a stream.

Primitive data types are written to the stream with the methods in the DataOutput interface, such as writeInt, writeFloat, or writeUTF. Individual bytes and arrays of bytes are written with the methods of OutputStream. All primitive data is written to the stream in block-data records prefixed by a marker and the length. Putting the data in records allows it to be skipped if necessary.
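As an illustration of mixing primitives and raw bytes in one stream, the sketch below writes a few values and reads them back in the same order. The PrimitiveDemo class and its write and read helpers are hypothetical names; only the java.io calls come from the platform.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;

class PrimitiveDemo {
    // Writes primitives with DataOutput methods and raw bytes with an OutputStream method.
    static byte[] write() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(42);
            out.writeFloat(1.5f);
            out.writeUTF("hello");
            out.write(new byte[] { 1, 2, 3 });
        }
        return bytes.toByteArray();
    }

    // Reads the values back in exactly the order in which they were written.
    static String read(byte[] data) throws IOException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            int i = in.readInt();
            float f = in.readFloat();
            String s = in.readUTF();
            byte[] raw = new byte[3];
            in.readFully(raw);
            return i + " " + f + " " + s + " " + Arrays.toString(raw);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(read(write())); // prints "42 1.5 hello [1, 2, 3]"
    }
}
```

The stream carries no type tags for primitive data, so the reader must know the order and types that the writer used.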

ObjectOutputStream can be extended to customize the information about classes in the stream or to replace objects to be serialized. Refer to the annotateClass and replaceObject method descriptions for details.
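One possible use of replaceObject is sketched below. It assumes a hypothetical subclass, TextReplacingStream, that substitutes a plain String for every StringBuilder it encounters; the substitution rule is chosen only for illustration. Replacement takes effect only after the subclass calls enableReplaceObject(true).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;

class ReplaceDemo {
    // Hypothetical subclass: writes a String in place of any StringBuilder.
    static class TextReplacingStream extends ObjectOutputStream {
        TextReplacingStream(OutputStream out) throws IOException {
            super(out);
            enableReplaceObject(true); // replaceObject is ignored unless enabled
        }

        @Override
        protected Object replaceObject(Object obj) throws IOException {
            return (obj instanceof StringBuilder) ? obj.toString() : obj;
        }
    }

    // Writes the value through the replacing stream and reads it back normally.
    static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new TextReplacingStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Object restored = roundTrip(new StringBuilder("draft"));
        System.out.println(restored.getClass().getName() + ": " + restored);
        // prints "java.lang.String: draft"
    }
}
```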

Reading from an Object Stream

Reading an object from a stream, like writing, is straightforward:

    import java.io.FileInputStream;
    import java.io.ObjectInputStream;
    import java.util.Date;

    // Deserialize a string and date from a file.
    FileInputStream in = new FileInputStream("tmp");
    ObjectInputStream s = new ObjectInputStream(in);
    String today = (String)s.readObject();
    Date date = (Date)s.readObject();

First an InputStream, in this case a FileInputStream, is needed as the source stream. Then an ObjectInputStream is created that reads from the InputStream. Next, the string "Today" and a Date object are read from the stream. Generally, objects are read with the readObject method and primitives are read from the stream with the methods of DataInput.

The readObject method deserializes the next object in the stream and traverses its references to other objects recursively to create the complete graph of objects serialized.

Primitive data types are read from the stream with the methods in the DataInput interface, such as readInt, readFloat, or readUTF. Individual bytes and arrays of bytes are read with the methods of InputStream. All primitive data is read from block-data records.

ObjectInputStream can be extended to utilize customized information in the stream about classes or to replace objects that have been deserialized. Refer to the resolveClass and resolveObject method descriptions for details.
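As one example of overriding resolveClass, the sketch below restricts deserialization to a caller-supplied set of class names. The AllowListStream class, the allow-list policy, and the helper methods are assumptions made for illustration and are not described in this specification.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.util.Collections;
import java.util.Date;
import java.util.Set;

class ResolveDemo {
    // Hypothetical subclass: refuses any class whose name is not in the allowed set.
    static class AllowListStream extends ObjectInputStream {
        private final Set<String> allowed;

        AllowListStream(InputStream in, Set<String> allowed) throws IOException {
            super(in);
            this.allowed = allowed;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            if (!allowed.contains(desc.getName())) {
                throw new InvalidClassException(desc.getName(), "class not allowed");
            }
            return super.resolveClass(desc);
        }
    }

    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Returns true if the stream deserializes under the given allow-list.
    static boolean isAccepted(byte[] data, Set<String> allowed)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new AllowListStream(new ByteArrayInputStream(data), allowed)) {
            in.readObject();
            return true;
        } catch (InvalidClassException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = serialize(new Date());
        System.out.println(isAccepted(data, Collections.singleton("java.util.Date"))); // true
        System.out.println(isAccepted(data, Collections.<String>emptySet()));          // false
    }
}
```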

Object Streams as Containers

Object Serialization produces and consumes a stream of bytes that contain one or more primitives and objects. The objects written to the stream, in turn, refer to other objects which are also represented in the stream. Object Serialization produces just one stream format that encodes and stores the contained objects. Object Serialization has been designed to provide a rich set of features for Java classes. Other container formats such as OLE or OpenDoc have different stream or file system representations.

Each object acting as a container implements an interface that allows primitives and objects to be stored in or retrieved from it. These are the ObjectOutput and ObjectInput interfaces which:

Each object which is to be stored in a stream must explicitly allow itself to be stored and must implement the protocols needed to save and restore its state. Object Serialization defines two such protocols. The protocols allow the container to ask the object to write and read its state. To be stored in an Object Stream, each object must implement either the Serializable or the Externalizable interface.

For a Serializable class, Object Serialization can automatically save and restore fields of each class of an object and automatically handle classes that evolve by adding fields or supertypes. A Serializable class can declare which of its fields are transient (not saved or restored), and write and read optional values and objects.
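A minimal sketch of these facilities follows. The Document class and its fields are hypothetical; the example assumes that one field is saved automatically, one transient value is written by hand as optional data, and one transient cache is rebuilt after reading.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class OptionalDataDemo {
    // Hypothetical class, introduced only for this illustration.
    static class Document implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String title;        // saved and restored automatically
        private transient int revision;    // transient, written by hand as optional data
        private transient int titleLength; // transient cache, rebuilt on read

        Document(String title, int revision) {
            this.title = title;
            this.revision = revision;
            this.titleLength = title.length();
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // writes the non-transient fields
            out.writeInt(revision);   // optional value appended by the class
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();        // restores the non-transient fields
            revision = in.readInt();       // reads the optional value
            titleLength = title.length();  // recomputes the cache
        }

        String describe() {
            return title + " r" + revision + " len=" + titleLength;
        }
    }

    static Document roundTrip(Document doc) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(doc);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Document) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(new Document("Spec", 3)).describe());
        // prints "Spec r3 len=4"
    }
}
```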

For an Externalizable class, Object Serialization delegates to the class complete control over its external format and how the state of the supertype is saved and restored.

The ObjectOutput Interface

The ObjectOutput interface provides an abstract, stream-based interface to object storage. It extends DataOutput so those methods may be used for writing primitive data types. Objects implementing this interface can be used to store primitives and objects.

    package java.io;

    public interface ObjectOutput extends DataOutput
    {
        public void writeObject(Object obj) throws IOException;
        public void write(int b) throws IOException;
        public void write(byte[] b) throws IOException;
        public void write(byte[] b, int off, int len) throws IOException;
        public void flush() throws IOException;
        public void close() throws IOException;
    }

The writeObject method is used to write an object. The exceptions thrown reflect errors while accessing the object or its fields, or exceptions that occur in writing to storage. If any exception is thrown, the underlying storage may be corrupted, and you should refer to the object implementing this interface for details.

The ObjectInput Interface

The ObjectInput interface provides an abstract, stream-based interface to object retrieval. It extends DataInput so those methods for reading primitive data types are accessible in this interface.

    package java.io;

    public interface ObjectInput extends DataInput
    {
        public Object readObject() throws ClassNotFoundException, IOException;
        public int read() throws IOException;
        public int read(byte[] b) throws IOException;
        public int read(byte[] b, int off, int len) throws IOException;
        public long skip(long n) throws IOException;
        public int available() throws IOException;
        public void close() throws IOException;
    }

The readObject method is used to read and return an object. The exceptions thrown reflect errors while accessing the object or its fields, or exceptions that occur in reading from the storage. If any exception is thrown, the underlying storage may be corrupted, and you should refer to the object implementing this interface for details.

The Serializable Interface

Object Serialization produces a stream with information about the Java classes for the objects that are being saved. For Serializable objects, sufficient information is kept to restore those objects even if a different (but compatible) version of the class's implementation is present. The interface Serializable is defined to identify classes that implement the Serializable protocol:

    package java.io;

    public interface Serializable {}

A Serializable object:

ObjectOutputStream and ObjectInputStream are designed and implemented to allow the Serializable classes they operate on to evolve. Evolve in this context means to allow changes to the classes that are compatible with the earlier versions of the classes. Details of the mechanism to allow compatible changes can be found in Compatible Java Type Evolution.

The Externalizable Interface

For Externalizable objects, only the identity of the class of the object is saved by the container, and it is the responsibility of the class to save and restore the contents. The interface Externalizable is defined as:

    package java.io;

    public interface Externalizable extends Serializable
    {
        public void writeExternal(ObjectOutput out)
            throws IOException;

        public void readExternal(ObjectInput in)
            throws IOException, java.lang.ClassNotFoundException;
    }

An Externalizable Object:

The writeExternal and readExternal methods are public and raise the risk that a client may be able to write or read information in the object other than by using its methods and fields. These methods must be used only when the information held by the object is not sensitive or when exposing it would not present a security risk.
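The following sketch shows the shape of an Externalizable class. The Point class is hypothetical and holds no sensitive state. Externalizable classes need a public no-argument constructor, because the object is created through that constructor before readExternal is called.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class ExternalizableDemo {
    // Hypothetical class that takes full control of its external format.
    public static class Point implements Externalizable {
        private static final long serialVersionUID = 1L;
        private int x;
        private int y;

        // Required: the object is created with this constructor before readExternal runs.
        public Point() {
        }

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(x);
            out.writeInt(y);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
            x = in.readInt();
            y = in.readInt();
        }

        @Override
        public String toString() {
            return "(" + x + ", " + y + ")";
        }
    }

    static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(new Point(3, 4))); // prints "(3, 4)"
    }
}
```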

Protecting Sensitive Information

When developing a class that provides controlled access to resources, care must be taken to protect sensitive information and functions. During deserialization, the private state of the object is restored. For example, a file descriptor contains a handle that provides access to an operating system resource. Being able to forge a file descriptor would allow some forms of illegal access, since restoring state is done from a stream. Therefore, the serializing runtime must take the conservative approach and not trust the stream to contain only valid representations of objects. To avoid compromising a class, the sensitive state of an object must not be restored from the stream, or it must be reverified by the class. Several techniques are available to protect sensitive data in classes.

The easiest technique is to mark fields that contain sensitive data as private transient. Transient and static fields are not serialized or deserialized. Marking the field will prevent the state from appearing in the stream and from being restored during deserialization. Since writing and reading (of private fields) cannot be superseded outside of the class, the class's transient fields are safe.
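The effect of marking a field private transient can be checked with a round trip. In the sketch below, the Credentials class and its fields are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {
    // Hypothetical class: the password never appears in the stream.
    static class Credentials implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String user;
        private transient char[] password; // sensitive: not written, not restored

        Credentials(String user, char[] password) {
            this.user = user;
            this.password = password;
        }

        String user() {
            return user;
        }

        boolean hasPassword() {
            return password != null;
        }
    }

    static Credentials roundTrip(Credentials c) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(c);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Credentials) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Credentials copy = roundTrip(new Credentials("alice", "secret".toCharArray()));
        System.out.println(copy.user());        // prints "alice"
        System.out.println(copy.hasPassword()); // prints "false"
    }
}
```

After deserialization the transient field holds its default value, so the class must be prepared to operate without it or to obtain it again by other means.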

Particularly sensitive classes should not be serialized at all. To accomplish this, the class should implement neither the Serializable nor the Externalizable interface.

Some classes may find it beneficial to allow writing and reading but specifically handle and revalidate the state as it is deserialized. The class should implement writeObject and readObject methods to save and restore only the appropriate state. If access should be denied, throwing a NotSerializableException will prevent further access.
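A sketch of revalidation during deserialization follows. The Percentage class is hypothetical, and the example signals a failed check with InvalidObjectException, which is a choice made for this illustration. The forged stream is produced by overwriting the last byte of a valid one, which assumes that the single int field occupies the final four bytes of the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ValidationDemo {
    // Hypothetical class whose invariant is 0 <= value <= 100.
    static class Percentage implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int value;

        Percentage(int value) {
            if (value < 0 || value > 100) {
                throw new IllegalArgumentException("value out of range: " + value);
            }
            this.value = value;
        }

        // The stream is not trusted: the invariant is rechecked after the fields are restored.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            if (value < 0 || value > 100) {
                throw new InvalidObjectException("value out of range: " + value);
            }
        }
    }

    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Returns true if deserialization fails the class's own validation.
    static boolean isRejected(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return false;
        } catch (InvalidObjectException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] valid = serialize(new Percentage(50));
        byte[] forged = valid.clone();
        forged[forged.length - 1] = 127; // assumed layout: low byte of the int field

        System.out.println(isRejected(valid));  // prints "false"
        System.out.println(isRejected(forged)); // prints "true"
    }
}
```

The constructor check alone would not catch the forged value, because deserialization does not run the constructor of a Serializable class.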




Copyright © 1996, 1997 Sun Microsystems, Inc. All rights reserved.