Disk storage has two advantages over memory storage: it persists between program runs, and it has greater capacity. On the other hand, accessing data on disk can take a million times as long as accessing data in memory. When the disk is accessed, it is not much more expensive to read many elements than to read one.
Java provides many classes for reading from and writing to disk. These include Scanner and PrintWriter for text and ObjectInputStream and ObjectOutputStream for binary data. Most of these classes are in the java.io package. We can write an instance of any class to disk if the class implements the Serializable interface.
Strings (and hence files) can be compressed by defining short codes for frequently occurring text. Huffman encoding takes advantage of the frequencies of individual characters. LempelZiv encoding creates codes for longer substrings.
Data too big to fit in memory can be sorted externally using an algorithm based on merge sort.
The B-tree data structure is useful for maintaining a set of data too large to fit in memory. Each node in a B-tree can contain many elements. The associated algorithms ensure that the tree stays balanced and very shallow, minimizing the number of disk accesses needed to find, insert, or delete an element.