24.5. What Needs to Be Archived?
Beyond backing up data, organizations must also develop a strategy for archiving data. Both are integral components of an effective data protection strategy, and both require a clear understanding of the business value of data (archiving perhaps even more so than backup).
When executed correctly, archiving can not only save organizations money but also be a lifesaver, especially for those requiring access to historical information for regulatory compliance or audit purposes. Conversely, when archiving is performed incorrectly, it can cost a company dearly in terms of lost revenue, fines, and other penalties.
The problem is that many organizations mistakenly think of backup as archiving, and vice versa. The confusion often stems from backup vendors that claim their products also have archiving capabilities. Frequently, these capabilities amount to nothing more than backing up a data set and then deleting that data set from primary storage. This is not archiving.
Vendor products offer different levels of "archiving" capabilities. At one end of the spectrum, some vendors treat archiving as simply a backup followed by a deletion of the data from primary storage. This type of "archiving" is really intended to assist organizations in removing old data that is cluttering up servers, a problem that is better addressed with storage resource management (SRM) or hierarchical storage management (HSM) tools.
So, what is archiving, and how does it fit into the data protection landscape? Archiving is the long-term storage of information for the retrieval of logical components for a specific business purpose. In contrast, backups are intended to protect against short-term data loss, such as accidental deletion, device failure, and data corruption.
Candidates for archiving include periodic corporate financial information that needs to be retained for auditing purposes, patient medical information that must be retained for HIPAA compliance, and clinical trial data for a new drug that is winding through the Food and Drug Administration (FDA) approval process. Other examples include email, check images, and other types of electronic communication that could be requested in an audit.
The long-term nature of archived data presents a number of new problems:
- Backward compatibility
  Because tape and optical drives typically can't read media that is more than a generation or two old, organizations must give some thought to the long-term recovery of data that is archived to tape. Data can be migrated to new tape or optical platforms, but migration can present validation and authentication issues in some regulated industries.
- Media longevity
  If data is to be maintained for a long time on tape or optical media, steps must be taken to ensure media integrity. This includes maintaining proper environmental control and refreshing volumes as needed.
- Readability/usability of the data after a restore
  The archived data must be "portable." It cannot depend on an obsolete version of an application or operating platform in order to be restored.
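The first two problems lend themselves to a periodic check of the archive catalog. The following is a minimal sketch, not a vendor procedure: the `ArchiveVolume` record, the `needs_attention` function, the two-generation read-back limit, and the five-year refresh interval are all hypothetical values chosen for illustration. Real limits depend on the drive and media in use.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class ArchiveVolume:
    """Hypothetical catalog entry for one archived tape or optical volume."""
    label: str
    media_generation: int  # generation of the media format the volume was written in
    written: date          # date the volume was written or last refreshed


def needs_attention(
    volume: ArchiveVolume,
    current_drive_generation: int,
    today: date,
    max_read_back_generations: int = 2,  # assumed limit, not a vendor specification
    refresh_after_days: int = 5 * 365,   # assumed refresh interval
) -> List[str]:
    """Return the actions a volume appears to need: 'migrate', 'refresh', or both."""
    actions = []

    # Backward compatibility: flag volumes that have reached the assumed
    # read-back limit of the installed drives, so they can be migrated
    # before the next drive upgrade leaves them unreadable.
    gap = current_drive_generation - volume.media_generation
    if gap >= max_read_back_generations:
        actions.append("migrate")

    # Media longevity: flag volumes older than the assumed refresh interval.
    if (today - volume.written).days >= refresh_after_days:
        actions.append("refresh")

    return actions


if __name__ == "__main__":
    old = ArchiveVolume("A001", media_generation=3, written=date(2015, 1, 1))
    print(old.label, needs_attention(old, current_drive_generation=5, today=date(2024, 1, 1)))
```

A check like this does not address the third problem. Whether restored data is still usable depends on the format in which it was archived, not on the media.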
For both archiving and backup, it is critical to develop an understanding of the corporate value of the data to be protected. Deciding what data should be archived, when it should be archived, and how long it should be stored is central to the storage management process. Such a system of data classification can lead to intelligent policy management of both primary (disk) and secondary (backup and archive) storage resources.
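One way to make such a classification concrete is a small policy table that records, for each class of data, when it should be archived and how long it should be kept. The sketch below is illustrative only: the class names, the `RetentionPolicy` and `classify` names, and every number in `POLICIES` are assumptions, and it measures retention from the creation date for simplicity. Actual retention periods must come from legal and compliance review.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical policy for one class of data."""
    archive_after_days: int  # age at which data moves from primary storage to the archive
    retain_days: int         # total age after which the data may be disposed of


# Illustrative values only, not regulatory guidance.
POLICIES = {
    "financial": RetentionPolicy(archive_after_days=365, retain_days=7 * 365),
    "email": RetentionPolicy(archive_after_days=90, retain_days=3 * 365),
}


def classify(data_class: str, created: date, today: date) -> str:
    """Decide what the policy table says to do with one item of data."""
    policy = POLICIES.get(data_class)
    if policy is None:
        # Data with no assigned class cannot be managed by policy.
        return "unclassified"

    age_days = (today - created).days
    if age_days >= policy.retain_days:
        return "eligible_for_disposal"
    if age_days >= policy.archive_after_days:
        return "archive"
    return "keep_on_primary"


if __name__ == "__main__":
    print(classify("financial", date(2020, 1, 1), date(2021, 6, 1)))
    print(classify("email", date(2024, 1, 1), date(2024, 2, 1)))
```

Returning `"unclassified"` instead of guessing a default keeps unclassified data visible, which matters because the policy is only as good as the classification behind it.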