The Andrew II Project:
Backups & Archiving

David VanRyzin
Computing Services

SECTION 1 Introduction

1.1 Purpose

This Requirements Specification defines the criteria necessary to implement a backup and archiving system for the Carnegie Mellon central computing environment. It describes all of the features required in the new system and the factors essential in its evaluation and implementation. This Backup & Archiving System will also be available for Carnegie Mellon departmental and organizational computing facilities to implement independently.

This document will be used by the Backup and Archiving Project Group when evaluating possible backup and archiving systems. Vendors may use these Specifications to determine if their existing or future products are appropriate for the Carnegie Mellon system.

Carnegie Mellon is willing to work with vendors or other institutions in a joint development effort for a backup system that would handle Carnegie Mellon's needs as well as the needs of commercial users.

1.2 Scope

The central computing backup and archiving system will provide computer data storage and retrieval services for the Carnegie Mellon University community. This community includes 8,000 people using 2,000 Macintoshes, 500 IBM PCs, and 1,000 UNIX workstations connected by a Local Area Network and AFS file servers.

This system will be implemented in two stages. The first stage will extend commercially-provided backup and archiving services to individual workstations and file servers on the campus network. These computers include Apple Macintoshes and AppleShare file servers, IBM PC compatibles and Novell NetWare file servers, and UNIX workstations. These services will be provided by one or more vendor-supplied products which meet the requirements specified in this document.

The second stage will implement a backup and archiving system for DFS file servers. This system will most likely be a joint venture between Carnegie Mellon and an outside vendor, providing Carnegie Mellon with the backup services it requires and the vendor with a commercially-viable backup product for AFS/DFS systems.

Backup and Archiving services will operate from our central computing and storage facilities, giving users backup and archiving service more efficiently and conveniently than they could provide for themselves. Figure 1 illustrates the relationship between the users and the central facilities.

Figure 1 Relationship of users to central computing facilities

1.3 Definitions, Acronyms, and Abbreviations

AFS
The Andrew File System. A UNIX-based, large-scale distributed file system provided by Transarc Corporation. AFS uses a common file space shared by all AFS users, so there is always a unique file name associated with any given file. File sharing is independent of a file's location: neither a file's physical location nor a user's location affects how a file is accessed. AFS also provides Kerberos authentication and access control lists. AFS will be replaced by DFS in 1994. See DFS below.
Agent
The part of a backup system that runs on the client workstation.
Archive
The act of storing a piece of data from a centralized server or from a remote system onto a storage medium. Archiving is used to improve the cost-efficiency of storing data.
Backup
The act of copying data from a centralized server or remote system onto a storage medium. Backups are used to recreate any element of the data in the event of data becoming corrupted or missing from the original source.
Backup Server
The machine that holds data in transit between the client machine and the backup media devices.
DFS
Distributed File System. A distributed file system that is part of the Open Software Foundation's (OSF) Distributed Computing Environment (DCE). DFS is the next generation of AFS. DFS joins the file systems of all DCE machines, so it appears to the user that there is a single filespace. DFS features include Kerberos authentication, access control lists, data caching onto client machines, network service replication, and location-independent file sharing.
Kerberos
A network authentication protocol developed by MIT.
Migrate
The act of moving data from one storage medium to another. Used to free the space on faster, more valuable media and store the data on slower, less expensive media. Migration may happen in several stages, such as high-speed magnetic disk to slower magneto-optical and then to magnetic tape.
Network
Campus Ethernet, Token Ring, and AppleTalk connected with bridges as well as dialup asynchronous and SLIP connections.
Platform
A hardware/operating system combination.
Restore
The act of copying data from a storage medium onto the original source.
System
The collection of hardware, software, and networking components comprising the backup and archiving system as a whole.
Workstation
Any Macintosh, PC, or UNIX system that is connected to the network and has a local disk.
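
The staged migration described under Migrate can be sketched as a small policy function. This is only an illustration and not part of the requirements: the tier names follow the example in the definition, while the idle-time thresholds and the `next_tier` helper are hypothetical.

```python
from typing import Optional

# Storage tiers from fastest/most expensive to slowest/cheapest,
# following the example in the Migrate definition.
TIERS = ["magnetic disk", "magneto-optical", "magnetic tape"]

# Hypothetical thresholds: days without access before data leaves a tier.
# The specification does not state any values.
IDLE_DAYS_BEFORE_MIGRATION = {"magnetic disk": 30, "magneto-optical": 180}


def next_tier(current: str, idle_days: int) -> Optional[str]:
    """Return the tier to migrate to, or None if the data should stay put."""
    threshold = IDLE_DAYS_BEFORE_MIGRATION.get(current)
    if threshold is None or idle_days < threshold:
        return None  # already on the last tier, or accessed too recently
    return TIERS[TIERS.index(current) + 1]
```

Under these assumed thresholds, data idle for 45 days on magnetic disk would move to magneto-optical, and data already on magnetic tape never moves further.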

1.4 Overview

The Software Requirements are divided into two sections --- General Description and Specific Requirements.

The General Description provides the information necessary to understand the specific requirements for the complete system. The General Description is divided into six main sections.

The Specific Requirements section describes all of the features necessary in the complete backup and archiving system. Each feature described in this section is prioritized as mandatory, highly desirable, or desirable.

1.5 Contacts

If you have any questions about the requirements, or if you are interested in providing or developing part or all of the Carnegie Mellon Backup and Archiving System, please contact:

Mark Held
markh+@cmu.edu
(412) 268-5158

Wallace Colyer
wally+@cmu.edu
(412) 268-6497

Alex Margita
am3f+@andrew.cmu.edu
(412) 268-6688

Fax to
(412) 268-4987

SECTION 2 General Description

2.1 Project Perspective

2.1.1 The Existing Central Computing System
Carnegie Mellon Computing Services currently provides the campus community with a network-based service called Andrew.

Andrew service includes:

Andrew is used by over 8,000 students, faculty, and staff. Approximately 1,000 UNIX workstations, 500 PCs, and 2,000 Macintoshes are connected to the network through Ethernet, Token Ring, or AppleTalk connections and use Andrew services. There are also AppleShare and Novell NetWare file servers accessible through the network.

There are additional computing facilities for individual departments or research organizations. These facilities will not be included in the new backup and archiving service.

Computers that access the campus network through dial-up connections are not to be included in the new backup and archiving service.

The Existing AFS Backup System
Currently, the only data that is backed up centrally is the data stored on the AFS file servers. In order to back up this data, three DECstation 2100s and a DECstation 3100, all with Storage Expansion Units, are used as backup servers. Each machine acts as a storage buffer between the Andrew fileservers and the 8mm tape drives. After the Andrew volumes are copied onto the backup servers, the servers alert the computer operators to mount the tapes needed for the backup. Operators mount tapes on the backup servers throughout the day. Files are backed up daily, weekly, and at the end of the fall, spring, and summer semesters.

Backups are copied onto 8mm tapes. The tape library contains 9-track tapes from our previous backup system that must be accessed for file restores.

If users want a file restored to them, they must contact Computer Operations and provide detailed information on the file to be restored. An operator then searches for the file in a database, mounts the appropriate tape, and restores the data to the user.
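
The operator's lookup step in this restore procedure can be sketched as a query against a backup catalog. This is a minimal sketch under stated assumptions: the source does not describe the database, so the `CatalogEntry` fields and the `find_tape` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CatalogEntry:
    path: str          # file path as recorded at backup time (hypothetical field)
    backup_date: date  # when the copy was written
    tape_label: str    # tape the operator must mount


def find_tape(catalog: List[CatalogEntry], path: str, as_of: date) -> Optional[CatalogEntry]:
    """Return the newest backup of `path` made on or before `as_of`, if any."""
    candidates = [e for e in catalog if e.path == path and e.backup_date <= as_of]
    return max(candidates, key=lambda e: e.backup_date, default=None)
```

The user supplies the path and the date of the version wanted; the entry returned tells the operator which tape to mount.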

The existing system has the following problems:

Existing Backup Systems for Workstations
At this time, Computing Services does not provide central backup services for the local disks of workstations connected to the network. Users must provide their own backups in one of several ways.

These options have the following problems:

2.1.2 The New Central Computing System
Computing Services is planning a new centralized network service offering distributed computing for UNIX, Macintosh, and IBM PC workstations.

This system will include:

2.2 Backup and Archiving Project Functions

The project's short-term goal is to provide affordable, worry-free backup and archiving services to Macintosh, PC, and UNIX workstations through one or more commercial backup systems as soon as possible.

The long-term goal of the Backup and Archiving System is to provide reliable backups of the DFS Servers at Carnegie Mellon by January of 1994.

The central computing Backup and Archiving System must serve two main functions:

Backups
Backups are used to restore all or part of the central computing environment in case of a disaster or an accident.

Archives
Archives are used to store data on the most appropriate medium in terms of cost and accessibility.

2.3 Organization of the New System

The central backup and archiving services can be provided by one integrated system or by several smaller subsystems. The Requirements presented in this document are for the backup and archiving system as a whole. We will consider any independent subsystem that provides a subset of these requirements.

2.4 User Characteristics

2.4.1 End Users
A user is any individual who has data backed up by the system. This includes every student, faculty, and staff member with an account on the AFS/DFS file servers or a personal computer hooked up to the network.

Users range from novice to expert. All incoming students take a Computing Skills Workshop to introduce them to the basics of using the campus Macintoshes and UNIX workstations. There are no computer training requirements for faculty and staff.

2.4.2 Computer Operators
Computer Operators provide twenty-four hour supervision and maintenance for central computing facilities. These duties include monitoring system performance, initiating backups, providing backup media, and restoring data to users.

Computer Operators undergo intensive training on the systems they care for.

2.4.3 Administrators
Administrators are in charge of changing parameters and system settings to the most appropriate level for the current environment. They also track the system's resources and monitor the system's performance.

Administrators have expert knowledge and a thorough understanding of the central computing environment, as well as the backup system.

2.4.4 Software Developers and Maintainers
Software Developers and Maintainers are system programmers who are responsible for maintaining the integrity of the backup system's software. They may also modify the behavior of the backup system or fix problems with the system.

Software Developers and Maintainers have expert knowledge and a thorough understanding of the central computing environment, as well as the backup system. These individuals are typically able to rebuild the backup system from source files.

2.5 General Constraints

The Carnegie Mellon Environment
Computing Services cannot physically access all of the workstations on campus.

The Carnegie Mellon network is available to a wide variety of users, including administrators and students. Each person's or organization's data must be protected from action or intrusion by naive or malicious users.

Many workstations on campus are publicly available to any student, faculty, or staff member. Other workstations are private, and can only be operated by specific users.

The entire environment includes AFS/DFS file servers, 1000 UNIX machines, 2000 Macintoshes, and 500 IBM PCs. Most of the UNIX, Macintosh, and PC workstations have access to central AFS/DFS, AppleShare, and Novell NetWare file servers. The amount of data created and stored by users is growing by 25% every year.
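
The effect of 25% annual growth on capacity planning can be worked out with a short calculation. This is a sketch only: the 25% rate comes from the paragraph above, but the starting volume used in any call is a placeholder, since the current amount of stored data is not stated here.

```python
def projected_volume(current: float, years: int, annual_growth: float = 0.25) -> float:
    """Compound `current` by `annual_growth` for `years` years."""
    return current * (1 + annual_growth) ** years


def years_to_double(annual_growth: float = 0.25) -> int:
    """Smallest whole number of years after which the volume has at least doubled."""
    years = 0
    factor = 1.0
    while factor < 2.0:
        factor *= 1 + annual_growth
        years += 1
    return years
```

At this rate the stored volume reaches about 1.95 times its starting size after three years and passes double during the fourth, so a system sized only for current needs would be outgrown quickly.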

Implications for the Backup and Archiving System
Because not all workstations are physically accessible, the system must be network-based.

Because some users of the network will be naive or malicious, the backup system must provide data protection --- that is, prevent unauthorized network access to the system's data or functions.

Because the computing environment is continually expanding, the backup system must be scalable --- that is, able to adjust easily to system growth.

Because the system must be network-based and provide data protection, an authentication process must verify the identity of entities making requests.
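
The idea of verifying who is making a request can be illustrated with a keyed message tag. This is a deliberately simplified stand-in, not Kerberos and not a proposed design: it assumes the client and the backup server already share a secret key, and the `sign_request` and `verify_request` names are hypothetical.

```python
import hashlib
import hmac


def sign_request(shared_key: bytes, principal: str, action: str) -> str:
    """Produce a tag showing that the sender holds `shared_key`."""
    message = f"{principal}:{action}".encode("utf-8")
    return hmac.new(shared_key, message, hashlib.sha256).hexdigest()


def verify_request(shared_key: bytes, principal: str, action: str, tag: str) -> bool:
    """Accept the request only if the tag matches; compare in constant time."""
    expected = sign_request(shared_key, principal, action)
    return hmac.compare_digest(expected, tag)
```

A request whose principal or action is altered in transit, or that was signed with a different key, fails verification. A real deployment would also need protection against replayed requests, which this sketch omits.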

2.6 Assumptions and Dependencies

The backup system for AFS/DFS servers assumes that DFS will exist in the near future.

SECTION 3 Specific Requirements

Features described in the Specific Requirements fall into three different categories --- Mandatory, Highly Desirable, and Desirable.

These are requirements for the entire backup and archiving system. We will consider any independent subsystem that provides a subset of these requirements.
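
One way to apply the three priority categories when reviewing a candidate product or subsystem is sketched below. This is an illustration under stated assumptions: the `Priority` enum mirrors the categories named above, but the `summarize` function and any requirement names passed to it are hypothetical and do not come from this document.

```python
from enum import Enum
from typing import Dict, List, Set


class Priority(Enum):
    MANDATORY = "mandatory"
    HIGHLY_DESIRABLE = "highly desirable"
    DESIRABLE = "desirable"


def summarize(requirements: Dict[str, Priority], provided: Set[str]) -> Dict[str, List[str]]:
    """List the requirements a candidate does not meet, grouped by priority."""
    missing: Dict[str, List[str]] = {p.value: [] for p in Priority}
    for name, priority in requirements.items():
        if name not in provided:
            missing[priority.value].append(name)
    return missing
```

Because a subsystem may cover only a subset of the requirements, the same summary can be computed per subsystem and the gaps compared across a proposed combination.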

3.1 System Requirements

It is mandatory that the backup system:
It is highly desirable that the backup system:
It is desirable that the backup system:

3.2 Topology

It is mandatory that the backup system:
It is highly desirable that the backup system:

3.3 Capacity and Performance

It is mandatory that the backup system:
It is highly desirable that the backup system:

3.4 File Management

It is mandatory that the backup system:
It is highly desirable that the backup system:

3.5 Backup Features

It is mandatory that the backup system:
It is desirable that the backup system:

3.6 Recovery Features

It is mandatory that the backup system:
It is highly desirable that the backup system:
It is desirable that the backup system:

3.7 Media Management

It is mandatory that the backup system:
It is highly desirable that the backup system:

It is desirable that the backup system:

3.8 Monitoring

It is mandatory that the backup system:
It is highly desirable that the backup system:

3.9 Security

It is mandatory that the backup system:
It is highly desirable that the backup system:
It is desirable that the backup system:

3.10 Hardware

It is mandatory that the backup system:
It is highly desirable that the backup system:
It is desirable that the backup system:

3.11 Maintenance

It is mandatory that the backup system:
It is highly desirable that the backup system:

3.12 User Interfaces

It is mandatory that the backup system:
It is highly desirable that the backup system:
 