Now that your TSM servers are talking to your TSM clients, it’s time to put everything together. In the final part of the TSM cheat sheet series, we’ll work on creating a daily schedule of TSM events to ensure that all of your business-critical data is backed up, kept available, sent offsite, and ready for any major disaster recovery scenario.
The TSM Server Administrative Scheduler
Most daily tasks are handled by the TSM server administrative scheduler. This timing tool is integrated into the TSM server software and will process jobs as long as the dsmserv process is running on a system (on AIX, it’s usually started from the /etc/inittab file at boot time). In many ways, the TSM server administrative scheduler is more robust than other common scheduling programs like cron because it has scheduling logic built in: it can start and stop schedules at set points of the day, prioritize different schedules, and set expiration dates for schedules that are no longer needed.
Schedules can be established with the define schedule command and queried with the query schedule command. When creating a schedule, the command structure follows the same type of SQL-like input that TSM uses elsewhere, with keywords instead of flags (e.g., starttime, duration, cmd). Schedules that manage activities within the TSM server are designated with the type=administrative parameter. The scheduler can even run server scripts that have been previously defined through the define script command. Previously run schedules can be examined with the query actlog command, and any live schedules or jobs that might be waiting for input can be examined and answered with the query request and reply commands.
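As a rough sketch of that keyword-style syntax, the lines below define a one-line server script, schedule it to run daily, and then check on it. They are written as lines of a TSM macro file (which is why the comments use /* */). The names NIGHTLY_CHECK and RUN_NIGHTLY_CHECK, the start time, and the command inside the script are illustrative assumptions, not values from this series.

```
/* Hypothetical script and schedule names; adjust the times to your own environment */
define script NIGHTLY_CHECK "query process" description="Show running server processes"
define schedule RUN_NIGHTLY_CHECK type=administrative cmd="run NIGHTLY_CHECK" active=yes starttime=22:00 period=1 perunits=days

/* Confirm the definition, then look for the schedule in the last day of the activity log */
query schedule RUN_NIGHTLY_CHECK type=administrative format=detailed
query actlog begindate=today-1 search=RUN_NIGHTLY_CHECK
```

Keeping the work in a script and the timing in a schedule means either one can be changed without touching the other.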
In a normal TSM environment, there are typically six main groups of tasks that take place on a daily basis. These tasks are primarily managed by the TSM server administrative scheduler and will run at different points in the day, depending on the type of data that needs to be backed up and manipulated. The following diagram shows a high-level overview of these six tasks.
TSM Client Backups. In this window, the TSM clients send all of their backup and archive data over to the TSM server. In most production environments, this is scheduled to occur after the main daily processing, during a lull in activity, or overnight. As covered in previous articles, this can be managed through a variety of means; my own historical preference has been to use cron jobs to cover them.
Inventory Expiration. After all of the backups and archives are complete, the next step is to expire any old, inactive files. This serves two main purposes—to minimize the quantity of data that will later be sent to tape and to discard any old data that the TSM database has tracked for offsite tapes, so that they can come back onsite and be reclaimed. The TSM server will typically expire inactive files a couple of times during the day, but it can also be done anytime with the expire inventory command.
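For an on-demand run, something like the following could be entered from an administrative session. The 120-minute limit is an arbitrary example value, not a recommendation.

```
/* Start expiration in the background and stop it after at most 120 minutes */
expire inventory duration=120

/* Check how far the expiration process has progressed */
query process
```

Capping the duration is one way to keep expiration from running into the migration window that follows.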
Disk Pool Migrations. Once the data has made its way to the TSM server, it usually needs to be moved from disk to tape for long-term retention. This can be done through the TSM scheduler by entering a command to set the highmig and lowmig values for disk pools to 0. Afterward, another TSM schedule can raise those values back up to ensure that additional backups and archives go to disk first.
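A minimal sketch of that pair of schedules might look like this. The pool name DISKPOOL, the schedule names, and the start times are assumptions for illustration; the restored values of 90 and 70 are simply the usual defaults for highmig and lowmig.

```
/* Hypothetical pool name DISKPOOL; thresholds are percentages of pool capacity */

/* Morning: drop both thresholds to 0 so the disk pool drains to its next storage pool */
define schedule MIGRATE_START type=administrative cmd="update stgpool DISKPOOL highmig=0 lowmig=0" active=yes starttime=07:00 period=1 perunits=days

/* Later: raise the thresholds again so that new backups land on disk first */
define schedule MIGRATE_STOP type=administrative cmd="update stgpool DISKPOOL highmig=90 lowmig=70" active=yes starttime=11:00 period=1 perunits=days
```

The gap between the two start times should be long enough for migration to finish; query process during the window shows whether it is still running.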
Tape Pool Migrations & Copies. Now that the data has made its way onto tape, another migration or copy of the data can be made to be sent offsite, depending on how much information needs to be instantly accessible. The TSM scheduler can run a backup stgpool command to create a copy of the data on tape. This can be a very resource-intensive piece of work, possibly tying up all of the tape drives assigned to a TSM server, so it’s best to do this when not much else is happening within the TSM library. After all the migrations and copies are done, the TSM server database should also be copied to tape with the backup db command to create a single volume that tracks the identity and purpose of all of the volumes.
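The copy and database backup steps could be sketched as follows. TAPEPOOL, COPYPOOL, and LTOCLASS are hypothetical names for a primary tape pool, a copy storage pool, and a tape device class; substitute your own.

```
/* Hypothetical names: TAPEPOOL (primary pool), COPYPOOL (copy pool), LTOCLASS (device class) */

/* Copy the primary tape pool to the copy pool; each tape-to-tape process needs two drives */
backup stgpool TAPEPOOL COPYPOOL maxprocess=2 wait=yes

/* Once the copies are complete, take a full backup of the TSM database to tape */
backup db devclass=LTOCLASS type=full wait=yes
```

The maxprocess value is the main lever for how many drives this step occupies, so it is worth sizing against the number of drives you can spare at that time of day.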
Tape Checkouts & Checkins. After all of the data has been copied, it’s time to eject the tapes to be sent offsite and stored in case of a true disaster. The checkout libvol or move media command will instruct the TSM library to use its picker arm to grab any volumes specified and place them in the I/O door for removal. It will mark the volumes (as well as any tape pool copies and database backups) as being offsite and out of the library. Afterward, scratch tapes, reclaimed volumes, or tapes that need to be reinserted in the library can be loaded into the I/O door and inventoried with the checkin libvol command.
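A simple version of that checkout and checkin cycle might look like the lines below. The library name LIB1 and volume name A00001 are made up for illustration, and the request number passed to reply is whatever query request actually reports on your server.

```
/* Hypothetical library LIB1 and volume A00001 */

/* Eject one volume to the I/O door without reading its label first */
checkout libvolume LIB1 A00001 remove=yes checklabel=no

/* After loading returned tapes into the I/O door, check them in as scratch by barcode */
checkin libvolume LIB1 search=bulk status=scratch checklabel=barcode

/* The checkin may wait for an operator; find the pending request and answer it */
query request
reply 1
```

Tapes that still hold valid data and are being reinserted would be checked in with status=private rather than status=scratch.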
Volume Reclamations. Any volumes that are only partially used can be reclaimed together to maximize tape capacity and the number of scratch tapes. For example, following an inventory expiration, three tapes that are within the same tape pool and 20 percent used apiece can be consolidated down to a single volume at 60 percent, thereby freeing up those three tapes as scratch volumes. This type of processing can overlap most TSM client backups to maximize a 24-hour schedule.
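One common way to drive reclamation, sketched here under the assumption of a tape pool named TAPEPOOL, is to lower the pool's reclaim threshold for a window and then raise it again. A full volume that has dropped to 20 percent utilization has roughly 80 percent reclaimable space, so a threshold of 80 would pick up the tapes in the example above.

```
/* Hypothetical pool name TAPEPOOL */

/* Reclaim volumes whose reclaimable space is 80 percent or more */
update stgpool TAPEPOOL reclaim=80

/* Later: a threshold of 100 keeps new reclamation from starting */
update stgpool TAPEPOOL reclaim=100
```

Note that the consolidated data still occupies one volume, so the net gain in the three-tape example is two scratch tapes. As with the disk pool thresholds, the two commands can be wrapped in a pair of administrative schedules.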
Wrapping It Up
Throughout this series, we’ve taken a look at the powerful and robust Tivoli Storage Manager product. It’s a truly versatile and wonderful tool that can be used in many different capacities, from short-term backup needs to long-term archival of sensitive and confidential information. It’s flexible enough that small operations such as community colleges can use it to retain student data, and it scales up so that multinational corporations can use it to protect sales and product data. And it’s a great tool for AIX systems administrators to have in their arsenals of backup and recovery software.
I’ll end this series with a personal story. Several years ago, I designed a complete TSM solution for the human resources and financials systems running on AIX for a company that was worth billions of dollars. The business used multiple types of other systems, including Windows boxes and iSeries servers, which did things like process orders and serve up e-mail. Each system was covered by its own unique backup and recovery software.
During a disaster recovery simulation that took place over three days, each of these environments had to be recovered to make the test scenario work correctly. Teams worked diligently around the clock to get their boxes up and running, but of all of the systems that were business-critical, only my AIX servers were successfully recovered—thanks to TSM.
A few years later, that company’s building was hit by a tornado during a horrible storm, and the data center space was left inoperable for some time. From what I heard, although it was a challenge to recover the rest of the data, the AIX and TSM environments came through well.