4.4. Automating Routine Tasks

Computers are intended to do things for us. Yet, we spend much of our computing lifetimes repeatedly performing the same tasks, many of which are, in the end, rather mindless. System administration can sometimes become a chore chock-full of tedium and routine: the same keystrokes, the same checks, the same results. Wouldn't it be ideal if we were able to put the computer to its intended use, and have it do these things for us?

Linux provides an abundance of ways to automate tasks. Shell scripts can help, but the real power lies in two particular facilities: cron and, to a lesser extent, at. These are tools you'll rely upon to perform such routine tasks as archiving logs, updating the system database with newly installed applications, writing server statistics, and much, much more.

4.4.1. cron

cron is a Linux system daemon that executes scheduled commands and scripts. The cron daemon, or crond, runs as a service that starts when your system starts. Every minute, it checks its own schedule database for tasks that need to be performed.

The heart of cron is the /etc/crontab file, which is shown below from the default, unaltered Fedora Core installation:

/etc/crontab

    SHELL=/bin/bash
    PATH=/sbin:/bin:/usr/sbin:/usr/bin
    MAILTO=root
    HOME=/

    # run-parts
    01 * * * * root run-parts /etc/cron.hourly
    02 4 * * * root run-parts /etc/cron.daily
    22 4 * * 0 root run-parts /etc/cron.weekly
    42 4 1 * * root run-parts /etc/cron.monthly

The crontab file first defines some environment variables: the shell in which the tasks will run, the PATH environment variable, the system account to which mail notifications will be sent, and the home directory to use. After the run-parts comment, the command schedule is listed.

4.4.2.1. The crontab Command Schedule Syntax

Believe it or not, a single line in this file provides all the information required for the system to perform a full set of tasks. In order to understand it, we need to break this line out into fields.

Note: cron always deals in 24-hour time: 7:24 means 7:24 a.m., and 19:24 means 7:24 p.m.

The first field defines how many minutes of each hour must pass before the task will be performed. In the above default crontab file, the first task is scheduled to start one minute past the hour; the last task is scheduled to commence at 42 minutes past the hour.
The second field defines the hour in which the task will run. From this, we can tell that the second task is scheduled to run at 4:02.
An asterisk indicates that the task should be executed every hour. Therefore, we can tell that the first task is scheduled to be run at 0:01, 1:01, 2:01, and so on, all the way to 23:01, after which it starts all over again.
The third field defines the day of the month (numbered from 1 to 31) on which the task will run. The last command is scheduled to run at 4:42 on the first day of the month. Again, an asterisk indicates that the task should be run every day of the month.
The fourth field determines the month in which the task should run; as you'd expect, the months are numbered from 1 to 12. If you had a task to perform only in July, this field would contain a 7. If you had a task to perform every day in December, this field would contain a 12. Since, in our example, this field always contains an asterisk, we know that all of our tasks will be performed every month.
The fifth field defines the day of the week on which the task will run. 0 represents Sunday, 1 represents Monday, and 6 represents Saturday. Our third task has a 0 in this field: it will be executed at 4:22 on Sundays. Once more, an asterisk in this field indicates that the task will be performed on every day of the week.
The next field identifies the user account under which the task will run. In all cases here, it's root.
The final field defines the actual task that is to be run. Here, we use run-parts to execute every script inside the /etc/cron.schedule directories. Scripts inside the /etc/cron.hourly directory are run every hour, scripts inside /etc/cron.daily are run daily, and so on.
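
The field-by-field breakdown above can be sketched in a few lines of shell. This is only an illustration of how the seven fields line up, not a description of how cron itself parses the file; the variable names (minute, hour, dom, and so on) are labels invented for this example.

```shell
#!/bin/sh
# Illustration only: split one /etc/crontab entry into its seven fields.
# The variable names are hypothetical labels, not cron terminology.
entry='22 4 * * 0 root run-parts /etc/cron.weekly'

# read assigns one word to each variable; the last variable receives the
# rest of the line, so the command keeps its own spaces.
read -r minute hour dom month dow user command <<EOF
$entry
EOF

echo "minute:       $minute"
echo "hour:         $hour"
echo "day of month: $dom"
echo "month:        $month"
echo "day of week:  $dow"
echo "user:         $user"
echo "command:      $command"
```

Run against the weekly entry, this shows a minute of 22, an hour of 4, asterisks for the day of the month and the month, and a day of the week of 0 (Sunday), which matches the description of the third task.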

4.4.2.2. Adding to crontab

Let's put this to more practical use with the scenario mentioned back in Chapter 3. You may remember that we introduced the power of the command line with the command find /var/backup/* -ctime +5 -exec rm {} \;, which I run three times a week to remove backups that are more than five days old. This automated process is powered by cron, so let's take a look at how this is configured on my system:

/etc/crontab

    SHELL=/bin/bash
    PATH=/sbin:/bin:/usr/sbin:/usr/bin
    MAILTO=root
    HOME=/

    # run-parts
    01 * * * * root run-parts /etc/cron.hourly
    02 4 * * * root run-parts /etc/cron.daily
    22 4 * * 0 root run-parts /etc/cron.weekly
    42 4 1 * * root run-parts /etc/cron.monthly
    12 5 * * 2,4,6 root find /var/backup/* -ctime +5 -exec rm {} \;

The first two fields tell us that this task will be run at 5:12. For purity's sake, I've scheduled the task to run at a time when I'm not using the system, though it probably won't make much of a difference in terms of performance. The next two fields are filled with asterisks, meaning the task will be run regardless of the date. The potentially interesting value here, though, is the fifth field, "day of week", which has the value 2,4,6. This means that the task is scheduled for Tuesday, Thursday, and Saturday. crontab also allows for ranges: 1-5 in the fifth field would schedule a task to run Monday to Friday.
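
To make the list and range notation concrete, here is a small sketch that translates a day-of-week field into day names. It is an illustration only: day_name and expand_dow are hypothetical helpers written for this example, and they handle just the three forms discussed in this section (an asterisk, a comma-separated list, and a range).

```shell
#!/bin/sh
# Illustration only: translate a crontab day-of-week field into day names.
# day_name and expand_dow are hypothetical helpers, not part of cron.

day_name() {
  case "$1" in
    0) echo Sunday ;;
    1) echo Monday ;;
    2) echo Tuesday ;;
    3) echo Wednesday ;;
    4) echo Thursday ;;
    5) echo Friday ;;
    6) echo Saturday ;;
  esac
}

expand_dow() {
  field=$1
  # An asterisk means every day of the week.
  if [ "$field" = "*" ]; then
    field=0-6
  fi

  names=""
  old_ifs=$IFS
  IFS=,                        # split a list such as 2,4,6 at the commas
  for part in $field; do
    case "$part" in
      *-*) first=${part%-*}; last=${part#*-} ;;   # a range such as 1-5
      *)   first=$part; last=$part ;;             # a single day
    esac
    day=$first
    while [ "$day" -le "$last" ]; do
      names="${names:+$names }$(day_name "$day")"
      day=$((day + 1))
    done
  done
  IFS=$old_ifs
  echo "$names"
}

expand_dow 2,4,6   # Tuesday Thursday Saturday
expand_dow 1-5     # Monday Tuesday Wednesday Thursday Friday
```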

cron can just as easily execute shell scripts. Let's create a script named backup.sh to actually create these backups. Save this file in the /home/username/bin directory (you may need to create this directory if you haven't done so already).

~/bin/backup.sh

    #!/bin/sh

    # Create a directory for this backup.
    # Its name depends on the date and time so backups can't overwrite
    # each other, unless they're executed in the same minute.
    DATE=`/bin/date +%Y%m%d%H%M`
    mkdir /var/backup/$DATE

    # Backup the Apache logs.
    mkdir /var/backup/$DATE/apache-logs
    for f in /var/log/httpd/*; do
      cp -fr "$f" --target-directory /var/backup/$DATE/apache-logs
    done

    # Backup kermit's home directory
    mkdir /var/backup/$DATE/kermit-home
    for f in /home/kermit/*; do
      cp -fr "$f" --target-directory /var/backup/$DATE/kermit-home
    done

The above script performs several tasks:

DATE=`/bin/date +%Y%m%d%H%M` declares a variable named DATE and fills it with the output of the date command. The +%Y%m%d%H%M option on the date command instructs it to format the date in YYYYmmddHHMM format. That is, the minute before midnight on December 31, 2006 becomes 200612312359. The backticks, or backward quotes, around the date command tell the system that this is a command embedded in your script that should be run to obtain the required text value.
Next, mkdir /var/backup/$DATE creates a directory for our backups based on this date.
mkdir /var/backup/$DATE/apache-logs creates a subdirectory for Apache logs.
The for f in /var/log/httpd/*; line sets up a loop to go through each file in the /var/log/httpd directory. Inside this loop, the variable f will refer to the current file.
cp -fr "$f" --target-directory /var/backup/$DATE/apache-logs copies the current file (f) to the target directory, /var/backup/current-date/apache-logs.
The previous three steps are then repeated, copying everything in /home/kermit to /var/backup/current-date/kermit-home.
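
The same steps can be tried out safely before touching /var/backup. The sketch below is one possible variation, not the script used in this chapter: backup_dir is a hypothetical helper, it writes $(...) in place of the backticks (the two forms are equivalent), and it runs against throwaway directories created by mktemp so that it needs no special permissions.

```shell
#!/bin/sh
# Sketch only: backup_dir is a hypothetical helper for experimenting with
# the steps in backup.sh without writing to /var/backup.

# backup_dir SRC ROOT LABEL
# Copies everything in SRC into ROOT/<date stamp>/LABEL and prints that path.
backup_dir() {
  src=$1
  root=$2
  label=$3
  stamp=$(date +%Y%m%d%H%M)        # same YYYYmmddHHMM format as backup.sh
  dest="$root/$stamp/$label"
  mkdir -p "$dest" || return 1     # -p also creates the date directory
  cp -R "$src"/. "$dest"/ || return 1
  echo "$dest"
}

# Demonstration on temporary directories (assumed stand-ins for
# /var/log/httpd and /var/backup).
demo_src=$(mktemp -d)
demo_root=$(mktemp -d)
echo "sample log line" > "$demo_src/access_log"

demo_dest=$(backup_dir "$demo_src" "$demo_root" apache-logs)
ls "$demo_dest"
```

Wrapping the copy in a function means each additional directory to back up becomes a single extra line, rather than another mkdir and loop.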

Before we go ahead and run this script, we need to create the /var/backup directory, grant everyone write access to this directory, and make the backup.sh script executable:

    [kermit@swinetrek ~]$ su
    Password:
    [root@swinetrek kermit]# mkdir /var/backup
    [root@swinetrek kermit]# chmod a+w /var/backup
    [root@swinetrek kermit]# exit
    exit
    [kermit@swinetrek ~]$ chmod u+x ~/bin/backup.sh
    [kermit@swinetrek ~]$

Let's test the script before we modify the crontab file to ensure that it works as we think it should:

    [kermit@swinetrek ~]$ backup.sh
    cp: cannot stat `/var/log/httpd/*': Permission denied
    [kermit@swinetrek ~]$

We get this error message because we don't have access to the /var/log/httpd directory; we need to run this script as root:

    [kermit@swinetrek ~]$ su
    Password:
    [root@swinetrek kermit]# backup.sh
    [root@swinetrek kermit]# exit
    exit
    [kermit@swinetrek ~]$ ls /var/backup
    200612312359
    [kermit@swinetrek ~]$

In the file listing, we can see that a directory has been created with the current date and time. Delve deeper into this directory until you're satisfied that the script is working as expected.

The last step in the process is to add the execution of the script to the crontab file on the schedule we've already defined:

/etc/crontab

    SHELL=/bin/bash
    PATH=/sbin:/bin:/usr/sbin:/usr/bin
    MAILTO=root
    HOME=/

    # run-parts
    01 * * * * root run-parts /etc/cron.hourly
    02 4 * * * root run-parts /etc/cron.daily
    22 4 * * 0 root run-parts /etc/cron.weekly
    42 4 1 * * root run-parts /etc/cron.monthly
    12 5 * * 2,4,6 root find /var/backup/* -ctime +5 -exec rm {} \;
    32 5 * * * root /home/kermit/bin/backup.sh

The new entry differs from the previous one only in that it executes a script rather than a lone command. Because the script can contain commands and logic, it's a sound approach to handling more complex routine operations.

4.4.2.3. Using the /etc/cron.schedule Directories

We've already discussed what the default crontab entries do. The first line runs the command run-parts /etc/cron.hourly every hour: run-parts will go into the /etc/cron.hourly directory and execute every executable file it finds there. Entries also exist for cron.daily, cron.weekly, and cron.monthly directories. If you prefer, you can simply store your backup.sh script in the /etc/cron.daily directory. It will run with the other daily scripts.
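
The core idea behind run-parts can be sketched as a simple loop. This is a simplified illustration, not the real tool: run_all is a hypothetical function, and the actual run-parts may apply further filename checks that are ignored here. The demonstration uses a temporary directory as a stand-in for /etc/cron.daily.

```shell
#!/bin/sh
# Simplified illustration of what run-parts does. run_all is a hypothetical
# function; the real run-parts may skip additional files.

run_all() {
  dir=$1
  for script in "$dir"/*; do
    # Only regular files with the executable bit set are run.
    [ -f "$script" ] && [ -x "$script" ] || continue
    "$script"
  done
}

# Demonstration: three scripts in a temporary directory, one of which has
# not been made executable and is therefore skipped.
demo=$(mktemp -d)
printf '#!/bin/sh\necho "ran 10-first"\n'   > "$demo/10-first"
printf '#!/bin/sh\necho "ran 20-second"\n'  > "$demo/20-second"
printf '#!/bin/sh\necho "should not run"\n' > "$demo/30-disabled"
chmod u+x "$demo/10-first" "$demo/20-second"

output=$(run_all "$demo")
echo "$output"
```

The skipped file is a reminder that a script dropped into /etc/cron.daily must be made executable, just as backup.sh was with chmod u+x.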

4.4.2. Anacron

Anacron is, to some extent, an extension of cron. Like cron, it's intended to execute commands on a schedule, taking care of routine tasks. However, unlike cron, Anacron makes no assumption that the machine is up and running 24/7. In that sense, Anacron provides some measure of redundancy to the functions of cron.

Like other Linux applications, Anacron gets its direction from a text configuration file. In Fedora, this file is /etc/anacrontab. At first glance, the anacrontab file looks less daunting than crontab, even though we know how simple the crontab file can actually be.

/etc/anacrontab

    # /etc/anacrontab: configuration file for anacron

    # See anacron(8) and anacrontab(5) for details.

    SHELL=/bin/sh
    PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

    1        65      cron.daily            run-parts /etc/cron.daily
    7        70      cron.weekly           run-parts /etc/cron.weekly
    30       75      cron.monthly          run-parts /etc/cron.monthly

As in crontab, the first few lines of this file define some environment variables: in this case, SHELL and PATH. The remaining lines describe, over a number of fields, the jobs that Anacron must carry out. From left to right, these fields are as follows:

period: The period field identifies the number of days that constitute the "period" in which the job will run once. In this example, the first job will be run every day, the second job will run once every seven days, and the last job will run once every thirty days.
delay: When it's time for a task to be executed, Anacron will wait this many minutes before executing the task. This is potentially useful when a machine is turned on after a long period of being switched off. For example, imagine that our machine has been turned off for more than thirty days. When we turn it back on, Anacron will realize that it's time to execute all of its scheduled tasks. It will execute the daily tasks in 65 minutes' time, the weekly tasks in 70 minutes' time, and the monthly tasks in 75 minutes' time. This staggering stops the machine from becoming bogged down by three competing jobs.
job identifier: The job identifier field is as it sounds: a means by which the system can identify each Anacron task. This field can contain any character, barring spaces, tabs and slashes.
command: Finally, similar to crontab, anacrontab specifies the command that will be executed. Again, Anacron will execute run-parts to work its way through the scripts in /etc/cron.daily, /etc/cron.weekly, and /etc/cron.monthly.

The operation of Anacron is pretty straightforward. When run, it reads the list of jobs from /etc/anacrontab and checks whether or not each job has been run in the last specified number of days. If not, Anacron runs the job after waiting for the delay period. If the job has been run in the specified time period, it leaves it alone. It's pretty simple.
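
The decision Anacron makes for each job can be sketched as follows. This is a deliberately simplified model rather than Anacron's real implementation: report_due_jobs is a hypothetical function, and it assumes that every job last ran the same number of days ago (as would be the case after the machine has been switched off for a while). The job list is copied from the anacrontab file shown above.

```shell
#!/bin/sh
# Simplified model of Anacron's decision, for illustration only.
# report_due_jobs is a hypothetical function; it assumes every job last ran
# DAYS_SINCE_LAST_RUN days ago.

# report_due_jobs DAYS_SINCE_LAST_RUN
report_due_jobs() {
  days=$1
  # Fields: period (days), delay (minutes), job identifier, command.
  while read -r period delay job_id command; do
    if [ "$days" -ge "$period" ]; then
      echo "$job_id: run in $delay minutes"
    fi
  done <<EOF
1 65 cron.daily run-parts /etc/cron.daily
7 70 cron.weekly run-parts /etc/cron.weekly
30 75 cron.monthly run-parts /etc/cron.monthly
EOF
}

echo "After 3 days:"
report_due_jobs 3
echo "After 31 days:"
report_due_jobs 31
```

After three days only the daily job is due; after 31 days all three are due, staggered by their delays of 65, 70, and 75 minutes.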

We can start the Anacron daemon using one of the service tools we looked at earlier in this chapter. By default, it should be set up to start with your machine, but you can change this setting if you like.

Note: By default, both cron and Anacron are configured to run all of the scripts in the /etc/cron.schedule directories. However, only one of cron or Anacron will actually run the scripts. Each of these directories contains a script to keep Anacron up to date. For example, whenever cron runs the scripts in /etc/cron.daily, one of those scripts updates the file that Anacron uses to record when the task was last run. Later, when Anacron goes to run these scripts, it will see that cron has already run them, so it won't run them again.

4.4.3. at

Now, we've got cron to perform regularly scheduled tasks on your system. We've got Anacron to pick up cron's slack if the machine isn't up and running 24/7. That seems like a pretty full complement of task scheduling methods, doesn't it? As true as that may be, we've still left one piece out of the automated task puzzle: at.

at is a classic Linux hack, intended to take up where other applications leave off. With both cron and Anacron, it's not a trivial task to add a simple, one-off job to the schedule. Let's say, for example, that you need to download a very large file, and you want to do it at a time when there's no one else on the network, so plenty of bandwidth is available. You could add an entry to /etc/crontab, but you'd have to remember to remove it in order to avoid downloading the same file again in the future. at serves this niche purpose perfectly. Better yet, the syntax for using at couldn't be simpler:

    [kermit@swinetrek ~]$ at 3:00
    at> wget http://sitepoint.com/verylargefile.zip
    at> <EOT>
    job 1 at 2005-12-31 03:00
    [kermit@swinetrek ~]$

Note: <EOT> stands for end of transmission, and is triggered by hitting Ctrl-D. Use this to indicate that you have finished entering commands to be executed at the given time.

at schedules the task for the next instance of the time you specify. In the above example, at would execute the given command at 3:00 a.m.
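
The "next instance" rule can be illustrated with a little arithmetic. The sketch below is not how at is implemented: next_run_day is a hypothetical function that takes the current time and the requested time as plain hour and minute numbers (without leading zeros) and reports whether the job would fall today or tomorrow. Treating a requested time equal to the current time as already passed is an assumption made for this example.

```shell
#!/bin/sh
# Illustration only: next_run_day is a hypothetical function, not part of at.

# next_run_day NOW_HOUR NOW_MINUTE AT_HOUR AT_MINUTE
# Hours and minutes are plain numbers in 24-hour time, without leading zeros.
next_run_day() {
  now=$(( $1 * 60 + $2 ))      # minutes since midnight, current time
  target=$(( $3 * 60 + $4 ))   # minutes since midnight, requested time
  if [ "$target" -gt "$now" ]; then
    echo today
  else
    # Assumption: a time that is not in the future rolls over to tomorrow.
    echo tomorrow
  fi
}

next_run_day 14 30 3 0   # at 3:00 requested at 14:30
next_run_day 1 15 3 0    # at 3:00 requested at 1:15
```

Requested at 14:30, a job for 3:00 lands on the following day; requested at 1:15, it runs later the same night.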

Note: Remember that at, like cron and Anacron, uses 24-hour time.

In summary, if you're looking to schedule tasks, Linux has you well covered. You have cron and Anacron for repeating tasks, and at for those one-off, occasional jobs.
