1.3. Basic Linux Concepts and Commands

There are some basic Linux commands and concepts that you should know in order to be able to move around comfortably in a Linux filesystem. Check your knowledge of these commands, and if need be, brush up on them. At the end of the chapter, we list some good resources for learning more about these and other commands. Remember, these are commands that you type, not icons for clicking, though the windowing systems will let you set up icons to represent those commands, once you know what syntax to use.

So let's get started. Once you've logged in to your Linux system, regardless of which windowing system you are using (KDE, GNOME, Window Maker, and so on), start up an xterm window by running xterm (or even konsole), and you'll be ready to type these commands. ^[1]

^[1] If you're not using a windowing system, these commands are typed at the shell prompt that you get after you log in. But if you're not using a windowing system, either you're not a beginner (and don't need this introduction) or you can't get your windowing system to work, in which case you may need more help than we can give you here.

1.3.1. Redirecting I/O

The second great accomplishment of UNIX, ^[2] carried on into its Linux descendants, was the concept of redirecting input and output (I/O). It was based on the concept of a standardized way in which I/O would be done, called standard I/O.

^[2] Yes, we are aware that much of UNIX actually comes from the Multics project, but we credit UNIX with popularizing it.

1.3.1.1 Standard I/O

A familiar concept to Linux developers is the notion of standard I/O. Virtually every Linux process begins its life with three open file descriptors: standard in, standard out, and standard error. Standard in is the source of input for the process; standard out is the destination of the process's output; and standard error is the destination for error messages. For "old fashioned" command-line applications, these correspond to keyboard input for standard in and the output window or screen for both standard out and standard error.

A feature of Linux that makes it so adaptable is its ability to redirect its I/O. Programs can be written generically to read from standard in and write to standard out, but then when the user runs the program, he or she can change (or redirect) the source (in) or destination (out) of the I/O. This allows a program to be used in different ways without changing its code.

Redirecting I/O is accomplished on the Linux shell command line by the "<" and ">" characters. Consider the ls program, which lists the contents of a directory. Here is a sample run of ls:

 $ ls
 afile    more.data    zz.top
 $

We can redirect its output to another location, a file, with the ">" character:

 $ ls > my.files
 $

The output from the ls command no longer appears on the screen (the default location of standard out); it has been redirected to the file my.files.

What makes this so powerful a construct (albeit for a very simple example) is the fact that not only was no change to the program required, but the programmer who wrote the ls program also did nothing special for I/O. He simply built the program to write to standard out. The shell did the work of redirecting the output. This means that any program invoked by the shell can have its output similarly redirected.
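If you'd like to try this yourself, here is a small bash sketch. It assumes only a standard Linux shell with mktemp, and it works in a throwaway directory so none of your own files are touched; the filenames are the ones from the example above.

```shell
# Work in a scratch directory so the experiment leaves your own files alone.
cd "$(mktemp -d)" || exit 1
touch afile more.data zz.top

ls > my.files    # nothing appears on the screen...
cat my.files     # ...because the listing went into my.files
```

You may notice my.files in its own listing: the shell creates (or empties) the file before it starts ls, so ls finds it there.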

Standard error is another location for output, but it was meant as the destination for error messages. For example, if you try to list the contents of a nonexistent directory, you get an error message:

 $ ls bogus
 ls: bogus: No such file or directory
 $

If you redirect standard out, nothing changes:

 $ ls bogus > save.out
 ls: bogus: No such file or directory
 $

That's because the programmer wrote the program to send the message to standard error, not standard out. In the shell (bash) we can redirect standard error by preceding the redirect symbol with the number 2, as follows: ^[3]

^[3] The use of the number 2 comes from an implementation detail: All the I/O descriptors for a UNIX process were kept in an array. The first three elements of the array, numbered 0, 1, and 2, were defined to be the standard in, out, and err, in that order. Thus in the shell you can also redirect standard out by using "1>" as well as the shorter ">".

 $ ls bogus 2> save.out
 $

Note there is no output visible from ls. The error message, ls: bogus: No such file or directory, has been written to the file save.out.
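The following bash sketch repeats these experiments in a scratch directory. One caution: the exact wording of the error message depends on your version of ls (recent GNU versions print something like ls: cannot access 'bogus': No such file or directory), so treat the messages described in the comments as approximate.

```shell
cd "$(mktemp -d)" || exit 1

ls bogus > save.out              # message still shows on the screen; save.out stays empty
ls bogus 2> err.out              # the message goes into err.out instead
ls bogus 1> out.txt 2> err2.out  # "1>" is just the long spelling of ">"

wc -c < save.out                 # 0: nothing was written to standard out
cat err.out                      # the error message, captured from standard error
```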

In a similar way standard input (stdin) can be redirected from its default source, the keyboard.

As an example, we'll run the sort program. Unless you tell it otherwise, sort will read from stdin, that is, the keyboard. We type a short list of phrases and then type a ^D (a Control-D), which won't really echo to the screen as we have shown, but which tells Linux that it has reached the end of the input. The lines of text are then printed back out, now sorted alphabetically. (This is just the tip of the iceberg of what sort can do.)

 $ sort
 once upon a time
 a small creature
 came to live in
 the forest.
 ^D
 a small creature
 came to live in
 once upon a time
 the forest.

Now let's assume that we already have our text inside a file called story.txt. We can use that file as input to the sort program by redirecting the input with the "<" character. The sort doesn't know the difference. Our output is the same:

 $ sort < story.txt
 a small creature
 came to live in
 once upon a time
 the forest.
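To reproduce this without typing at the keyboard, the bash sketch below first writes the four lines into story.txt (printf prints each argument on its own line) and then lets the shell connect that file to the standard input of sort.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'once upon a time' 'a small creature' 'came to live in' 'the forest.' > story.txt

sort < story.txt    # sort reads standard in, which the shell has connected to story.txt
```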

1.3.1.2 Pipes

The output from one command can also be sent directly to the input of another command. Such a connection is called a pipe. Linux command-line users also use "pipe" as a verb, describing a sequence of commands as piping the output of one command into another. Some examples:

 $ ls | wc > wc.fields
 $ java MyCommand < data.file | grep -i total > out.put

The first example runs ls, then pipes its output to the input of the wc program. The output of the wc command is redirected to the file wc.fields. The second example runs java, giving it a class named MyCommand. Any input that this command would normally read from keyboard input will be read this time from the file data.file. The output from this will be piped into grep, and the output from grep will be put into out.put.

Don't worry about what these commands really do. The point of the example is to show how they connect. This has wonderful implications for developers. You can write your program to read from the keyboard and write to a window, but then, without any change to the program, it can be instructed to read from files and write to files, or be interconnected with other programs.

This leads to a modularization of functions into small, reusable units. Each command can do a simple task, but it can be interconnected with other commands to do more, with each pipeline tailored by the user to do just what is needed. Take wc for example. Its job is to count words, lines, and characters in a file. Other commands don't have to provide an option to do this; any time you want to count the lines in your output, just pipe it into wc.
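Here is a bash sketch of a few small pipelines in the spirit of the first example above. The files are hypothetical and live in a src subdirectory of a scratch directory, so that wc.fields (which the shell creates) doesn't end up being counted itself.

```shell
cd "$(mktemp -d)" || exit 1
mkdir src
touch src/one.java src/two.java src/three.txt

ls src | wc -l              # 3: when piped, ls writes one name per line
ls src | wc > wc.fields     # lines, words, and characters go into wc.fields
cat wc.fields
ls src | grep -c '\.java$'  # 2: count only the names ending in .java
```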

1.3.2. The ls Command

The ls command, which shows the names of files in a directory, is about as basic as commands get. Be sure that you know how to use these options:

ls lists the files in a directory.
ls -l is the long form, showing permissions, ownership, and size.
ls -ld doesn't look inside the directory, so you can see the directory's permissions.
ls -lrt shows the most recently modified files last, so you can see what you've just changed.
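To see the -lrt ordering, the bash sketch below back-dates two files. It assumes GNU touch, whose -d option accepts dates such as '2 days ago'; the filenames are made up.

```shell
cd "$(mktemp -d)" || exit 1
touch -d '2 days ago' old.java      # GNU touch: back-date the modification time
touch -d '1 hour ago' middle.java
touch new.java                      # modified just now

ls -lrt      # long listing, oldest first, so new.java is on the last line
ls -ld .     # the directory itself, not its contents
```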

1.3.3. Filenames

Filenames in Linux can be quite long and composed of virtually any character. Practically speaking, however, you're much better off if you limit the length to something reasonable, and keep to the alphanumeric characters, period, and the underscore ("_"). That's because almost all the other punctuation characters have a special meaning to the shell, so if you want to type them, you need to escape their special meaning, or suffer the results of unintended actions.

Filenames are case sensitive: upper- and lowercase names are different. The files ReadMe.txt and readme.txt could both be in the same directory; they are distinct files.

Avoid using spaces in filenames, as the shell uses whitespace to separate the arguments on a command line. You can put a blank in a name, but then you always have to put the name in quotes to refer to it in the shell.
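This bash sketch shows the problem with a hypothetical file named my notes.txt. The set -- lines are just a convenient way to make the shell report how many arguments it split a command line into.

```shell
cd "$(mktemp -d)" || exit 1
touch 'my notes.txt'        # the quotes make the space part of one filename

ls my notes.txt             # fails: ls receives two arguments, "my" and "notes.txt"
ls 'my notes.txt'           # works: one argument

set -- my notes.txt;   echo "argument count: $#"   # argument count: 2
set -- 'my notes.txt'; echo "argument count: $#"   # argument count: 1
```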

To give a filename more visual clues, use a period or an underscore. You can combine several in one filename, too. The filenames read_me_before_you_begin or test.data.for_my_program may be annoyingly long to type, but they are legal filenames.

Note

The period, or "dot," in Linux filenames has no special meaning. If you come from the MS-DOS world, you may think of the period as separating the filename from the extension, as in myprogrm.bas where the filename is limited to eight characters and the extension to three characters. Not so in Linux. There is no "extension," it's all just part of the filename.

You will still see names like delim.c or Account.java, but the .c or .java are simply the last two characters or the last five characters, respectively, of the filenames. That said, certain programs will insist on those endings for their files. The Java compiler will insist that its source files end in .java and will produce files that end in .class, but there is no special part of the filename to hold this. This will prove to be very handy, both when you name your files and when you use patterns to search for files (see below).

1.3.4. Permissions

Permissions in Linux are divided into three categories: the owner of a file (usually the user who created it), the group (a collection of users), and others, meaning everyone who is not the owner and not in the group. Any file belongs to a single owner and, simultaneously, to a single group. It has separate read/write/execute permissions for its owner, its group, and all others. If you are the owner of a file, but also a member of the group that owns the file, then the owner permissions are what count. If you're not the owner, but a member of the group, then the group permissions will control your access to the file. All others get the "other" permissions.

If you think of the three permissions, read/write/execute, as three bits of a binary number, then a permission can be expressed as an octal digit, where the most significant bit (4) represents read permission, the middle bit (2) is write permission, and the least significant bit (1) is execute permission. If you think of the three categories, user/group/others, as three digits, then you can express the permissions of a file as three octal digits, for example "750". The earliest versions of the chmod command required you to set file permissions this way, by specifying the octal number. Now, although there is a fancier syntax (for example, g+rwx), you can still use the octal numbers in the chmod command. See Table 1.1 for an example.
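Here is the arithmetic for "750" worked through, followed by a check with chmod. The check assumes GNU coreutils stat, whose -c option prints the mode in octal (%a) and in ls-style letters (%A).

```shell
cd "$(mktemp -d)" || exit 1
touch file

# read = 4, write = 2, execute = 1; add up the bits for each category
echo "owner $((4 + 2 + 1)), group $((4 + 1)), others 0"   # owner 7, group 5, others 0

chmod 750 file
stat -c '%a %A' file     # 750 -rwxr-x---
```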

The fancier, or more user-friendly, syntax uses letters to represent the various categories and permissions. The three categories of user, group, and other are represented by their first letters: u, g, and o. The permissions are similarly represented by r, w, and x. (OK, we know "x" is not the first letter, but it is a reasonable choice.) The letter a stands for "all" categories, the same as ugo; it is not a permission letter. Then, to add permissions, use the plus sign (+); to remove permissions, use the minus sign (-). So g+rwx means "add all permissions to the group category," and a+r means "add read permission to all categories."
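The bash sketch below applies the two symbolic modes from the paragraph above to a scratch file that starts out as 600, and again uses GNU stat to show the result after each step.

```shell
cd "$(mktemp -d)" || exit 1
touch file
chmod 600 file         # start from rw-------
chmod g+rwx file       # add read, write, and execute for the group: rw-rwx---
chmod a+r file         # add read for user, group, and others:     rw-rwxr--
stat -c '%a %A' file   # 674 -rw-rwxr--
```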

Be sure that you know these commands for manipulating permissions:

chmod changes the mode of a file, where mode refers to the read/write/execute permissions.
chown changes the owner of a file. ^[4]
chgrp changes the group owner of a file.

^[4] On Linux the use of this command is restricted to the superuser, or "root."

Table 1.1 shows some common uses of these commands.

Table 1.1. Changing permissions
Command	Explanation
`chmod a+r file`	Gives everyone read permission.
`chmod go-w file`	Takes away write permission from group and others.
`chmod u+x file`	Sets up a shell script so you can execute it like a command.
`chmod 600 file`	Sets permission to read and write for the owner but no permissions for anyone else.
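As a concrete example of the u+x row, this bash sketch writes a two-line script (hello.sh is a hypothetical name), shows that it can't be run until it has execute permission, and then runs it.

```shell
cd "$(mktemp -d)" || exit 1
printf '#!/bin/sh\necho hello from a script\n' > hello.sh

./hello.sh           # fails with "Permission denied": no execute bit yet
chmod u+x hello.sh   # give the owner execute permission
./hello.sh           # now it runs like any other command
```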

1.3.5. File Copying

Do you know these commands?

The mv command (short for "move") lets you move a file from one place in the hierarchy of files to another, that is, from one directory to another. When you move the file, you can give it a new name. If you move it without putting it in a different directory, well, that's just renaming the file.

mv Classy.java Nouveau.java
mv Classy.java /tmp/outamy.way
mv Classx.java Classz.java ..
mv /usr/oldproject/*.java .

The first example moves Classy.java to a new name, Nouveau.java, while leaving the file in the same directory.

The second example moves the file named Classy.java from the current directory over to the /tmp directory and renames it outamy.way, unless the file outamy.way is an already existing directory. In that case, the file Classy.java will end up (still named Classy.java) inside the directory outamy.way.

The next example just moves the two Java source files up one level, to the parent directory. The ".." is a feature of every Linux directory. Whenever you create a directory, it gets created with two links already built in: ".." points to its parent (the directory that contains it), and "." points to the directory itself.

A common question at this point is, "Why does a directory need a reference to itself?" Whatever other reasons there may be, it certainly is a handy shorthand to refer to the current directory. If you need to move a whole lot of files from one directory to another, you can use the "." as your destination. That's the fourth example.
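The bash sketch below walks through the same four kinds of mv in a scratch directory. It differs from the examples above in two respects: outamy.way is created as a directory inside the scratch area rather than under /tmp, and the last step gathers the files into a new subdirectory, gathered, so that "." has somewhere distinct to point.

```shell
cd "$(mktemp -d)" || exit 1
top=$(pwd)
mkdir -p project/sub outamy.way
touch Classy.java project/sub/Classx.java project/sub/Classz.java

mv Classy.java Nouveau.java      # 1: rename, same directory
mv Nouveau.java outamy.way       # 2: outamy.way is a directory, so the file goes inside it
cd project/sub
mv Classx.java Classz.java ..    # 3: up one level, into project
cd ..
mkdir gathered && cd gathered
mv ../*.java .                   # 4: "." (this directory) is the destination
ls                               # Classx.java  Classz.java
```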

The cp command is much like the mv command, but the original file is left right where it is. In other words, it copies files instead of moving them. So:

 cp Classy.java Nouveau.java

will make a copy of Classy.java named Nouveau.java, and:

 cp Classy.java /tmp

will make a copy of Classy.java in the /tmp directory, and:

 cp *.java /tmp

will put copies of all the Java sources in the current directory into the /tmp directory.
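A bash sketch of the difference from mv: after cp, the original is still in place, and editing the copy leaves the original untouched. A subdirectory named dest stands in for /tmp.

```shell
cd "$(mktemp -d)" || exit 1
mkdir dest
echo 'class Classy {}' > Classy.java
touch Other.java

cp Classy.java Nouveau.java     # an independent copy; Classy.java stays where it was
cp *.java dest                  # copies of every .java file here go into dest
echo '// edited' >> Nouveau.java
cat Classy.java                 # still just: class Classy {}
ls dest                         # Classy.java  Nouveau.java  Other.java
```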

If you run this command,

 ln Classy.java /tmp

you might think that ln copies files, too. You will see Classy.java in your present working directory and you will see what appears to be a copy of the file in the /tmp directory. But if you edit your local copy of Classy.java and then look at the "copy" that you made in the /tmp directory, you will see the changes that you made to your local file now also appear in the file in the /tmp directory.

That's because ln doesn't make a copy. It makes a link. A link is just another name for the same contents. We will discuss linking in detail later in the book (see Section 6.2.1).
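The bash sketch below shows the shared contents directly. It makes the link in a subdirectory of a scratch directory rather than in /tmp, because this kind of link cannot cross filesystems, and on many systems /tmp is a separate filesystem. It assumes GNU stat for the inode number (%i) and link count (%h).

```shell
cd "$(mktemp -d)" || exit 1
mkdir other
echo 'class Classy {}' > Classy.java

ln Classy.java other             # other/Classy.java: a second name for the same data
echo '// added later' >> Classy.java
cat other/Classy.java            # includes "// added later": nothing was copied
stat -c '%i %h %n' Classy.java other/Classy.java   # same inode number, link count 2
```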

1.3.6. Seeing Stars

We need to describe shell pattern matching for those new to it. It's one of the more powerful things that the shell (the command processor) does for the user, and it makes all the other commands seem that much more powerful.

When you type a command like we did previously:

 mv /usr/oldproject/*.java .

the asterisk character (called a "star" for short) is shorthand that matches any string of characters (including none), which in combination with the .java will then match any file in the /usr/oldproject directory whose name ends with .java.

There are two significant things to remember about this feature. First, the star and the other shell pattern matching characters (described below) do not mean the same as the regular expressions in vi or other programs or languages. Shell pattern matching is similar in concept, but quite different in specifics.

Second, the pattern matching is done by the shell, the command interpreter, before the arguments are handed off to the specific command. Any text with these special characters is replaced, by the shell, with one or more filenames that match the pattern. This means that all the other Linux commands (mv, cp, ls, and so on) never see the special characters; they don't do the pattern matching, the shell does. The shell just hands them a list of filenames.

The significance here is that this functionality is available to any and every command, including shell scripts and Java programs that you write, with no extra effort on your part. It also means that the syntax for specifying multiple files doesn't change between commands, since the commands don't implement that syntax; it's all taken care of in the shell before they ever see it. Any command that can handle multiple filenames on a command line can benefit from this shell feature.
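You can watch the shell do this with a tiny stand-in for a command: the hypothetical shell function show_args below does nothing but report the arguments it was handed.

```shell
cd "$(mktemp -d)" || exit 1
touch Alpha.java Beta.java notes.txt

# A stand-in "command": it only reports the arguments it receives.
show_args() {
    echo "got $# argument(s):" "$@"
}

show_args *.java     # got 2 argument(s): Alpha.java Beta.java
show_args '*.java'   # got 1 argument(s): *.java  (quoted, so the shell leaves it alone)
```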

If you're familiar with MS-DOS commands, consider the way pattern matching works (or doesn't work) there. The limited pattern matching you have available for a dir command in MS-DOS doesn't work with other commands, unless the programmer who wrote that command also implemented the same pattern matching feature.

What are the other special characters for pattern matching with filenames? Two other constructs worth knowing are the question mark and the square brackets. The "?" will match any single character.

The [...] construct is a bit more complicated. In its simplest form, it matches any one of the characters inside; for example, [abc] matches a single a, b, or c. So Version[123].java would match a file called Version2.java but not those called Version12.java or VersionC.java. The pattern Version*.java would match all of those. The pattern Version?.java would match all except Version12.java, since it has two characters where the ? matches only one.

The brackets can also match a range of characters, as in [a-z] or [0-9]. If the first character inside the brackets is a "^" or a "!", then (think "not") the meaning is reversed, and it will match anything but those characters. So Version[^0-9].java will match VersionC.java but not Version1.java. How would you match a "-", without it being taken to mean a range? Put it first inside the brackets. How would you match a "^" or "!" without it being understood as the "not"? Don't put it first.
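Here are the patterns from the two paragraphs above, tried out with echo in a bash scratch directory. Because echo just prints its arguments, it shows exactly what the shell expanded each pattern into.

```shell
cd "$(mktemp -d)" || exit 1
touch Version2.java Version12.java VersionC.java

echo Version[123].java    # Version2.java
echo Version?.java        # Version2.java VersionC.java
echo Version*.java        # all three files
echo Version[^0-9].java   # VersionC.java
```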

Some sequences are so common that a shorthand syntax is included. Some other sequences are not sequential characters and are not easily expressed as a range, so a shorthand is included for those, too. The syntax for these special sequences is [:name:], where name is one of: alnum, alpha, ascii, blank, cntrl, digit, graph, lower, print, punct, space, upper, xdigit. These sequences are used inside the square brackets described above, so the brackets are doubled: the pattern [[:alpha:]] matches any alphabetic character, and [[:punct:]] matches any punctuation character. We think you get the idea.
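And the character classes in use, again with echo so you can see the expansions; note the doubled brackets. The filenames are made up.

```shell
cd "$(mktemp -d)" || exit 1
touch Version2.java VersionC.java Versionc.java

echo Version[[:digit:]].java   # Version2.java
echo Version[[:upper:]].java   # VersionC.java
echo Version[[:alpha:]].java   # VersionC.java and Versionc.java (order depends on locale)
```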

1.3.6.1 Escape at Last

Of course there are always times when you want the special character to be just that character, without its special meaning to the shell. In that case you need to escape the special meaning, either by preceding it with a backslash or by enclosing the expression in single quotes. The commands rm Account\$1.class or rm 'Account$1.class' would remove the file even though it has a dollar sign in its name (which would normally be interpreted by the shell as a variable). Any character sequence in single quotes is left alone by the shell; no special substitutions are done. Double quotes still do some substitutions inside them, such as shell variable substitution, so if you want literal values, use the single quotes.
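A short bash sketch of the three kinds of quoting. The set -- line empties the positional parameters, so $1 (the shell variable that the dollar sign would otherwise refer to) is empty; that makes the effect of double quotes easy to see.

```shell
cd "$(mktemp -d)" || exit 1
set --                      # no positional parameters, so $1 is empty

touch 'Account$1.class'     # single quotes: the $ is an ordinary character
echo 'Account$1.class'      # Account$1.class
echo Account\$1.class       # Account$1.class (the backslash escapes just the $)
echo "Account$1.class"      # Account.class (double quotes still substitute $1)
rm 'Account$1.class'        # removes the file, as in the text
```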

Tip

As a general rule, if you are typing a filename which contains something other than alphanumeric characters, underscores, or periods, you probably want to enclose it in single quotes, to avoid any special shell meaning.

1.3.7. File Contents

Let's look at a directory of files. How do you know what's there? We can start with an ls to list the names:

 $ ls
 ReadMe.txt   Shift.java  dispColrs  moresrc
 Shift.class  anIcon.gif  jam.jar    moresrc.zip
 $

That lists them alphabetically, top to bottom, then left to right, arranged so as to make the most use of the space while keeping the list in columns. (There are options for other orderings, single column, and so on.)

An ls without options only tells us the names, and we can make some guesses based on those names (for example, which file is Java source, and which is a compiled class file). The long listing ls -l will tell us more: permissions, links, owner, group, size (in bytes), and the date of last modification.

 $ ls -l
 total 2414
 -rw-r--r--   1 albing  users        132 Jan 22 07:53 ReadMe.txt
 -rw-r--r--   1 albing  users        637 Jan 22 07:52 Shift.class
 -rw-r--r--   1 albing  users        336 Jan 22 07:55 Shift.java
 -rw-r--r--   1 albing  users       1374 Jan 22 07:58 anIcon.gif
 -rw-r--r--   1 albing  users       8564 Jan 22 07:59 dispColrs
 -rw-r--r--   1 albing  users       1943 Jan 22 08:02 jam.jar
 drwxr-xr-x   2 albing  users         48 Jan 22 07:52 moresrc
 -rw-r--r--   1 albing  users    2435522 Jan 22 07:56 moresrc.zip
 $

While ls is only looking at the "outside" of files, ^[5] there is a command that looks at the "inside," the data itself, and based on that, tries to tell you what kind of file it found. The command is called file, and it takes as arguments a list of files, so you can give it the name of a single file or you can give it a whole long list of files.

^[5] Technically, ls (without arguments) need only read the directory, whereas ls -l looks at the contents of the inode in order to get all the other information (permissions, size, and so on), but it doesn't look at the data blocks of the file.

Note

Remember what was said about pattern matching in the shell: we can let the shell construct that list of files for us. We can give file the list of all the files in our current directory by using the "*" on the command line so that the shell does the work of expanding it to the names of all the files in our directory (since any filename will match the star pattern).

 $ file *
 ReadMe.txt:  ASCII text
 Shift.class: compiled Java class data, version 45.3
 Shift.java:  ASCII Java program text
 anIcon.gif:  GIF image data, version 89a, 26 x 26,
 dispColrs:   PNG image data, 565 x 465, 8-bit/color RGB, non-interlaced
 jam.jar:     Zip archive data, at least v2.0 to extract
 moresrc:     directory
 moresrc.zip: Zip archive data, at least v1.0 to extract
 $

The file command looks at the first several hundred bytes of the file and does a statistical analysis of the types of characters that it finds there, along with checking for the special signatures ("magic numbers") that identify the formats of certain files.

Three things to note with this output from file. First, notice that dispColrs was (correctly) identified as a PNG file, even without the .png suffix that it would normally have. That was done deliberately to show you that the type of file is based not just on the name but on the actual contents of the file.

Second, notice that the .jar file is identified as a Zip archive. The two really do use the same internal format.

Third, file is not foolproof. It's possible to have perfectly valid, compilable Java files that file thinks are C++ source, or even just English text. Still, it's a great first guess when you need to figure out what's in a directory.
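If you want to see the content-based detection for yourself, this bash sketch makes two files whose names say nothing true about their contents. It assumes gzip and file are installed; the exact descriptions that file prints vary between versions.

```shell
cd "$(mktemp -d)" || exit 1
printf 'hello\n' | gzip > mystery          # compressed data with no .gz suffix
printf 'just some text\n' > picture.gif    # plain text with a misleading suffix

file mystery picture.gif
# mystery should be reported as gzip compressed data,
# and picture.gif as ASCII text, whatever their names suggest
```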

Now let's look at a file. The simplest way to display its contents is to use cat.

 $ cat Shift.java
 import java.io.*;
 import java.net.*;

 /**
  * The Shift object
  */
 public class Shift {
   private int val;

   public Shift() { }

   // ... and so on
 } // class Shift

When a file is longer than a few lines you may want to use more or less to look at the file. ^[6] These programs provide a screen's worth of data, then pause for your input. You can press the space bar to get the next screen's worth of output. You can type a slash, then a string, and it will search forward for that string. If you have gone farther forward in the file than you wanted, press "b" to go backwards.

^[6] Like any open marketplace, the marketplace of ideas and open source software has its "me-too" products. Someone thought they could do even better than more, so they wrote a new, improved, and largely upward-compatible command. They named it less, on the minimalist philosophy (with apologies to Dave Barry: "I am not making this up") that "less is more." Nowadays, more is rather passé. The less command has more features and has largely replaced it. In fact, on many Linux distributions, more is a link to less. In the name of full disclosure, there is also a paging program called pg, the precursor to more, but we'll say no more about that.

To find out more about the many, many commands available, press h while the pager is running. (more also accepts ? for help; in less, ? starts a backward search instead.)

Typical uses for these commands are:

To view one or more files, for example more *.java, where you can type :n to skip to the next file.
To page through long output from a previous pipe of commands, for example, $ grep Account *.java | more, which will search (see more on grep below) for the string Account in all of the files whose names end in .java and print out each line that is found, and that output will be paginated by more.

If you need only to check the top few lines of a file, use head. You can choose how many lines from the front of the file to see with a simple parameter. The command head -7 will write out the first seven lines, then exit.

If your interest is the last few lines of a file, use tail. You can choose how many lines from the end of the file to see; the command tail -7 will write out the last seven lines of the file. But tail has another interesting parameter, -f. Normally tail prints its lines and quits when it reaches the end of the file; the -f option tells tail instead to wait after it prints the last few lines and then try again. ^[7] If some other program is writing to this file, then tail will, on its next read, find more data and print it out. It's a great way to watch a log file, for example, tail -f /tmp/server.log.

^[7] The less command has the same feature. If you press "F" while looking at a file, it goes into a mode identical to that of the tail -f command. As is often the case in the wacky world of Linux, there is more than one way to do it.

In this mode, tail won't end when it reaches the end of file, so when you want it to stop you'll have to manually interrupt it with a ^C (Control-C, i.e., hold down the Control key and press the C key).
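This bash sketch uses seq to make a 20-line file to try head and tail on, then imitates a server appending to a log while tail -f watches. The timeout command (from GNU coreutils) stands in for pressing ^C after three seconds; server.log is a hypothetical name.

```shell
cd "$(mktemp -d)" || exit 1
seq 1 20 > numbers.txt   # lines "1" through "20"

head -n 7 numbers.txt    # 1 to 7 (head -7 is an older way to write the same thing)
tail -n 7 numbers.txt    # 14 to 20

: > server.log                                        # start with an empty log
( sleep 1; echo 'request handled' >> server.log ) &  # another program writes later
timeout 3 tail -f server.log                         # prints "request handled" when it arrives
```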

1.3.8. The grep Command

No discussion of Linux commands would be complete without mentioning grep. Its name comes from the command g/re/p in the old ed editor ("globally search for a regular expression and print"), and it is a tool for searching through the contents of files. It searches not just for fixed sequences of characters, but can also handle regular expressions.

In its simplest form, grep myClass *.java will search for and display all lines from the specified files that contain the string myClass. (Recall that the *.java expansion is done by the shell, listing all the files that end with .java.)

The first nonoption parameter to grep, myClass in the example above, is the string that you want to search for. It is treated as a regular expression, meaning that it can contain special characters for pattern matching to make for more powerful searches (see Section 2.2.3). Some of the most common option parameters for grep are listed in Table 1.2.

Table 1.2. Options for grep
Option	Explanation
`-i`	Ignore upper/lower case differences in its matching.
`-l`	Only list the filename, not the actual line that matched.
`-n`	Show the line number where the match was found.
`-v`	Reverses the meaning of the search: shows every line that does not match the pattern.

Here's a quick example:

 grep println *.java | grep -v System.out

It will look for every occurrence of println but then exclude those that contain System.out. Be aware that while it will exclude lines like

 System.out.println(msg);

it will also exclude lines like this:

 file.println(msg);  // I'm not using System.out

It is, after all, just doing string searches.
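The bash sketch below reproduces this pitfall with two hypothetical source files, and also tries the options from Table 1.2. When grep searches more than one file, it puts the filename in front of each matching line.

```shell
cd "$(mktemp -d)" || exit 1
cat > Report.java <<'EOF'
public class Report {
    void show(java.io.PrintWriter file, String msg, int total) {
        System.out.println(msg);
        file.println(msg);  // I'm not using System.out
        file.println(total);
    }
}
EOF
echo 'class Helper { }' > Helper.java

grep -n println *.java                     # each match with its line number
grep -l println *.java                     # just the file that matched: Report.java
grep -i 'system' *.java                    # matches System despite the case difference
grep println *.java | grep -v System.out   # only the file.println(total) line survives
```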

1.3.9. The find Command

If someone compiled a list of the top 10 most useful Linux utilities, find would most likely be near the top of the list. But it would also make the top 10 most confusing. Its syntax is very unlike other Linux utilities. It consists of predicates, logical expressions that cause actions and have true/false values that determine if the rest of the expression is executed. Confused? If you haven't used find before you probably are. We'll try to shed a little light by showing a few examples.

 find . -name '*frag*' -print

This command looks for a file whose name contains frag. It starts looking in the current directory and descends into all subdirectories in its search.

 find /over/there . /tmp/here -name '*frag*.java' -print

This command looks for a file that has frag in its name and ends with .java. It searches for this file starting in three different directories: the current directory ("."), /over/there, and /tmp/here.

 find . -name 'My[A-Z]*.java' -exec ls -l '{}' \;

Starting in the current directory, this command searches for a file whose name begins with My followed by an uppercase alphabetic character followed by anything else, ending with .java. When it finds such a file, it will execute a command, in this case the ls command with the -l option. The braces are replaced with the name of the file that is found; the "\;" marks the end of the command for find. (The backslash keeps the shell from treating the semicolon as the end of the whole command line.)

The -name is called a predicate; it takes a shell-style wildcard pattern (like those described in Section 1.3.6, not a regular expression) as an argument. The single quotes around the pattern keep the shell from expanding it, so find receives it intact. Any file whose name matches the pattern makes the predicate true, so control passes on to the next predicate, which in the first example is simply -print. It prints the filename (to standard out) and is always true (but since no other predicate follows it in this example, it doesn't matter). Since only the names that match the pattern cause the -name predicate to be true, only those names will get printed.
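A bash sketch of the first and third examples above, run against a small made-up tree in a scratch directory. The filenames are invented so that exactly one of them fits My[A-Z]*.java.

```shell
cd "$(mktemp -d)" || exit 1
mkdir -p src/deep
touch fragment.txt src/My_frag.java src/deep/MyList.java src/deep/My1.java

find . -name '*frag*' -print                       # ./fragment.txt and ./src/My_frag.java
find . -name 'My[A-Z]*.java' -exec ls -l '{}' \;   # long listing of ./src/deep/MyList.java only
```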

There are other predicates besides -name. You can get an entire list by typing man find at a command prompt, but Table 1.3 lists a few gems, to give you a taste of what find can do.

Table 1.3. Some find predicates
Option	Explanation
`-type d`	Is `true` if the file is a directory.
`-type f`	Is `true` if the file is a plain file (e.g., not a directory).
`-mtime -5`	Is `true` if the file is less than five days old, that is, has been modified within the last five days. A `+5` would mean older than five days and a `5` with no sign means exactly five days.
`-atime -5`	Is `true` if the file was accessed within the last five days. The `+` and `-` mean greater and less than the specified time, as in the previous example.
`-newer myEx.class`	Is `true` if the file is newer than the file `myEx.class`.
`-size +24k`	Is `true` if the file is greater than 24K. The suffix `c` would mean bytes or characters (since `b` stands for 512-byte blocks in this context). The `+` and `-` mean greater and less than the specified size, as in the other examples.
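This bash sketch builds a small scratch tree to exercise several of the predicates in Table 1.3, including the combination used in the example that follows. It assumes GNU touch (for -d) and GNU find; apart from the names taken from the text, the filenames are invented.

```shell
cd "$(mktemp -d)" || exit 1
mkdir -p old/sample
touch -d '100 days ago' MyExample.java old/sample/MyPrev.java  # back-dates mtime and atime
head -c 30000 /dev/zero > big.dat   # about 29K of zero bytes
touch myEx.class
sleep 1                             # so the next file is clearly newer than myEx.class
touch Recent.java

find . -type d                                       # ., ./old, ./old/sample
find . -type f -newer myEx.class                     # ./Recent.java
find . -type f -size +24k                            # ./big.dat
find . -name '*.java' -mtime +90 -atime +30 -print   # the two back-dated .java files
```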

Let's look at an example to see how they fit together:

 $ find . -name '*.java' -mtime +90 -atime +30 -print
 ./MyExample.java
 ./old/sample/MyPrev.java
 $

This command printed out the names of two files that end with .java found beneath the current directory. Neither file had been modified in the last 90 days or accessed in the last 30 days. The next thing you might want to do is to run this command again, adding something at the end to remove these old files.

 $ find . -name '*.java' -mtime +90 -atime +30 -print -exec rm '{}' \;
 ./MyExample.java
 ./old/sample/MyPrev.java
 $

1.3.10. The Shell Revisited

Most Linux shells (the command interpreters) can be considered programming languages in their own right. That is, they have variables and control structures: if statements, for loops, and so on. While the syntax can be subtly different between shells, the basic constructs are all there.

Entire books can be, and have been, written on shell programming. (It's one of our favorite subjects to teach.) Programs written in the shell language are often called shell scripts. Such scripts can be powerful yet easy to write (once you are familiar with the syntax) and can make you very productive in dealing with all those little housekeeping tasks that accompany program development. All you need to do (dangerous words, no?) is to put commands in a text file and give the file execute permissions. But that's a subject for another day.
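Just to give a taste of those control structures, here is a hypothetical housekeeping script in bash: a for loop over the .java files, a variable, and an if test that reports which sources have no .class file yet. The ${SRC%.java} form, which strips a suffix from a variable's value, goes beyond this section; take it on faith for now.

```shell
cd "$(mktemp -d)" || exit 1
touch Big.java Small.java Small.class

cat > uncompiled.sh <<'EOF'
#!/bin/bash
for SRC in *.java; do
    CLASS="${SRC%.java}.class"      # Big.java -> Big.class
    if [ ! -e "$CLASS" ]; then
        echo "needs compiling: $SRC"
    fi
done
EOF
chmod u+x uncompiled.sh   # execute permission turns the text file into a command

./uncompiled.sh           # needs compiling: Big.java
```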

Some elements of shell scripting, however, are useful even if you never create a single shell script. Of these, perhaps the most important to know (especially for Java programmers) is how to deal with shell variables.

Note

We'll be describing the syntax for bash, the default shell on most Linux distributions. The syntax will differ for other shells, but the concepts are largely the same.

Any string of letters, digits, and underscores that doesn't begin with a digit can be used as the name of a variable. By convention, shell variables typically use uppercase names, but that is only a convention (although it will hold true for most if not all of our examples, too). Since commands in Linux are almost always lowercase, the use of uppercase for shell variables helps them stand out.

Set the value of a shell variable with the familiar method, the equal sign (with no spaces on either side of it):

 $ FILE=/tmp/abc.out
 $

This has assigned the variable FILE the value /tmp/abc.out. But to make use of the value that is now in FILE, the shell uses syntax that might not be familiar to you: The name must be preceded with a "$".

Exported shell variables are passed on to the programs and subshells that your shell starts, but a value set in one of those child processes is never passed back up to the parent. To make a shell variable available both to your current shell and to every subsequent subshell, export the variable:

 $ export FILE
 $

You can combine the assignment of a value with the exporting into one step. Since repeating the export doesn't hurt, you will often see shell scripts use the export command every time they do an assignment, as if it were part of the assignment syntax, but you know better.

 $ export FILE="/tmp/way.out"
 $
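You can watch exporting at work by starting a child shell with `bash -c` and asking it to print the variable. A minimal sketch, reusing the `FILE` values from above and assuming bash is on your PATH:

```shell
#!/usr/bin/env bash
# Sketch: exported variables reach child processes; changes never flow back.
unset FILE                                    # start from a known state

FILE=/tmp/abc.out                             # a shell variable, not yet exported
before=$(bash -c 'printf "%s" "$FILE"')       # the child shell sees nothing
echo "before export, child sees: [$before]"

export FILE                                   # now part of the environment
after=$(bash -c 'printf "%s" "$FILE"')        # the child inherits the value
echo "after export, child sees: [$after]"

bash -c 'FILE=/tmp/way.out'                   # the child changes only its own copy
echo "parent still has: $FILE"
```

The first child prints an empty value, the second prints `/tmp/abc.out`, and the parent still holds `/tmp/abc.out` after the third child has set its own copy to something else.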

Note

The shell uses the dollar sign to distinguish a variable name from plain text made of the same letters. Consider the following example:

 $ echo first > FILE
 $ echo second > TEXT
 $ FILE=TEXT
 $ cat FILE
 first
 $

The cat command will dump the contents of the file named FILE to the screen, and you should see first. But how would you tell the shell that you want to see the contents of the file whose name you have put in the shell variable FILE? For that you need the "$":

 $ cat $FILE
 second
 $

This is a contrived example, but the point is that shell syntax supports arbitrary strings of characters on the command line; some of them are filenames, others are just characters that you want to pass to a program. The shell needs a way to distinguish those from shell variables. It doesn't have that problem on assignment because the "=" provides the needed clue. To say it in computer science terms, the "$" syntax provides the R-value of the variable. (Not the insulation R-value, but what you expect when a variable is used on the Right-hand side of an assignment operator, as opposed to the L-value used on the Left-hand side.)
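The "$" syntax also has a form with braces, which appears in this section as ${HOME} and ${i}. The braces mark exactly where the name ends, which matters when the name is followed directly by characters that could themselves be part of a name. A small sketch; BASE is an invented variable:

```shell
#!/usr/bin/env bash
# Sketch: ${NAME} marks exactly where a variable name ends.
set -u                            # treat unset variables as errors
BASE=Account                      # hypothetical variable for the example
plain="$BASE.java"                # "." can't be part of a name, so no braces needed
braced="${BASE}_old.java"         # "_" can, so $BASE_old would name another variable
printf '%s %s\n' "$plain" "$braced"
```

Without the braces, `$BASE_old.java` would ask for a variable named `BASE_old`, which is unset here, so `set -u` would stop the script with an error.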

There are several shell variables that are already exported because they are used by the shell and other programs. You may need or want to set them to customize your environment. Since they are already exported, you won't need to use the export command and can just assign a value, but it doesn't hurt.

The most important shell variable to know is PATH. It defines the directories in the filesystem where the shell will look for programs to execute. When you type a command like ls or javac the shell will look in all of the directories specified in the PATH variable, in the order specified, until it finds the executable.

 $ echo $PATH
 /usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:.
 $

The PATH shown in the example has five directories, separated by colons (":"). (Note the fifth one, the "."; it says to look in the current directory.) Where do you suppose it will find cat? You can look for it yourself by searching in each directory specified in PATH. Or you can use the which command:

 $ which cat
 /bin/cat
 $

Some commands (like exit) don't show up, since they are built into the shell. Others may be aliases, but that opens a whole other topic that we aren't covering here. Just remember that each directory in the PATH variable is examined for the executable you want to run. If you get a command not found error, the command may well exist; it just may not be in any directory on your PATH.
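The search just described can be sketched in a few lines of bash: split PATH on the colons and test each directory, in order, for an executable file with the requested name. This is a simplified illustration, not how which itself is implemented; the function name `find_in_path` is invented, and it ignores aliases, shell functions, and builtins.

```shell
#!/usr/bin/env bash
# Sketch: walk the PATH directories in order, the way the shell searches.
find_in_path() {
    local cmd=$1 dir
    local IFS=:                        # split PATH on colons
    for dir in $PATH; do
        [ -z "$dir" ] && dir=.         # an empty entry means the current directory
        if [ -f "$dir/$cmd" ] && [ -x "$dir/$cmd" ]; then
            echo "$dir/$cmd"           # the first match wins
            return 0
        fi
    done
    return 1                           # the "command not found" case
}

find_in_path cat
```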

To look at it the other way around: If you want to install a command so that you can execute it from the command line, you can either always type its full pathname, or (a more user-friendly choice) you can set your PATH variable to include the location of the new command's executable.

So where and how do you set PATH? Whenever a shell starts up, it reads some initialization files. These are shell scripts that are read and executed as if they were typed by the user, that is, not in a subshell. Among other actions, they often set values for variables like PATH. If you are using bash, look at .bashrc in your home directory.

Shell scripts are just shell commands stored in a file so that you don't need to type the same commands and options over and over. There are two ways to run a shell script. The easiest, often used when testing the script, is

 $ sh myscript

where myscript is the name of the file in which you have put your commands. (See Chapter 2 for more on how to do that.) Once you've got a script running the way you'd like, you might want to make its invocation as seamless as any other command. To do that, change its permissions to include the execution permission and then, if the file is located in a place that your PATH variable knows about, it will run as a command. Here's an example:

 $ chmod a+rx myscript
 $ mv myscript ${HOME}/bin
 $ myscript
 ... (script runs)
 $

The file was put into the bin directory off of the home directory. That's a common place to put homebrew commands. Just be sure that $HOME/bin is in your PATH, or edit .bashrc and add it.
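To see the pieces work together without touching your real home directory, this sketch uses a scratch directory in place of $HOME/bin and an invented command name, `hello-demo`. In your own setup, the equivalent of the `PATH=` line would go in .bashrc, usually written as `export PATH="$HOME/bin:$PATH"` so that your own commands are found first. The `#!/bin/sh` first line, which names the interpreter for the script, is our addition; it is customary for scripts.

```shell
#!/usr/bin/env bash
# Sketch: install a homebrew command and run it through PATH.
set -eu
bindir=$(mktemp -d)                  # stands in for $HOME/bin
trap 'rm -rf "$bindir"' EXIT

# Write a tiny script into the directory
cat > "$bindir/hello-demo" <<'EOF'
#!/bin/sh
echo "hello from hello-demo"
EOF

chmod a+rx "$bindir/hello-demo"      # readable and executable by everyone
PATH="$bindir:$PATH"                 # same idea as the line in .bashrc

hello-demo                           # now runs like any other command
```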

If you want to parameterize your script, use the variables $1, $2, and so on, which hold the first, second, and subsequent arguments on the command line that you used to invoke it. If you type myscript Account.java, then $1 will have the value Account.java for that invocation of the script.
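A minimal sketch of argument handling follows. To keep it self-contained it uses a shell function (named `show_args`, invented for the example) instead of a separate script file; a function sees its arguments in $1, $2, and so on, just as a script does. It also uses two related standard variables not mentioned above: $#, the number of arguments, and "$@", all of the arguments with each one kept intact even if it contains spaces.

```shell
#!/usr/bin/env bash
# Sketch: positional parameters. A shell function sees its arguments in
# $1, $2, ... exactly as a script sees its command-line arguments.
show_args() {                          # hypothetical helper name
    echo "called with $# argument(s)"
    echo "first: $1"
    for arg in "$@"; do                # "$@" keeps each argument as one word
        echo "arg: $arg"
    done
}

show_args Account.java "My Old File.java"
```

The call reports two arguments, even though the second one contains spaces, because it was quoted on the command line.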

We don't have the space to go into all that we'd like to about shell programming, but let us leave you with a simple example that can show you some of its power. Used in shell scripts, for loops can take a lot of drudgery out of file maintenance. Here's a simple but real example.

Imagine that your project has a naming convention that all Java files associated with the user interface on your project will begin with the letters "UI". Now suppose your boss decides to change that convention to "GUI" but you've already created 200 or more files using the old naming convention. Shell script to the rescue:

 for i in UI*.java
 do
   new="G${i}"
   echo "$i ==> $new"
   mv "$i" "$new"
 done

You could just type those commands at the command line; that's the nature of shell syntax. But putting them into a file lets you test out the script without having to type it over and over, and keeps the correct syntax once you've got it debugged. Assuming we put those commands into a file called myscript, here's a sample run:

 $ myscript
 UI_Button.java ==> GUI_Button.java
 UI_Plovar.java ==> GUI_Plovar.java
 UI_Screen.java ==> GUI_Screen.java
 UI_Tofal.java ==> GUI_Tofal.java
 UI_Unsov.java ==> GUI_Unsov.java
 ...
 $

Imagine having to rename 200 files. Now imagine having to do that with a point-and-click interface. It could take you all morning. With our shell script, it will be done in seconds.
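Here is the same loop run against a throwaway directory, so you can try it without renaming anything real. The file names are invented for the example. The `[ -e "$i" ] || continue` line is our addition: if nothing matches UI*.java, bash leaves the pattern unexpanded, and the loop would otherwise try to rename a file literally named UI*.java.

```shell
#!/usr/bin/env bash
# Sketch: the UI -> GUI rename, run in a scratch project directory.
set -eu
work=$(mktemp -d)                     # hypothetical project directory
trap 'rm -rf "$work"' EXIT
cd "$work"
touch UI_Button.java UI_Screen.java Main.java

for i in UI*.java
do
    [ -e "$i" ] || continue           # no match: the pattern stays literal
    new="G${i}"
    echo "$i ==> $new"
    mv -- "$i" "$new"                 # -- ends mv's options before the names
done
```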

We can't hope to cover all that we'd like to about shell scripting. Perhaps we have been able to whet your appetite. There are lots of books on the subject of shell programming. We've listed a few at the end of this chapter.

1.3.11. The tar and zip Commands

The tar and zip commands allow you to pack data into an archive or extract it back. Their packing is lossless (unlike some image compression algorithms), so you get back out exactly what you put in, but the archive can take up less space than the original files.^[8] Therefore tar and zip are often used for data backup, archival, and network transmission.

^[8] Well, technically, tar doesn't compress the data in the file, but it can provide a certain amount of "compression" by cutting off the tail ends of blocks of data. On a filesystem that allocates space in 4K blocks ("chunks," though that's not the technical term), a 37-byte file still takes up 4K of disk space. Inside a TAR file, each file needs only a 512-byte header plus its data rounded up to the next 512 bytes. So, for example, 10 files of 400 bytes, which would occupy 40K on such a filesystem, need just 10K of headers and data inside the archive; even after tar adds its end-of-archive marker and (with GNU tar's defaults) pads the archive out to a multiple of 10K, the result is a 20K file. So, while tar won't compress the data inside the file (and thus is quite assuredly "lossless"), it can result in a smaller file.

There are three basic actions that you can take with tar, and you specify which one you want with a single letter^[9] in the command-line arguments:

^[9] Linux option strings always start with a "-", right? Yes, except for tar. It seems there is always an exception to every rule. The newer versions of tar allow the leading minus sign, but can also work without it, for historical compatibility reasons. Early versions of UNIX only had single-letter options. The GNU tools, which are standard on most Linux distributions, also support longer full-word options prefixed with a double minus, as in --extract instead of x or -x.

c: Create an archive.
x: Extract from an archive.
t: Get a table of contents.

In addition, you'll want to know these options:

f: The next parameter is the filename of the archive.
v: Provide more verbose output.

Table 1.4 shows an example of each of the basic actions, using these options.

Table 1.4. Examples of the tar command
Command	Explanation
`tar tvf packedup.tar`	Gives a table of contents, in long (or verbose) form. Without the `v`, all you get is the filenames; with the `v` you get additional information similar in format to the `ls -l` command.
`tar xvf packedup.tar`	Extracts all the files from the TAR file, creating them according to their specified pathname, assuming your user ID and file permissions allow it. Remove the `v` option if you don't want to see each filename as the file is extracted.
`tar cvf packedup.tar mydir`	Creates a TAR archive named `packedup.tar` from the `mydir` directory and its contents. Remove the `v` option if you don't want to see each filename as the file is added to the archive.
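To try all three actions in one go, here is a sketch that builds a small directory (names invented for the example), archives it, lists the archive, and extracts it into a second directory. It uses only the letters from Table 1.4.

```shell
#!/usr/bin/env bash
# Sketch: create, list, and extract a TAR archive in scratch directories.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p mydir/sub
echo 'class A {}' > mydir/A.java
echo 'class B {}' > mydir/sub/B.java

tar cvf packedup.tar mydir        # c: create from mydir and everything under it
tar tvf packedup.tar              # t: table of contents, in long form

mkdir restore
cd restore
tar xvf ../packedup.tar           # x: extract, recreating mydir/... here
```

Because the archive was created from the relative name mydir, extraction recreates mydir inside whatever directory you are in, which is why the sketch changes into restore first.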

Now let's do the same thing using the zip command (Table 1.5). There are actually two commands here: one to compress the files into an archive (zip), and the other to reverse the process (unzip).

Table 1.5. Examples of the zip and unzip commands
Command	Explanation
`unzip -l packedup.zip`	Gives a table of contents of the archive with some extra frill around the edges, like a count of the files in the archive.
`unzip packedup.zip`	Extracts all the files from the ZIP file, creating them according to their specified pathname, assuming your user ID and file permissions allow it. Add the quiet option with `-q` if you would like unzip not to list each file as it unzips it.
`zip -r packedup mydir`	Creates a ZIP archive named `packedup.zip` from the `mydir` directory and its contents. The `-r` tells zip to recursively descend into all the subdirectories, their subdirectories, and so on; without it, zip adds only an entry for the `mydir` directory itself, not the files inside it.
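The same round trip with zip and unzip looks like this. It is a sketch with invented names; because the Info-ZIP zip and unzip programs are not installed on every system, it checks for them first and skips the demonstration if either is missing.

```shell
#!/usr/bin/env bash
# Sketch: zip/unzip round trip in scratch directories.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p mydir/sub
echo 'class A {}' > mydir/A.java
echo 'class B {}' > mydir/sub/B.java

if command -v zip >/dev/null && command -v unzip >/dev/null; then
    zip -r packedup mydir                     # -r: descend into sub; zip adds ".zip"
    unzip -l packedup.zip                     # table of contents with a file count
    mkdir restore
    (cd restore && unzip -q ../packedup.zip)  # -q: don't list each file
else
    echo "zip and unzip are not installed; skipping" >&2
fi
```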

Tip

Since TAR and ZIP files can contain absolute as well as relative pathnames, it is a good idea to look at their contents (e.g., tar tvf file or unzip -l file) before unpacking them, so that you know what is going to be written where.

There are many, many more options for tar and zip that we are not covering here, but these are the most common in our experience, and they will give you a good start.

The tar and zip commands are also worth knowing about for a Java developer because of their relationship to JAR files. If you are working with Java, you will soon run across the notion of a Java ARchive file, or JAR file, recognizable by a name ending in .jar. Certain Java tools are built to understand the internal format of JAR files. For Enterprise Java (J2EE) there are similar archives known as WAR files and EAR files. The syntax of the jar command, which builds these archives, is very similar to the basic tar commands, and a JAR file uses the same internal format as a ZIP file. In fact, most places where you can use a JAR file you can use a ZIP file as well. (You will see more about this when we discuss the standard Java tools in Section 5.11.)

Tip

Here's one more handy example we know you'll use:

 find . -name '*.java' -print | zip allmysource -@

This command starts in the current directory (".") and finds every file whose name ends in .java. It passes their names to zip, which, because of the -@ option, reads the filenames from standard in instead of its argument list, and zips them all into an archive named allmysource.zip. To put it simply, it will zip up all your Java source files from the current directory on down.
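If you'd like to see exactly what the pipeline picks up before pointing it at a real project, this sketch runs it over a small invented source tree containing two .java files and one other file, then lists the resulting archive. It assumes the Info-ZIP zip and unzip programs and skips the demonstration if they are absent; the `-q` option just keeps zip quiet.

```shell
#!/usr/bin/env bash
# Sketch: find | zip -@ over a scratch source tree.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p src/ui
echo 'class Main {}' > src/Main.java
echo 'class UI_Button {}' > src/ui/UI_Button.java
echo 'not java' > src/README                 # should be left out

if command -v zip >/dev/null && command -v unzip >/dev/null; then
    # -@ makes zip read one filename per line from standard in
    find . -name '*.java' -print | zip -q allmysource -@
    unzip -l allmysource.zip                 # only the two .java files appear
else
    echo "zip and unzip are not installed; skipping" >&2
fi
```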

1.3.12. The man Command

Primitive but handy, the man command (short for manual) was the early UNIX online manual. While we've come to expect (and ignore) online help, the idea of online manuals was rather revolutionary in the early days of UNIX. In contrast to walls of printed documentation, UNIX provided terse but definitive descriptions of its various commands. When they are done well, these descriptions are an invaluable reference. They are not the best way to learn about a command, but they can be a great guide to using the command's options correctly.

The format is simply man followed by the name of the command about which you want information. So man man will tell you about the man command itself.

The most useful option to man is -k. It does a keyword search in the names and one-line descriptions of all the manpages, looking for the keyword that you give. Try typing man -k java to see what commands are available. In the output, a (1) means that the entry is a user command, something that you can type from the shell prompt, as opposed to (2), which is a system call, or (3), which is a C library call. These numbers refer to the original UNIX documentation volumes (volume one was shell commands, and so on), and it all fit into a single three-ring binder.

Tip

One other way to find out something about a command, if you already know its name, is to ask the command itself for help. Many commands accept a --help option, and some also accept -?. Try --help first. If you need to type -?, either put it in single quotes or type a backslash before the question mark, as in -\?, since ? is a pattern-matching character to the shell.
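To see why the quoting matters, this sketch creates a file whose name happens to match the pattern -? (the name -x is arbitrary) and compares what the shell actually passes along in each case. It uses printf to display the arguments so that nothing is mistaken for an option.

```shell
#!/usr/bin/env bash
# Sketch: an unquoted -? is a filename pattern, not a literal option.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
touch -- -x                         # a file literally named "-x"

unquoted=$(printf '%s\n' -?)        # the shell expands -? to -x
quoted=$(printf '%s\n' '-?')        # single quotes keep -? as typed
escaped=$(printf '%s\n' -\?)        # so does a backslash before the ?
printf '%s %s %s\n' "$unquoted" "$quoted" "$escaped"   # prints: -x -? -?
```

In a directory with no matching file, bash would pass the unmatched -? through unchanged, which is why the unquoted form often seems to work until the day it doesn't.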

There are other help systems available, such as info and some GUI-based ones. But man provides some of the quickest and most terse help when you need to check the syntax of a command or find out if there is an option that does what you need.
