Tuesday, June 19, 2007

Have faith in Linux

Linux does not have an inherent right to exist. Now that may sound like blasphemy to many Linux aficionados, but let me explain.

I've just read one too many posts in Linux support forums where a new user coming from a Windows background have a problem with something that worked or were easy to do under Windows, and seems impossible or hard to do under Linux. Often the person is taking a slightly accusatory tone, maybe because they have been fooled into thinking everything will be plain-sailing. The unfortunate all-to-often response that I see is that "Linux is not trying to compete with Windows".

A statement like that is a wimpish cop-out, and is wrong on so many levels it isn't even funny. People who read this response gets the message that Linux cannot compete with Windows. Before you write this again, go read one of the many "Why Linux is Better than Windows" pages, and think about that statement again.

Of course Linux is competing with Windows: A computer runs only one operating system at a time, either Windows or Linux or some other operating system (barring virtualization of course).

Importantly, without users using, and therefore computers running Linux, there is no Linux. Developers need motivation to write drivers and programs, and with no users, much of this motivation disappears, and Linux disappears into the annals of time as a futile though interesting exercise in creating an alternative to Windows.

So without Users, there is No Linux. Linux' right to exist resides in its user base, and every user who goes back to Windows because of insensitive answers is one less reason to develop Linux. You may consider each sensitive, well-thought-out, positive, encouraging answer to a frustrated users' plea for help to be an investment into the user base which will ensure that tomorrow you still HAVE a Linux platform.

Maybe we should take a page out of the fanatic "fanboy" Linux followers' book and become a bit more aggressive in our approach to promoting and backing Linux as a better-than-Windows platform.

People will always compare their experience of Linux with what they are used to getting from Windows. Unfortunately human nature have us focus on the negatives, and this means people overlook the great features of Linux and open source software because of a few small troubles.

I find there are two things that need to be said when new Linux users are frustrated with the platform.

  • One, remind the person that Linux development is behind on some areas due to restrictions, such as DRM and copyrights existing in the Windows software, which slows down or outright prevents development of software for Linux. At the same time remind the user that in many more areas Linux is far ahead of its Windows competition.
  • And second: encourage the person to continue looking for a solution - New programs and packages appear every day, so a newly hatched solution might exist but still be unknown to most people.

So go and arm yourself with some real information on why people should switch completely to Linux so that you don't come across as a fanboy, and stop wimping out when Linux does fall short of its competition.

Saturday, June 9, 2007

Tutorial: Redirection and Pipes

Today I realized I should not gloss over two seemingly simple items, the first being pipes and redirection, and the other being substitution. This article will explain the first of these for the benefit of those new to Linux and Unix how to link together simple commands to build powerful tools.

Before I start, a quick introduction to the Unix command-line syntax is appropriate.

  • The simplest Unix command is just a command word, which is the full path name of the file to be executed, starting with a slash representing the root. For example, if you enter /usr/bin/ls, the shell will run the ls program found in the /usr/bin directory.
  • Similarly if you enter a command without specifying the directory, the shell will search all the directories specified in the PATH variable, in the order they are listed, and will execute the first program it finds with the name specified.
  • Slightly more complex commands take options and/or arguments. These are extra "words" which modifies the behavior of the program. For example the ps program lists processes associated with your terminal session, but if you add the "-e" option, it will list all processes running on the system.
  • Simple options which turn on a different behavior in the command are called switches, and these can be stacked. For example the -f switch, which causes ps to include additional details about each listed process and the -e switch mentioned above, can be stacked, like this:
$ ps -ef
    

... to produce a listing of "every process" with "full defaults" A note on the formatting in this article: When a line is preceded with a "$", it means that it is a command to be entered at the shell prompt, though the $ itself must not be entered. The usual shell prompt is a $ for normal users, or a # for root, though your system may well show additional details in the prompt string. Lines in blue text and that does not start with a "$" is the output generated by the command. There is more to commands, but this is enough to be able to understand the rest of this tutorial, so without further delay on to the real subject: Redirection and Pipes. Redirection allows you to "store" the output from a command directly into a file on disk, or to read lines from a file on disk.

$ ps -ef >tmp/processlist.txt
    

If you run the command above you get ... nothing, and you get it fast.. That's right. The ps -ef command on its own produces the expected output, but the ">tmp/processlist.txt" portion causes this output to go into the file /tmp/processlist.txt in stead of appearing in your terminal. You can examine the newly created file with the ls command (Which "lists" information about files and/or directories)

$ ls -l /tmp/processlist.txt 
-rw-r--r-- 1 user1 user1 23100 Jun 4  10:06 /tmp/processlist.txt
    

And you can actually view its contents with any one of the "cat", "less" or "more" commands, like this:

$ cat /tmp/processlist.txt
    

Now that you have this file on the disk you can use it over and over. Enter the above cat command again, and it shows the file contents again. Much more interesting tough, is to filter out specific lines from the file. The head command prints the first 10 lines from the specified file.

$ head /tmp/processlist.txt 
$ tail -4 /tmp/processlist.txt
    

And you guessed it, tail prints the last 10 lines. Specifying a switch with head or tail, like the "-4" above lets you to control the number of lines being displayed if you want something other than the default 10. Much more interesting than "head" or "tail" is grep, which I used extensively in a previous tutorial.

The "grep" command filters lines out based on a special rule for searching, called a "regular expression". These can be quite complex, but does not always need to be, e.g:

$ grep root /tmp/processlist.txt
    

The above command reads the specified file and prints only the lines that contain the string "root". You can also get grep to do the inverse, like this:

$ grep -v root /tmp/processlist.txt
    

Note the "-v" switch which modified grep's behavior to print lines not matching the expression. In the above examples, "grep", "head", "tail" and "cat" were expressly told what file to open and search through. The /tmp/processlist.txt filename specified is "an argument" of each command. Combining grep searches with redirection allows us to create more stored files for further parsing.

$ grep root /tmp/processlist.txt >tmp/root_processes.txt 
$ grep -v root /tmp/processlist.txt >tmp/nonroot_processes.txt
     

This then creates two files, with complementary information - the first with all the lines containing the string root in the file /tmp/root_processes.txt, and the other file with all the lines that does not contain this string. In the redirection examples above the "greater than sign" (>) was used like an arrow pointing from the command ... to the file. I mentioned above that redirection can also read from a file, in which case you just need to switch the direction of the "arrow" For example

$ wc -l < /tmp/nonroot_processes.txt
    

Will read the file /tmp/nonroot_processes.txt and print the number of lines found. (The wc command is the so-called "word count command", and the -l switch is used to modify its behavior to count lines in stead. The "grep", "head", "tail" and "cat", etc commands could also have been used in this fashion. See for example:

$ cat
      

which is almost identical to

$ cat /tmp/root_processes.txt
      

In fact, the difference is of academic importance only, and has got to do with how the file is opened and who owns the file handle. The output is identical. However in the case of wc, the output is not the same. GASP! How can wc report something different depending on when the file was opened after I just said where the file is opened is of academic importance only?The answer is that wc reports the same information in that the number of lines will be correct in both cases, eg

$ wc -l
      

But in the first case wc does not print the file name, while in the second it does. This is because the redirected input is read via a special file handle called "standard input", whereas with files being opened explicitly by the program, the program is automatically aware of the file name and its location on disk. And with that little behavioral artifact, I have arrived at the concept centrally to all redirection and pipes (which I will get to in a minute). Unix and Linux commands write their output to "standard output", and reads input from "standard input". When you perform redirection, you literally redirect this output to go somewhere else or the reading to come from somewhere else. If you run two commands, both redirecting their output to the same file, in succession, it will cause the output from the first command to be lost. The redirection will overwrite the file. In order to append more lines to an existing file, use the double-redirection arrows (>>), like this:

$ cat /tmp/nonroot_processes.txt >/tmp/parts
$ cat /tmp/root_processes.txt >>/tmp/parts
      

This is especially useful in logging of events, i.e. when you do not want information about old events to be lost when new events are recorded. Next up: Piping. It is possible to take the output from one command and redirect it into another command running at the same time, without having to first store it in a file. This is achieved by means of the "|" pipe symbol. Example

$ ps -ef | more
      

The "more" command is made for situations where you have more lines being displayed on the screen than what can fit in the terminal, causing the lines to scroll past before you can read them. When a screen-full is reached, "more" pauses the scrolling of the output, and allows you to hit ENTER for one more line, SPACE for another screen-full, or "q" to "quit" immediately. (Note; less is the successor to more, and allows you to scroll backwards, up or down by half a screen-full, and most importantly to me, when you use it to search for words, it highlights them on the screen) The above method of "pausing" terminal output is used every day by Unix and Linux administrators world wide, especially with large files. Another example:

$ egrep -i "warn|err" /var/log/messages | less
      

The egrep command is an extended grep. In particular, it makes it easy to specify multiple search strings. The above command will show lines with either the string "warn" or "err", and the -i switch makes the search case insensitive, therefore you will also get ERR, Warn, etc.

The pipe to less is used in case there is a lot of output. Running the above command on your Linux machine every week or so will tell you a lot about its health as basically every veguely standard piece of software will log its messages to this file. It is possible to make longer pipe chains. For example

$ ps -ef | grep root | less
      

Run the ps command with switches -e and f; pipe its output into grep and filter out lines containing the string root; Pipe this output into less to make it manageable on the screen. A pipe really just redirects the output from one command into the input of another command without having to first store it on disk. You can usually only have one input and two output redirectors per command (There are some exceptions, but that is a somewhat advanced topic). That is right - two output redirectors. Earlier you saw that with redirection, the command had no output. But this command still produces output:

$ ls -l /etc/file799.xyz > /tmp/fileinfo.txt
/etc/file799.txt: No such file or directory
    

Basically the output is a message warning you that the ls command encountered an error situation. This error message is not printed on the standard output - In Unix and Linux, error messages are printed on a special, dedicated file, called "standard error". The below example uses all three redirections:

$ runreport </tmp/report_options >tmp/main_report.txt 2>tmp/report_errors.txt
      

Here the command "runreport" is some imaginary program. It will be reading the lines in the /tmp/report_options file, which supposedly controls how it reports. The report output will be stored in a file in /tmp/main_report.txt. If any errors are encountered, these will be recorded in the file /tmp/report_errors.txt You will see a 2>for redirecting standard output. The >redirection of standard output has in fact got an implied 1, for file handle number 1, which is "standard output". The three standard file descriptors are:

0 - Standard Input or STDIN
1 - Standard Output or STDOUT
2 - Standard Error or STDERR

Note that in the above examples, the file to which the standard output is being directed does not capture the error messages. It is possible to create a single file with both the normal output and any error messages interleaved, as it is often handy to see where and when a job or process produced error messages. To do this, there is a way to redirect the standard error into the standard output. (Read that again). Example

$ runreport >tmp/report_options >tmp/main_report.txt 2>&1
      

The "&1" at the end of the above line means standard output, and this will receive "2> which is standard error. This is getting quite lengthy, but a final note on syntax. You can generally insert spaces in anywhere except inside a file name or command name, which has to be a single word. So the following commands are equivalent:

$ ls -l /tmp >/tmp/fileinfo 
$ ls -l /tmp>tmp/fileinfo
      

In fact, you can put redirection anywhere on the command line, so the below is a 3rd valid and equivalent variation:

$ >tmp/fileinfo ls -l /tmp
      

However you can not break up the "2>&1" operator, or the double-redirect ">>" used for appending ouptput to existing files.

That is redirection and pipes in a nutshell. Next up is command line substitution.

Monday, June 4, 2007

Tutorial: pipes and redirection

Temporarily taken off-line as the Wysiwig editor is causing havoc with the redirection tags and the resulting article shown is inaccurate!!! Should be back up in a few minutes at most...