
Introduction

Note

We are going to work with the Bash shell. Many commands may work in other shells (sh, zsh, fish, etc.), but I cannot guarantee anything, as Bash is the only shell I am familiar with.

Tip

If you are a newcomer to Linux and *nix topics in general, don’t expect to understand much of the documentation on your first few attempts. It can be difficult indeed, and it depends a lot on your background, that is, on how much you already know (or don’t) about these things. Just hang in there.

We know that when dealing with the command line, we have the shell (the “command interpreter”, Bash for the sake of this post), the builtin commands (those provided by the shell itself, which are not external executables/scripts on your system), and external programs, which are installed separately and are not part of the shell, but are used from the command line/shell.

Note

In this post I will sometimes use “command” in the same sense as “program”. For instance, “the ping command”, although ping is a program that is used on the command line. This may cause controversy, but people often say “command” when they mean “program”. Context should make it clear whether a command is actually a program.

Builtin vs external commands

Bash features many commands that are native, or builtin. That is, the Bash developers, besides providing us with the command interpreter, also gave us some commands that come with Bash and are internal to it. These are the so-called builtin commands.

To know whether a command is a builtin, we make use of the type command (which is itself a builtin):

type cd
# →  cd is a shell builtin
type type
# →  type is a shell builtin

On my system/setup, ls is an alias:

type ls
# →  ls is aliased to `ls --color=auto --classify'

Still, is ls a builtin or not? One way to find out is to first “remove the alias”:

unalias ls
type ls
# →  ls is /bin/ls

So, no. ls is not a builtin, but rather a program of its own, installed separately from the shell. Let’s see another such case of external (non-builtin) command:

type ping
# →  ping is /bin/ping
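As a small complement, type accepts a few options that answer the same questions more directly. The sketch below only uses options of the type builtin itself; what type -a and type -P print for ls depends on your own aliases and PATH, so no output is shown for them.

```bash
# -t prints a single word describing the kind of command:
# alias, keyword, function, builtin or file.
type -t cd      # → builtin
type -t if      # → keyword

# -a lists every definition of a name: the alias (if there is one)
# and then the executable(s) found in PATH, without removing the alias.
type -a ls

# -P searches PATH only and prints the path of the executable,
# skipping aliases, functions and builtins.
type -P ls
```

With type -a there is no need to unalias anything just to see what is hiding behind an alias.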

But there is more. Programs are installed somewhere on your system, and there is an environment variable called PATH that lists all the places where the shell should look for programs to execute. Builtin commands are, well, builtin, and are not installed “somewhere”. They come with Bash. The situation is different with external commands. Suppose you invoke the command ping. The shell doesn’t know the exact location where it is. It just knows it is somewhere in your PATH. So the shell starts scanning the list of directories stored in PATH, looking for the program ping. If it is found, Bash then adds the path of this specific program to a hash table so that, the next time you run this command, Bash doesn’t need to perform the lookup again.

As an example, here’s what PATH looks like on my Arch Linux box:

$ sed 's/:/\n/g' <<< "$PATH"
/home/fernando/.rvm/gems/ruby-2.3.1/bin
/home/fernando/.rvm/gems/ruby-2.3.1@global/bin
/home/fernando/.rvm/rubies/ruby-2.3.1/bin
/home/fernando/.rvm/gems/ruby-2.3.1/bin
/home/fernando/.rvm/bin
/home/fernando/.nvm/versions/node/v6.9.1/bin
/usr/local/sbin
/usr/local/bin
/usr/bin
/usr/bin/site_perl
/usr/bin/vendor_perl
/usr/bin/core_perl
/home/fernando/bin/node_modules/bin/
/home/fernando/bin
/home/fernando/.dotfiles/bin
/home/fernando/bin/android-sdk//tools
/home/fernando/bin/android-sdk//platform-tools
/home/fernando/bin
/usr/local/heroku/bin

As you can see, there are a lot of places to look for a program. It makes sense to cache the location of a program so subsequent invocations of the same program take less time.

All of that was said so I can explain the output of type.
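To make the scanning idea concrete, here is a minimal sketch of a PATH lookup written as a Bash function. The name find_in_path is made up for this post, and the function only approximates what Bash does internally (it does no caching, for instance).

```bash
# Hypothetical helper: walk the directories in PATH, in order, and print
# the first executable file whose name matches the argument.
find_in_path() {
    local name=$1 dir dirs
    IFS=: read -r -a dirs <<< "$PATH"     # split PATH on ':' into an array
    for dir in "${dirs[@]}"; do
        [[ -z $dir ]] && dir=.            # an empty entry means the current directory
        if [[ -f $dir/$name && -x $dir/$name ]]; then
            printf '%s\n' "$dir/$name"
            return 0                      # stop at the first match
        fi
    done
    return 1                              # not found anywhere in PATH
}

find_in_path sh
```

The earlier a directory appears in PATH, the sooner the search stops, which is also why the order of the entries matters when two directories contain a program with the same name.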

Suppose you started your amazing Linux distro today, and you haven’t used ping a single time so far. Then you do:

$ type ping
ping is /usr/bin/ping

$ ping -c 1 '::1'

$ type ping
ping is hashed (/usr/bin/ping)

See how the first time it tells us the path, whereas the second time we run type ping the shell says it is hashed?

Another example:

# It wasn't in the hash table yet.
type epiphany
# →  epiphany is /usr/bin/epiphany

# The shell has to perform the _lookup_ (where is this epiphany thing?)
epiphany

# OK, found it and added its path to the hash table.

# Now there is no need to do the lookup again.
type epiphany
# →  epiphany is hashed (/usr/bin/epiphany)

Read more about this in the section COMMAND EXECUTION in the Bash man page.
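The hash table itself can be inspected and changed with the hash builtin. The sketch below uses ls only because it exists on practically every system; the paths and hit counts you see will differ from machine to machine.

```bash
hash -r            # empty the hash table
ls > /dev/null     # running ls makes Bash look it up in PATH and remember the result
type ls            # now reported as hashed (unless ls is an alias in your shell)
hash -t ls         # print just the remembered path for ls
hash               # list the whole table, with a hit count per entry
hash -d ls         # forget a single entry
```

hash -r comes in handy when a program has been moved or reinstalled in a different directory and Bash keeps trying the old location.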

Note

Thanks to ‘greycat’ and ‘twkm’ in #bash. They taught me about this “hash table” vs “lookup” thing.

Three types of documentation

When talking about the command line, or commands, or tools and programs that are command line oriented, we basically have three sources of documentation to read from:

  • info

  • man

  • help

Not all three options are always available for any given command, though.

Man pages

Almost all installed programs have a man page.

We are going to learn how to read man pages. It so happens that we first need to understand how man itself works!

man man     # Yes, man man!
man --help

And to read documentation from programs, we do:

man ls
man ping
man firefox # Nope...

Yep! Firefox, for instance, doesn’t come with a man page.
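If you just want to know whether a man page exists, without opening it, man -w prints the location of the page file and fails when there is none. A minimal sketch, where has_manpage is a name I made up for this post:

```bash
# Hypothetical helper: succeed only if man can locate a page for the name.
has_manpage() {
    man -w "$1" > /dev/null 2>&1
}

for name in ls ping firefox; do
    if has_manpage "$name"; then
        echo "$name: man page found"
    else
        echo "$name: no man page (or man is not installed)"
    fi
done
```

The result for each name depends on what is installed on your system, so your output may differ from mine.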

Apropos

apropos is used to find out which programs could be used for a certain task. Suppose you want to find programs that have something to do with “address”:

apropos address

Or about “print” (chars to the screen, printer, etc):

apropos print

You can do the same with man -k:

man -k address
man -k printf

To read a man page in a browser, do something like:

BROWSER=firefox man -H bash
BROWSER=midori man -H ping

Man page sections

If you run:

man -k printf

The output might be lengthy. Still, part of that output is:

printf (1)           - format and print data
printf (3)           - formatted output conversion

Those numbers represent “man page sections”. If you read man man, you know that commands are categorized by type. Excerpt from man man:

The table below shows the section numbers of the manual followed by the types of pages they contain.

1   Executable programs or shell commands
2   System calls (functions provided by the kernel)
3   Library calls (functions within program libraries)
4   Special files (usually found in /dev)
5   File formats and conventions eg /etc/passwd
6   Games
7   Miscellaneous (including macro packages and conventions),
    e.g. man(7), groff(7)
8   System administration commands (usually only for root)
9   Kernel routines [Non standard]

Combining that with the output we saw earlier, in which there is a printf both in section 1 and in section 3 of the manual, we can pick which one to read about:

man 1 printf

or

man 3 printf

In the first case, we are shown the man page of the printf command (the external program; Bash also has a builtin of the same name, documented by help printf). In the second case, we are presented with the man page for the C standard library’s printf (the one we use when we do #include <stdio.h> in a C source file).
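One subtlety is worth a quick sketch: printf usually exists twice on the command line, once as a Bash builtin and once as an external program, and each has its own documentation. The man commands are left commented out because they open a pager.

```bash
type -t printf     # → builtin
type -a printf     # the builtin first, then any printf executable found in PATH

help printf        # documentation for Bash's builtin printf
# man 1 printf     # documentation for the external printf program
# man 3 printf     # documentation for the C library function
```

When you type printf at the prompt, the builtin is the one that runs, so help printf is normally the page that matches what you are actually using.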

Help pages and more about shell builtins

help is used to obtain help for builtin commands. You can invoke man bash and spend some time looking for the right place to read about the builtin you are interested in, or just use help.

First of all, help alone returns a list of all builtins:

help

And to read help on a specific builtin:

help <some builtin command>

Examples:

help help # Yes, help help! ☺
help cd
help alias
help exit

If you pass help a non-builtin, you get an error message.
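Since help only knows about builtins (and shell keywords such as if), one could combine it with type -t to pick the right documentation tool automatically. This is just a sketch, and doc is a name I made up; it does nothing clever about aliases or functions.

```bash
# Hypothetical helper: use help for builtins and keywords, man for the rest.
doc() {
    case $(type -t "$1") in
        builtin|keyword) help "$1" ;;
        *)               man "$1" ;;
    esac
}

doc cd             # same as: help cd
# doc ping         # same as: man ping (opens a pager)
```

help also has a -d option for a one-line description and -s for the usage synopsis only, for example help -d cd.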

Info

Generally, info pages are more detailed and sometimes even contain examples (although man pages, albeit rarely, may contain examples too). Info pages are, more often than not, more beginner-friendly than man pages. For instance, I challenge you to learn ed from its man page. Its info page, on the other hand, is more tutorial-like, so you are more likely to learn how to use the tool by reading it. It is true that man pages are intended as a reference that helps when you already know how to use a certain tool. Info pages can often be used to get you started with a tool or command line program.

Start with this:

info info
info --help

If you read info’s info page, you’ll see it is a somewhat complex system for documenting stuff because it has lots (I mean lots) of features to help you navigate around.

Then, go ahead and look for info pages for other programs.

info <program or command>

For example:

info ls
info grep
info find
info ed
info sed

Unfortunately, not all programs provide an info page. In such cases, the info command tries to display at least the command’s man page.

info bash
info vim

Both bash and vim (at least on my system) display a man page because there are no info pages for them.

Please read info info. It will teach you, for instance, that you can open an info page directly in a specific section. Look:

info sed 'execution cycle'

It is possible to read info pages from Emacs:

C-h i m <command>

Example:

C-h i m sed

Info has lots of terms, concepts and keyboard shortcuts. info info covers it all in a tutorial-like fashion: how to find what you need to read about, how to navigate through the pages, etc. It is a complex system, but a very powerful one.

Help for GUI Programs

As mentioned earlier, some GUI applications don’t come with man pages. Still, to find out whether a GUI application has some sort of command line facilities, try:

program --help
program -help
program -h
program help

For example, Firefox, Chromium (the open source browser on which Google Chrome is based), Opera, Midori and some others respond to --help. Some may open a man page when you invoke their help.
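The trial-and-error above can be sketched as a small function. try_help is a made-up name, and the function relies on an assumption that does not always hold: that a program exits successfully when it understands the flag. It also leaves out the bare help argument on purpose, since many programs would treat it as a file name or simply start their GUI.

```bash
# Hypothetical helper: report the first help-like flag a program accepts.
# Assumes an exit status of 0 means "flag understood", which is only a heuristic.
try_help() {
    local prog=$1 flag
    for flag in --help -help -h; do
        if "$prog" "$flag" < /dev/null > /dev/null 2>&1; then
            echo "$prog answers to $flag"
            return 0
        fi
    done
    return 1
}

try_help bash      # → bash answers to --help
```

Keep in mind that -h means something else entirely for many programs (human-readable sizes in ls, for instance), so read the result with a grain of salt.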

Understanding man pages

Here we shall discuss how to understand the significance of the syntax displayed when you read a piece of documentation.

We’ll start with this one:

man cp

And you’ll see the following output (similar, if not exactly the same):

NAME
cp - copy files and directories

SYNOPSIS
cp [OPTION]... [-T] SOURCE DEST
cp [OPTION]... SOURCE... DIRECTORY
cp [OPTION]... -t DIRECTORY SOURCE...

Behold! All those things have meanings that require us to understand them if we are to properly understand man pages, info pages and other types of documentation.

cp is the name of the command or program. No mystery so far.

Anything inside [ and ] means that it is optional. In this case, [OPTION] means that the options for the cp command are not required. You can either do

cp -v foo.txt bar.txt

or

cp foo.txt bar.txt

In which case -v is an option (also called a modifier or command line switch). The man page says that command line options are optional (not required). You can use them if you want/need, but you can also invoke cp without passing any option whatsoever.

The three dots (or ellipsis) that come after [OPTION] or SOURCE mean that the preceding item may appear more than once. If that item is optional, ... means zero or more times, and if the item is mandatory, then the three dots mean one or more times.

Therefore, in:

cp [OPTION]... SOURCE... DIRECTORY

cp is required and must appear exactly once, followed by zero or more options. SOURCE... is required (there is no pair of [] around it) and may appear more than once. Finally, DIRECTORY is mandatory and must appear precisely once. Using that information, we are able to compose the command below.

Copy three files to the directory bkp:

cp -vi main.c net.h net.c bkp/

cp is the command/program being executed. -vi are the optional switches/options. Then main.c, net.h and net.c are all part of the SOURCE... argument (mandatory, one or more), and finally bkp satisfies DIRECTORY (required, one).
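To see all three SYNOPSIS lines in action, here is a sketch you can run safely, since it works inside a throwaway directory created with mktemp. The file names are just placeholders, and the -t form assumes GNU cp, which is the one whose man page is quoted above.

```bash
tmp=$(mktemp -d)          # scratch directory, so nothing real is touched
(
    cd "$tmp" || exit 1
    touch main.c net.h net.c
    mkdir bkp

    cp main.c copy.c                # cp [OPTION]... [-T] SOURCE DEST
    cp -v main.c net.h net.c bkp/   # cp [OPTION]... SOURCE... DIRECTORY
    cp -t bkp copy.c                # cp [OPTION]... -t DIRECTORY SOURCE...

    ls bkp                          # the four copied files
)
```

Each line matches exactly one of the three forms: one source and one destination, several sources and a directory at the end, or the directory given up front with -t.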

Another example

One of the most popular Scheme interpreters and REPL is Chicken. From the command line it is invoked with the command csi (Chicken Scheme Interpreter). Very well:

csi -help

It gladly tells us that:

usage: csi [FILENAME | OPTION...]

Note that we have [ … ] around two items, and there is a | between those items. That | is an OR: it tells us to use either one or the other, either FILENAME or OPTION.... It doesn’t mean “invoke csi followed by a filename and then one or more options.” That interpretation is incorrect.

It means this:

csi program.scm

or this:

csi <some option>

But not this:

csi program.scm <some option>

Again, cmd [ foo | bar ] means cmd, either followed by nothing (because both foo and bar are inside [ … ]), or followed by foo, or followed by bar, but never followed by both foo and bar.

On the other hand, if you look at the csi man page (or csi -help), you’ll see that some options require a file name, like the -s (or -script) option. So, csi may be passed a filename or an option. It just so happens that there is one option that requires a filename.
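Seeing the same convention from the other side may help. Below is a sketch of a made-up command, greet, whose synopsis would be written as greet [ NAME | --help ]. It is not related to csi in any way; it just shows what a program that honors such a synopsis does with its arguments.

```bash
# Hypothetical command with the synopsis:  greet [ NAME | --help ]
greet() {
    if (( $# > 1 )); then
        # Two arguments would mean NAME *and* --help, which the synopsis forbids.
        echo "usage: greet [ NAME | --help ]" >&2
        return 2
    fi
    case ${1-} in
        '')     echo "Hello, world" ;;                     # nothing: both items are optional
        --help) echo "usage: greet [ NAME | --help ]" ;;   # the option alone
        *)      echo "Hello, $1" ;;                        # the name alone
    esac
}

greet              # → Hello, world
greet Ana          # → Hello, Ana
greet Ana --help   # prints the usage line on stderr and fails
```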

The moral of the story is that the man/help of a command may show something that can be easily misunderstood (it happened to a friend of mine :D).

This may lead one to think that this

csi [FILENAME | OPTION ...]

means that you can run this command

csi program.scm -s

Which is wrong. It actually means this:

csi program.scm

or (since the -script option demands a filename)

csi -script program.scm

That is, csi file or csi <option>, and then, there is an option that takes a file.

I know I repeated myself a lot in this last part, but I wanted to make sure to drive the point home. Sometimes the same thing explained in two or more different ways helps one to really grasp the concept or idea.

By the way, have you seen this? :)

And may the force and the source be always with you! <3

::::          ::::::      ::::      ::::    :::::::::
::::        ::::  ::::    ::::      ::::    :::::::::
::::       ::::    ::::   ::::      ::::    ::::
::::       ::::    ::::    ::::    ::::     ::::::::
::::       ::::    ::::     ::::  ::::      ::::
::::       ::::    ::::      ::::::::       ::::
::::::::::  ::::  ::::        ::::::        :::::::::
::::::::::    ::::::           ::::         :::::::::