The Unix Tools Are Your Friends

    First, IDEs target specific languages, while Unix tools can work with anything that appears in textual form. In today’s development environment where new languages and notations spring up every year, learning to work in the Unix way is an investment that will pay off time and again.

    Furthermore, while IDEs offer just the commands their developers conceived, with Unix tools you can perform any task you can imagine. Think of them as (classic pre-Bionicle) Lego blocks: You create your own commands simply by combining the small but versatile Unix tools. For instance, the following sequence is a text-based implementation of Cunningham’s signature analysis — a sequence of each file’s semicolons, braces, and quotes, which can reveal a lot about the file’s contents.
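
    The command sequence itself is not reproduced here. As a minimal sketch of such a signature, assuming the files are passed as arguments and using a hypothetical function name `signature`, one might write:

```shell
# Print a punctuation "signature" for each file named on the command line:
# the file name, then the file's semicolons, braces, and double quotes.
# The name "signature" is an assumption made for this sketch.
signature() {
  for f in "$@"; do
    printf '%s: ' "$f"
    # Delete every character except " { } ; and then join the lines.
    sed 's/[^"{};]//g' "$f" | tr -d '\n'
    echo
  done
}
```

    Calling `signature *.java` would then print one line per Java file. Files with a similar structure tend to produce similar-looking signatures.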

    Unix tools were developed in an age when a multiuser computer had 128kB of RAM. The ingenuity that went into their design means that nowadays they can handle huge data sets extremely efficiently. Most tools work like filters, processing just a single line at a time, meaning that there is no upper limit on the amount of data they can handle. You want to search for the number of edits stored in the half-terabyte English Wikipedia dump? A simple invocation of

    will give you the answer without breaking a sweat. If you find a command sequence generally useful, you can easily package it into a shell script, using some uniquely powerful programming constructs, such as piping data into loops and conditionals. Even more impressively, Unix commands executing as pipelines, like the preceding one, will naturally distribute their load among the many processing units of modern multicore CPUs.
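
    The invocation for counting edits is not reproduced here. One plausible form, assuming each edit appears in the dump as a `<revision>` tag on a line of its own, and wrapped in a hypothetical function `count_revisions` so that it can be reused, is sketched below:

```shell
# Count the input lines that contain a <revision> tag.
# Assumption: each edit in the dump is marked by one such line.
# The function reads standard input, so it works on data of any size.
count_revisions() {
  grep '<revision>' | wc -l
}
```

    It could be run as `count_revisions < enwiki.xml`, where `enwiki.xml` is a placeholder for the dump file. Because `grep` and `wc` run as separate processes connected by a pipe, the operating system is free to schedule them on different cores.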

    Finally, if none of the available tools match your needs, it’s very easy to extend the world of Unix tools. Just write a program (in any language you fancy) that plays by a few simple rules: Your program should perform just a single task; it should read data as text lines from its standard input; and it should display its results unadorned by headers and other noise on its standard output. Parameters affecting the tool’s operation are given on the command line. Follow these rules and “yours is the Earth and everything that’s in it.”
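
    As an illustration of these rules, here is a minimal sketch of a made-up tool, `longer_than`, written as a shell function. The tool's name and its default limit of 80 are assumptions chosen for the example. It does one thing, reads text lines from standard input, writes bare results to standard output, and takes its single parameter from the command line:

```shell
# longer_than N: copy to standard output only those input lines
# that are longer than N characters (default 80, an arbitrary choice).
longer_than() {
  limit=${1:-80}    # the parameter comes from the command line
  # Read standard input line by line; the extra test keeps a final
  # line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    if [ "${#line}" -gt "$limit" ]; then
      printf '%s\n' "$line"    # no headers, no decoration
    fi
  done
}
```

    Because the output is unadorned, the tool composes with others: `longer_than 100 < main.c | wc -l` would count the overlong lines in a hypothetical file `main.c`.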

    By Diomidis Spinellis