CO307: Software Design & Implementation


Coursework 1: Pipes and Filters


This filter program reads a text file representing the source code of a program and analyzes the use of identifiers. This is achieved through the use of a series of filter programs which each perform a small step of the overall task.

Since this task basically consists of finding all words in one file which are not listed in another file, it is conceivable that a similar operation might be carried out using different kinds of data files. Thus the emphasis was on ensuring that each filter is an independent utility which could be reused in a different task.

Most of the filters would process normal human language words just as well as programming language identifiers. However, since identifiers are often permitted to contain digits and underscore characters, some specialization was required for this specific task. The 'words' filter performs the task of extracting identifiers from the source file and thus contains the specialized code. A simple modification to this filter would allow human language words to be processed (see below).

The task breaks down into the following major steps:

  1. Extract all potential identifiers from the source file

  2. Search a keyword index file for each potential identifier

  3. List all words not found in the index file (ie. non-keywords)

  4. Count the occurrences of each word in the list and remove duplicates

The filters are arranged to create the following program structure:

        pasid
        |   |
    lower   nonindex
               |
            nonwords
             |    |
          words  noword
The 'pasid' filter first translates the source file into lowercase by passing it through the 'lower' filter. Since Pascal is case-insensitive and the pascal keyword index file is assumed to be all in lowercase, this initial filter ensures that the keywords are matched. To process a case-sensitive language like C, the program could simply skip this initial step.

The output from 'lower' is piped into the 'nonindex' filter. The 'pasid' filter also passes to 'nonindex' one parameter specifying the name of the Pascal keyword index file.

The 'nonindex' filter compares every word from the standard input with the contents of a named index file and outputs a list of unmatched words with the number of occurrences of each. It first passes the standard input and the name of the index file to the 'nonwords' filter, which creates a raw list of unmatched words. It then sorts the list using 'sort' and removes duplicate words (and adds counters) using 'uniq'.

The 'nonwords' filter passes its standard input through the 'words' filter to obtain a delimited list of words. It then takes each word in turn from this list and passes it to the 'noword' filter, which searches for the word in the named index file. Any words not found in the index are written to the standard output.

The 'words' filter outputs a delimited list of all strings from standard input that match the pattern of a programming language identifier (a letter or underscore optionally followed by any combination of letters, digits and underscores). It does this by first replacing any non-alphanumeric characters with newline characters and then removing any strings which begin with a digit.

By extracting words in this way, the 'words' filter is more specific to this particular task than are the other filters, in that it could not be used to identify human language words. To allow such use, the filter could be rewritten using a line such as the following:

tr -cs a-zA-Z '[\n*]'  # Translate all non-alphabetics to newlines
The 'noword' filter searches the standard input for a specified word. If the word is not found, it is written to the standard output.

The 'lower' filter simply translates any uppercase characters from the standard input into lowercase.



FILE: pasid # Count and list all identifiers in a named Pascal source file # Usage: pasid sourcefile [ > outputfile ] # Display info and terminate if no sourcefile parameter was specified if test -z "$1" then echo "Usage: pasid sourcefile [ > outputfile ]" exit fi # Use the (lowercase) Pascal index to find non-keywords in sourcefile lower < $1 | nonindex pasindex
FILE: nonindex # Count and list words from standard input not found in a named index file # Usage: nonindex indexfile [ < inputfile ] [ > outputfile ] # Display info and terminate if no indexfile parameter was specified if test -z "$1" then echo "Usage: nonindex indexfile [ < inputfile ] [ > outputfile ]" exit fi nonwords $1 | sort | uniq -c # Sort and count unlisted words
FILE: nonwords # Output words from standard input that are not found in a named index file # Usage: nonwords indexfile [ < inputfile ] [ > outputfile ] # Display info and terminate if no indexfile parameter was specified if test -z "$1" then echo "Usage: nonwords indexfile [ < inputfile ] [ > outputfile ]" exit fi for word in `words` # Read words from the standard input filter do # For each word... noword $word < $1 # Write the word if not found in the index file done # ... until all words have been processed
FILE: noword # Write word to standard output if not found in standard input # Usage: noword word [ < inputfile ] [ > outputfile ] # Display info and terminate if no word parameter was specified if test -z "$1" then echo "Usage: noword word [ < inputfile ] [ > outputfile ]" exit fi if ! grep -w $1 > /dev/null # Is the word in the standard input? then echo $1 # If not, write to standard output fi
FILE: words # Write identifiers from standard input to standard output (one per line) # Usage: words [ < inputfile ] [ > outputfile ] # Remove non-alphanumerics and then non-identifiers tr -cs a-zA-Z0-9_ '[\n*]' | grep -w "[a-zA-Z_][a-zA-Z0-9_]*"
FILE: lower # Write text from standard input to standard output in lowercase # Usage: lower [ < inputfile ] [ > outputfile ] tr A-Z a-z # Translate all uppercase to lowercase
FILE: pasindex and library asm mod array nil begin not case object const of constructor or destructor output div packed do procedure downto program else record end repeat eof set exports shl file shr for string function then goto to if type implementation unit in until inherited uses inline var input while interface with label xor
FILE: test.pas program reverse( input, output); procedure outputInReverse; var n : integer; begin if not eof then begin readln(n); outputInReverse; writeln(n) end else writeln end; begin outputInReverse end.

Go To: CO307 Coursework