CO307: Software Design & Implementation

Coursework 1: Pipes and Filters

This filter program reads a text file representing the source code of a program and analyzes the use of identifiers. This is achieved through the use of a series of filter programs which each perform a small step of the overall task.

Since this task basically consists of finding all words in one file which are not listed in another file, it is conceivable that a similar operation might be carried out using different kinds of data files. Thus the emphasis was on ensuring that each filter is an independent utility which could be reused in a different task.

Most of the filters would process normal human language words just as well as programming language identifiers. However, since identifiers are often permitted to contain digits and underscore characters, some specialization was required for this specific task. The 'words' filter performs the task of extracting identifiers from the source file and thus contains the specialized code. A simple modification to this filter would allow human language words to be processed (see below).

The task breaks down into the following major steps:

Extract all potential identifiers from the source file
Search a keyword index file for each potential identifier
List all words not found in the index file (ie. non-keywords)
Count the occurrences of each word in the list and remove duplicates

The filters are arranged to create the following program structure:

        pasid
        |   |
    lower   nonindex
               |
            nonwords
             |    |
          words  noword

The 'pasid' filter first translates the source file into lowercase by passing it through the 'lower' filter. Since Pascal is case-insensitive and the pascal keyword index file is assumed to be all in lowercase, this initial filter ensures that the keywords are matched. To process a case-sensitive language like C, the program could simply skip this initial step.

The output from 'lower' is piped into the 'nonindex' filter. The 'pasid' filter also passes to 'nonindex' one parameter specifying the name of the Pascal keyword index file.

The 'nonindex' filter compares every word from the standard input with the contents of a named index file and outputs a list of unmatched words with the number of occurrences of each. It first passes the standard input and the name of the index file to the 'nonwords' filter, which creates a raw list of unmatched words. It then sorts the list using 'sort' and removes duplicate words (and adds counters) using 'uniq'.

The 'nonwords' filter passes its standard input through the 'words' filter to obtain a delimited list of words. It then takes each word in turn from this list and passes it to the 'noword' filter, which searches for the word in the named index file. Any words not found in the index are written to the standard output.

The 'words' filter outputs a delimited list of all strings from standard input that match the pattern of a programming language identifier (a letter or underscore optionally followed by any combination of letters, digits and underscores). It does this by first replacing any non-alphanumeric characters with newline characters and then removing any strings which begin with a digit.

By extracting words in this way, the 'words' filter is more specific to this particular task than are the other filters, in that it could not be used to identify human language words. To allow such use, the filter could be rewritten using a line such as the following:

tr -cs a-zA-Z '[\n*]'  # Translate all non-alphabetics to newlines

The 'noword' filter searches the standard input for a specified word. If the word is not found, it is written to the standard output.

The 'lower' filter simply translates any uppercase characters from the standard input into lowercase.



FILE: pasid



# Count and list all identifiers in a named Pascal source file
# Usage:  pasid sourcefile  [ > outputfile ]
# Display info and terminate if no sourcefile parameter was specified

if test -z "$1"
then
  echo "Usage:  pasid sourcefile  [ > outputfile ]"
  exit
fi

# Use the (lowercase) Pascal index to find non-keywords in sourcefile

lower < $1 | nonindex pasindex


FILE: nonindex

# Count and list words from standard input not found in a named index file
# Usage:  nonindex indexfile  [ < inputfile ]  [ > outputfile ]
# Display info and terminate if no indexfile parameter was specified

if test -z "$1"
then
  echo "Usage:  nonindex indexfile  [ < inputfile ]  [ > outputfile ]"
  exit
fi

nonwords $1 | sort | uniq -c      # Sort and count unlisted words



FILE: nonwords

# Output words from standard input that are not found in a named index file
# Usage:  nonwords indexfile  [ < inputfile ]  [ > outputfile ]
# Display info and terminate if no indexfile parameter was specified

if test -z "$1"
then
  echo "Usage:  nonwords indexfile  [ < inputfile ]  [ > outputfile ]"
  exit
fi

for word in `words`       # Read words from the standard input filter
do                        # For each word...
  noword $word < $1       # Write the word if not found in the index file
done                      # ... until all words have been processed



FILE: noword

# Write word to standard output if not found in standard input
# Usage:  noword word  [ < inputfile ]  [ > outputfile ]
# Display info and terminate if no word parameter was specified

if test -z "$1"
then
  echo "Usage:  noword word  [ < inputfile ]  [ > outputfile ]"
  exit
fi

if ! grep -w $1 > /dev/null      # Is the word in the standard input?
then
  echo $1                        # If not, write to standard output
fi



FILE: words

# Write identifiers from standard input to standard output (one per line)
# Usage:  words  [ < inputfile ]  [ > outputfile ]
# Remove non-alphanumerics and then non-identifiers

tr -cs a-zA-Z0-9_ '[\n*]' | grep -w "[a-zA-Z_][a-zA-Z0-9_]*"



FILE: lower

# Write text from standard input to standard output in lowercase
# Usage:  lower  [ < inputfile ]  [ > outputfile ]

tr A-Z a-z  # Translate all uppercase to lowercase



FILE: pasindex

and					library
asm					mod
array					nil
begin					not
case					object
const					of
constructor				or
destructor				output
div					packed
do					procedure
downto					program
else					record
end					repeat
eof					set
exports					shl
file					shr
for					string
function				then
goto					to
if					type
implementation				unit
in					until
inherited				uses
inline					var
input					while
interface				with
label					xor



FILE: test.pas

program reverse( input, output);
procedure outputInReverse;
  var n : integer;
  begin
   if not eof then begin
     readln(n);
     outputInReverse;
     writeln(n)
    end
   else
    writeln
  end;
begin
  outputInReverse
end.

Go To: CO307 Coursework