Java: A simple, object-oriented, distributed, interpreted, robust, secure, architecture neutral, portable, high-performance, multithreaded, and dynamic language.
One of the first things a programmer notices when migrating from C to Java is the complete absence of a global scope for code and data. Instead, the scope of code and data is limited to the class or object in which it is contained. This may seem an unhelpful restriction at first, especially to a C or Pascal programmer accustomed to using the global namespace for all functions and for generally available static data. It is not, however, as restrictive as it first seems and is, in fact, more of a help than a hindrance. Though the scope of methods and variables is limited to their class, a public class is itself visible throughout an application (qualified, where necessary, by its package name). Thus a public static method or variable can be accessed via its containing class from any part of an application and is, in effect, globally accessible.
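The following is a minimal sketch of class-qualified access. It uses a hypothetical class, AppSettings, whose name and members are illustrative assumptions. The class keeps a public static variable and a public static method inside itself, yet any other class can reach both through the class name.

```java
// Entry point: shows that AppSettings members are reachable by class name.
class GlobalAccessDemo {
    public static void main(String[] args) {
        // No global namespace is needed: the class name acts as the key.
        AppSettings.maxUsers = 250;
        System.out.println(AppSettings.describe()); // prints maxUsers=250
    }
}

// Hypothetical class used only to illustrate class-qualified "global" access.
class AppSettings {
    // A public static variable: one shared copy, reachable as AppSettings.maxUsers.
    public static int maxUsers = 100;

    // A public static method: callable anywhere as AppSettings.describe().
    public static String describe() {
        return "maxUsers=" + maxUsers;
    }
}
```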
The net result of this change is simply to remove a lot of unnecessary clutter from the global namespace and thereby improve the performance of the Java interpreter. By using a single class name as a global key to a whole group of methods and variables, the number of entities to be searched during the dynamic-lookup process is significantly reduced.
It could perhaps be argued, from an object-oriented perspective, that the whole notion of global variables is really just a relic from the days of the early BASIC interpreters, which provided no facility to create true sub-procedures or functions and no local code or data of any kind. Most BASIC interpreters did include the GOSUB and RETURN keywords to allow the creation of sub-routines. These, though, are not like functions or procedures in the modern sense, because they do not create a 'local-code' context. The code they contain is not separated from the main program in any language-defined way, they can be entered or exited at any arbitrary point, and they cannot declare local variables. Each line of a sub-routine could therefore be considered to be within global scope. The DEF FNa(x) (define function) statement included in some implementations of BASIC provides a means to create functions, but these are restricted to a single expression, built from variables and other FNa(x) functions, that calculates the return value. With no way to define general-purpose, localized blocks of code, everything in a BASIC program is global and the entire program represents a single, unified, complex object.
The introduction of the modular principle exemplified by the Pascal language began a paradigm shift towards the object-oriented principle, expressed through languages like Modula-2, C and C++. The 'absolutist' virtual machines created by the early languages ran a single task and assumed full access to system resources. They have now been replaced by 'relativist' systems like the Java Virtual Machine, with its multithreaded, dynamic-object-processing runtime environment. A complete application can now be constructed from a collection of reusable and interchangeable objects.
Since the development of modularisation, the bell has been tolling for global scope, and the implementation of localized functions (or methods) is simply the logical extension of the modular paradigm. The table below illustrates the move towards modularisation and away from globalisation in programming language methodology. It shows how the Java language represents a harmonization of the modular (or 'object-oriented') principle.
Language    Data Scope     Code Scope   Fn Scope   Class Scope
BASIC       global         global       n/a        n/a
C/Pascal    local/global   local        global     n/a
Java        local          local        local      global

KEY:
Data Scope: the availability of variables and static data.
Code Scope: the availability of individual lines of code.
Fn Scope: the availability of functions or procedures.
Class Scope: the availability of a containing class.
Since Java is considered to be an interpreted language, it may be slightly surprising to discover that Java programs must be compiled before execution. Java represents a compromise between the performance of compiled languages like Pascal and C and the reliability and portability of fully-interpreted scripting languages such as the Unix shells. Java programs are compiled into an intermediate byte-code format designed to run on a 'virtual machine' (the Java Virtual Machine, or JVM). The Java interpreter simulates the operation of this virtual device to execute the byte-code program.
In a sense, these byte-code files are similar to the object-code files produced by C compilers (e.g. the .o files on Unix systems and .OBJ files on DOS or Windows systems), in that each represents a self-contained block of code which can be combined with other blocks to build an application. The important difference with Java byte-code files is that they do not require a pre-execution link phase to combine separate blocks into the final application. The interpreted nature of Java means that there is no need to build a 'final' application file. Instead, the functionality of an application is distributed amongst a collection of classes, which the Java interpreter loads dynamically, as required, at run-time. In practice, however, the number of class files comprising a Java application can become quite large. Copying applications between disks or across networks can then become inefficient because of the number of file-copy operations required. To remedy this, the class files for an application can be combined into a Java Archive (or JAR) file, a single compressed file containing the entire application.
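The run-time loading described above can be seen directly through the standard Class.forName() method. This method locates and loads a class by its fully-qualified name only when asked. The sketch below rests on simple assumptions: the helper createByName() is hypothetical, and the target class must have an accessible no-argument constructor.

```java
import java.util.List;

class DynamicLoadDemo {
    // Hypothetical helper: loads a class by name at run time (if it is not
    // already loaded) and creates an instance with its no-argument constructor.
    static Object createByName(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Class missing, no such constructor, or instantiation failed.
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Object obj = createByName("java.util.ArrayList");
        System.out.println(obj instanceof List);       // prints true
        System.out.println(obj.getClass().getName());  // prints java.util.ArrayList
    }
}
```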
One consequence of using the JVM is that, as with all interpreted languages, performance is lost because code must be translated into the native instructions of the host machine at run-time. In the case of Java, this problem is addressed by a 'just-in-time' (or JIT) compiler, which converts JVM byte-codes into native machine code at run-time, as each part of the program is needed. By allowing a JIT compiler to perform this final optimization for the host system at run-time, portability can be maintained whilst performance can be improved to almost the same level as native C code*.
[* Source: Java In A Nutshell (p.8) by David Flanagan, O'Reilly publishers]
An obvious advantage of the use of the JVM is the standardisation of the software-development platform, allowing software to be used on different hardware-platforms with little or no adaptation. This is particularly useful in the context of computer networks such as the Internet, where many different systems access a shared data source and it is desirable to develop software which can be used on all systems.
On the negative side, the introduction of the 'protective layer' of the JVM between the developer and the host system prevents the use of code optimizations which involve directly accessing the hardware or the operating system. However, such optimizations are one of the main sources of the type of system-dependent code that reduces portability. The current wave of computer technology is also powerful enough to make such optimization redundant in any case. The net result is simply better code.
The ability to create pointers to functions is one of the most useful aspects of the C language. It allows the choice of function to execute to be data-dependent, that is, determined by the values of variables at run-time. For example, a group of function pointers of the same type can be collected into an array, allowing alternative functions to be selected using an index value. This could be used to implement the selection of an item in a menu system. Function pointers can also, like any other pointer, be passed to a function as parameters. This feature is used by the C standard library qsort() function, which sorts an array of values (traditionally using the 'quick-sort' method). It requires a pointer to a comparison function (typically a small wrapper around a function such as strcmp()), which is called to compare pairs of elements during the sorting process.
Since Java does not have pointers, there can be no pointers to methods. It does, however, provide alternative ways to implement the useful features of function pointers. Instead of an array of function pointers, Java allows an array of class or object references to be created. A class reference refers to a Java class type and can be used to create an instance of that class, whereas an object reference refers to an actual instance of a class. The methods to be called in each alternative class must be defined by an 'interface', which declares the names, return types and parameter types of the methods to be implemented. The name of the interface is used as the type for the elements of the array, thereby conforming to the requirement that every element of an array is of the same type. A method in one of the classes or objects in the array can then be selected for execution by an index value, as in the C language. For the developer, the main difference between the two approaches is this: C uses a group of alternative functions of the same type but with different names. Java instead uses alternative methods that share the same name and type but are contained in a group of different classes, all of which implement the same interface.
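The following is a minimal sketch of this technique. It assumes a hypothetical MenuAction interface and three illustrative action classes. The array element type is the interface, so an index selects which class's run() method is invoked, much as an index into an array of function pointers would in C.

```java
class MenuDemo {
    // The element type is the interface, so every element is guaranteed
    // to provide run(), whichever class it actually is.
    static final MenuAction[] MENU = {
        new OpenAction(),
        new SaveAction(),
        new QuitAction()
    };

    // Select and invoke an alternative by index, as with a C array
    // of function pointers.
    static String choose(int index) {
        return MENU[index].run();
    }

    public static void main(String[] args) {
        System.out.println(choose(1)); // prints Saving file
    }
}

// Hypothetical interface: declares the method every menu action must provide.
interface MenuAction {
    String run();
}

class OpenAction implements MenuAction {
    public String run() { return "Opening file"; }
}

class SaveAction implements MenuAction {
    public String run() { return "Saving file"; }
}

class QuitAction implements MenuAction {
    public String run() { return "Quitting"; }
}
```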
It may seem that the Java approach is a trifle long-winded, rather like using a sledgehammer to crack a nut. Certainly, declaring an interface and creating an entire object to contain each alternative method represents a lot more work than typing an ampersand (&) to get the address of a function. It does, however, have significant advantages over the pointer approach, and these far outweigh the minor drawback of extra code. Using an interface to define the access methods of each class in a group clearly defines the way that the group is used. It also allows each class to declare its membership of the group explicitly by implementing the interface. This enhances the readability of the source code. Also, using references to entire objects rather than individual functions is not so much overkill as a logical extension of object-oriented functionality, since it allows a whole set of related methods and data (i.e. an object) to be passed by a single reference.
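For the qsort() case, the closest standard Java counterpart is java.util.Arrays.sort() with a java.util.Comparator object, which carries the comparison method in a single reference. This sketch is not a direct translation. Arrays.sort() on object arrays is a stable sort and is not guaranteed to use quick-sort. The standard library also already provides String.CASE_INSENSITIVE_ORDER, so the explicit comparator class here (a hypothetical name) exists only to show the interface.

```java
import java.util.Arrays;
import java.util.Comparator;

class SortDemo {
    // Plays the role of the comparison function passed to C's qsort():
    // an object implementing Comparator is passed by a single reference.
    static class CaseInsensitiveOrder implements Comparator<String> {
        @Override
        public int compare(String a, String b) {
            return a.compareToIgnoreCase(b);
        }
    }

    // Returns a sorted copy, leaving the caller's array unchanged.
    static String[] sortIgnoringCase(String[] words) {
        String[] copy = words.clone();
        Arrays.sort(copy, new CaseInsensitiveOrder());
        return copy;
    }

    public static void main(String[] args) {
        String[] sorted = sortIgnoringCase(new String[] {"pear", "Apple", "banana"});
        System.out.println(Arrays.toString(sorted)); // prints [Apple, banana, pear]
    }
}
```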
Many aspects of the C language specification, from which much of the power of the language is derived, are missing from the Java language. In most cases this is simply because the implementation of the object-oriented paradigm eliminates the need for them. For example, the struct and union data structures are no longer necessary. A class containing only data fields serves the same purpose as a struct, and the features of a union (representing the same data in different ways) can be simulated by using subclasses of a common superclass.
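The following is a rough illustration that assumes hypothetical class names. Point plays the part of a C struct. The abstract Token class and its two subclasses play the part of a union, since a Token variable holds exactly one of several alternative forms. Unlike a C union, the Java version is type-safe: the alternatives do not share storage, and each object always knows which form it is.

```java
class UnionDemo {
    public static void main(String[] args) {
        // Struct-like use: a class with plain data fields.
        Point p = new Point();
        p.x = 2;
        p.y = 3;
        System.out.println(p.x + p.y); // prints 5

        // Union-like use: each Token is exactly one of the alternatives.
        Token[] tokens = { new NumberToken(3.5), new WordToken("java") };
        for (Token t : tokens) {
            System.out.println(t.describe()); // prints "number 3.5", then "word java"
        }
    }
}

// Like a C struct: a class that only groups data fields.
class Point {
    int x;
    int y;
}

// Like a C union: a Token variable can hold one of several representations.
abstract class Token {
    abstract String describe();
}

class NumberToken extends Token {
    final double value;
    NumberToken(double value) { this.value = value; }
    @Override String describe() { return "number " + value; }
}

class WordToken extends Token {
    final String text;
    WordToken(String text) { this.text = text; }
    @Override String describe() { return "word " + text; }
}
```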
One omission that soon becomes apparent to a C programmer learning Java is that of the C preprocessor. This is responsible for resolving constant definitions and macros, importing declarations from header files and controlling the process of compilation.
In the C language, a macro is created by the #define directive and is basically a string which is replaced by another string before compilation. It allows a sequence of instructions or calculations to be represented in an abbreviated form, and in this way it is similar to a function. Unlike a function, however, the code contained by a macro is inserted directly into the compiled code at every point at which the macro is used. It thus executes very efficiently, without the overhead of a function call (push program-counter, call address, pop program-counter, etc.). A macro can also be passed values as parameters, which are substituted into the body of the macro prior to compilation. Java has no facility to create macros, but, with advances in computer and compiler technology, this kind of optimization is now rarely necessary.
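This sketch shows the usual replacement: a C macro such as #define SQUARE(x) ((x) * (x)) becomes an ordinary static method. The name square() is illustrative. Modern JIT compilers can often inline small methods like this, recovering much of the macro's efficiency. The argument is also evaluated only once, which avoids the classic macro pitfall with arguments such as i++.

```java
class MacroDemo {
    // Java counterpart of the C macro  #define SQUARE(x) ((x) * (x))
    // A small static method; a JIT compiler may inline it at run time.
    static int square(int x) {
        return x * x;
    }

    public static void main(String[] args) {
        int i = 3;
        // Unlike the macro, the argument expression is evaluated once,
        // so i++ increments i exactly one time here.
        int result = square(i++);
        System.out.println(result + " " + i); // prints 9 4
    }
}
```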
Constants in Java are represented by 'static final' variables which, like all Java variables, must be members of a class. They are 'static' to indicate that they are class variables rather than variables belonging to any particular instance of a class. This means that they can be accessed at all times using a reference to their class. They are 'final' to indicate that their values cannot be modified. In the original Java language this is the only way to assign constant values to identifiers. There is no equivalent of the C 'enum' keyword, which declares a sequence of identifiers with automatically incrementing values. (Java 5 later added an 'enum' type, although Java enums are classes rather than sequences of integers.)
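The following is a minimal sketch of hand-numbered constants standing in for a C enum, assuming a hypothetical Colors class. Because the values are compile-time constants, they can even be used as case labels in a switch statement.

```java
class ConstantsDemo {
    // Maps a colour constant back to a name; the case labels are
    // compile-time constants taken from another class.
    static String name(int color) {
        switch (color) {
            case Colors.RED:   return "red";
            case Colors.GREEN: return "green";
            case Colors.BLUE:  return "blue";
            default:           return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(name(Colors.GREEN)); // prints green
        // Colors.RED = 5;  // would not compile: a final variable cannot be reassigned
    }
}

// Hypothetical constants class. 'static' gives one copy per class,
// 'final' forbids reassignment, and the values are numbered by hand
// because there is no C-style enum here.
class Colors {
    public static final int RED = 0;
    public static final int GREEN = 1;
    public static final int BLUE = 2;
}
```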
Java requires all methods to be explicitly declared with all parameter and return types, and it also allows forward references to methods. There is therefore no requirement for separate method declarations like those in C header files, and thus no need for the #include directive. To a C programmer, the 'import' keyword used in Java will look strikingly similar to the #include directive, but it is used for a quite different purpose. The C preprocessor inserts the text of the files specified by an #include directive directly into the source code prior to compilation. An 'import' declaration, by contrast, inserts no text at all. It simply tells the compiler where to look (which package, or which fully-qualified class) when resolving a class name given without its package prefix.
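To illustrate the difference, the hypothetical class below uses ArrayList both through import declarations and by its fully-qualified name. The two forms compile to the same thing, and neither inserts any source text.

```java
import java.util.ArrayList; // lets the short name "ArrayList" resolve to java.util.ArrayList
import java.util.List;

class ImportDemo {
    static int sizes() {
        // Short names, resolved through the import declarations above.
        List<String> a = new ArrayList<>();
        a.add("x");

        // The same classes named in full: no import is needed for this form.
        java.util.List<String> b = new java.util.ArrayList<>();
        b.add("y");
        b.add("z");

        return a.size() + b.size();
    }

    public static void main(String[] args) {
        System.out.println(sizes()); // prints 3
    }
}
```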
The conditional-compilation directives provided by the C preprocessor (#if, #ifdef, etc.) have no equivalent in the Java language. Their functionality can be simulated by careful use of the different commenting styles that Java supports. Suppose the C++ double-slash comment style (e.g. // Comment) is used for all normal comments. The C slash-star comment style (e.g. /* Comment */) can then be used to 'comment out' whole sections of code. Because slash-star comments cannot be nested, this works only if the section being disabled contains no slash-star comments of its own. This has much the same effect as using the conditional-compilation directives, though perhaps not quite the same elegance.
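The following small sketch shows the commenting convention, using an illustrative compute() method. Normal comments use the double-slash style, so a slash-star pair can switch a whole section off. This relies on the fact that slash-star comments cannot be nested, so the disabled section must not itself contain one.

```java
class CommentSwitch {
    static int compute(int x) {
        int result = x * 2; // normal comments always use the double-slash style

        /* Disabled section: because it contains only double-slash comments,
           the whole block can be switched off with one slash-star pair.
        result += 1000;     // temporary debugging offset
        System.out.println("debug: " + result);
        */

        return result;
    }

    public static void main(String[] args) {
        System.out.println(compute(21)); // prints 42
    }
}
```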
The loss of pointers might appear to be a great sacrifice for the experienced C programmer moving to Java. Pointers are so commonly used in C that it may seem hard to imagine programming without them. However, the pointer concept (indirect access to data, or 'indirection') has not really been lost at all, but has in fact mutated into the concept of the 'reference'. In Java, all variables except those that represent the primitive data types (int, char, etc.) are references to objects, which means, in C terms, that they are all pointers. It is not possible in Java to obtain the actual address of data in memory, as is done in C using the ampersand (&) operator. Since there is only one kind of reference in Java, there is no need to distinguish between pointers and addresses. One of the most potentially confusing aspects of the C language is thus eliminated.
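The sketch below uses a hypothetical Counter class to show reference semantics at work. Assigning one reference variable to another copies the reference, not the object. A method that receives a reference can also modify the caller's object, much as a C function can through a pointer.

```java
class ReferenceDemo {
    // Hypothetical mutable object used to show aliasing.
    static class Counter {
        int value;
    }

    // The parameter receives a copy of the reference, not a copy of the
    // object, so the caller sees the change (much like a pointer in C).
    static void increment(Counter c) {
        c.value++;
    }

    static int demo() {
        Counter a = new Counter();
        Counter b = a;   // b now refers to the same object as a
        b.value = 10;    // the change is visible through a as well
        increment(a);    // modifies the shared object: value becomes 11
        return a.value;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 11
    }
}
```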
One casualty of the translation from C to Java that should not be missed too greatly is the C language 'goto' statement. Good programming practice dictates that 'goto' statements should be used very sparingly, if at all. With some extra innovations to support structured program-branching, such as labelled 'break' and 'continue' statements and full support for exception-handling, Java finally puts to rest the root cause of the undecipherable spaghetti-code created using languages like BASIC.
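The following sketch shows structured branching replacing goto. The hypothetical find() method below uses a labelled 'break' to leave two nested loops at once when a match is found.

```java
class LabelDemo {
    // Returns {row, col} of the first occurrence of target, or null if absent.
    // The labelled break exits both loops at once, a case where C code
    // might otherwise reach for goto.
    static int[] find(int[][] grid, int target) {
        int[] found = null;
        search:
        for (int row = 0; row < grid.length; row++) {
            for (int col = 0; col < grid[row].length; col++) {
                if (grid[row][col] == target) {
                    found = new int[] {row, col};
                    break search; // leave the outer loop, not just the inner one
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        int[][] grid = { {1, 2, 3}, {4, 5, 6} };
        int[] pos = find(grid, 5);
        System.out.println(pos[0] + "," + pos[1]); // prints 1,1
    }
}
```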
The Java garbage-collector is a gift to any C programmer who has struggled with the memory management of a large application. The garbage-collector allows objects to be created as required without having to worry about house-keeping tasks such as memory allocation and de-allocation, or about the problems with these that so often occur in C programs. It does this by finding objects in memory which can no longer be reached through any reference held by the running program (garbage). It then automatically returns the memory they occupy to the pool of free memory. In many JVM implementations this process runs in the background on a low-priority thread. It is therefore largely transparent to an application, leaving it free from a whole range of troublesome memory-allocation and pointer-related bugs.
Another problem that can easily occur in a C program is an array-bounds error, where an array index references a memory location outside the declared bounds of the array. The C language was intended to be highly flexible and to allow programmers to do more or less as they pleased within the syntax of the language. It therefore provides no bounds-checking for array indices, which can lead to some very difficult-to-find bugs. Java, however, does not allow direct access to memory, so there is no legitimate reason to access memory outside the bounds of an array. Unlike C, which allows programmers to specify explicitly the layout of data in memory, Java gives no way of knowing what lies beyond the array. The Java runtime can therefore check every array index and trap out-of-bounds accesses (by throwing an ArrayIndexOutOfBoundsException) without compromising the functionality of the language.
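The run-time check can be observed with a small sketch. The safeGet() helper is hypothetical and catches the exception only for demonstration; ordinary code would normally compare the index with data.length instead.

```java
class BoundsDemo {
    // Returns data[index], or fallback if the index is out of range.
    // Every array access is checked at run time: an illegal index raises
    // ArrayIndexOutOfBoundsException instead of reading stray memory.
    static int safeGet(int[] data, int index, int fallback) {
        try {
            return data[index];
        } catch (ArrayIndexOutOfBoundsException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        System.out.println(safeGet(data, 1, -1)); // prints 20
        System.out.println(safeGet(data, 3, -1)); // prints -1 (index 3 is past the end)
    }
}
```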
The exception-handling mechanism provided by the Java language is another welcome innovation for C programmers, since it provides a formalized and well-structured mechanism for transferring control to a different point in an application. In C this can be crudely implemented with the setjmp() and longjmp() functions: longjmp() performs a 'long jump' back to a program location previously saved by setjmp(). The Java exception mechanism is implemented using the 'try', 'catch', 'finally' and 'throw' keywords. It represents a vast improvement over the C approach, since it allows exception-handling code to be clearly identified and separated from the main application code.
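The following is a minimal sketch of the four keywords working together. It assumes a hypothetical ConfigException and a hypothetical parsePort() helper. A low-level NumberFormatException is translated into an application-level exception with 'throw' and handled by the caller's 'catch'. The 'finally' block runs on both the normal and the exceptional path.

```java
import java.util.ArrayList;
import java.util.List;

class ExceptionDemo {
    // Hypothetical application-level exception for this sketch.
    static class ConfigException extends Exception {
        ConfigException(String message) { super(message); }
    }

    // Records the order in which the blocks run, for illustration.
    static final List<String> log = new ArrayList<>();

    static int parsePort(String text) throws ConfigException {
        try {
            int port = Integer.parseInt(text);
            if (port < 1 || port > 65535) {
                throw new ConfigException("port out of range: " + port);
            }
            return port;
        } catch (NumberFormatException e) {
            // Translate a low-level error into an application-level one.
            throw new ConfigException("not a number: " + text);
        }
    }

    static int portOrDefault(String text) {
        try {
            return parsePort(text);
        } catch (ConfigException e) {
            log.add("error: " + e.getMessage());
            return 8080;
        } finally {
            log.add("done"); // runs whether or not an exception occurred
        }
    }

    public static void main(String[] args) {
        System.out.println(portOrDefault("443")); // prints 443
        System.out.println(portOrDefault("abc")); // prints 8080
        System.out.println(log); // prints [done, error: not a number: abc, done]
    }
}
```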
Because the Java system handles memory-management itself and prevents applications from directly accessing memory, there is no danger of an application inadvertently corrupting any of the code or data stored in memory. The 'sandbox' system-model used for applets takes this security a step further by imposing restrictions on an applet's access to system resources. These security measures help to prevent the sort of program crashes that are caused by carelessly or maliciously written C code.
Programmers used to the performance of fully-compiled languages like C and Pascal might feel that the performance of interpreted Java, about 10-20 times slower than C*, is actually rather low. However, Java's performance must be seen in the context of a compromise between performance and portability, since Java performs better than other equally portable languages.
Like the exception-handling mechanism, the ability to create multiple threads of execution in a clear, standardized manner was a feature sadly missing from the C language specification.
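This sketch illustrates Java's built-in thread support. The hypothetical countInParallel() method below starts several threads with the standard Thread class. It uses a 'synchronized' method so that concurrent increments are not lost, and it waits for the threads to finish with join(). The lambda syntax used to supply each thread's task requires Java 8 or later.

```java
class ThreadDemo {
    private int count = 0;

    // 'synchronized' lets only one thread at a time run this method on
    // a given ThreadDemo object, so no increments are lost.
    synchronized void increment() {
        count++;
    }

    synchronized int get() {
        return count;
    }

    // Starts 'threads' workers that each call increment() 'perThread' times,
    // waits for all of them, and returns the final total.
    static int countInParallel(int threads, int perThread) {
        ThreadDemo counter = new ThreadDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread w : workers) {
                w.join(); // wait for this worker to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countInParallel(4, 1000)); // prints 4000
    }
}
```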