Threading Using the Threads Package

One way to increase scalability is to use multiple threads to work on multiple tasks concurrently. Some may argue that dedicating a single process to a series of tasks, one after another, is just as fast as doing all of the tasks in parallel. Part of that argument is correct, but by the same logic DOS would have been good enough, even though it was not a multitasking operating system. And on today's hardware, DOS applications would probably run like a cheetah chasing its kill. The reality, however, is that operating systems and programs should do things in parallel because people like to do things in parallel: start a print job, modify a document, send some e-mail, and so on. In each of those cases, the computer can run the tasks concurrently without appearing to slow down. The traditional UNIX user would also point out that processes can act like threads and perform operations concurrently. Again, this is true, but running multiple concurrent processes brings other problems, such as interprocess communication, sharing of data, and the cost of process startup and shutdown. In the end, threads are a good thing, and they increase scalability.

In the Jakarta Commons package, there are two threading packages, threading and Threadpool. The threading package has many capabilities that help a developer write asynchronous programs. Asynchronous programs increase scalability because they can distribute a task among computers and physical locations. We will discuss the threading package in more detail in Chapter 6.

The Threadpool package deals specifically with making it simpler to write multithreaded applications. In this section, we will discuss lower-level issues that improve scalability.

Technical Details for the Threadpool Package

Tables 4.3 and 4.4 contain the abbreviated details necessary to use the Threadpool package.

Table 4.3: Repository details for the *Threadpool* package.
Item	Details
CVS repository	jakarta-commons-sandbox
Directory within repository	threadpool
Main packages used	org.apache.commons.threadpool

Table 4.4: Package and class details (legend: [threadp] = org.apache.commons.threadpool).
Class/Interface	Details
[threadp].ThreadPool	The interface implemented by thread pools; it declares the method invokeLater, which submits a task for execution.
[threadp].DefaultThreadPool	The default implementation of the thread pool (see Listing 4.12).

Running a Thread

Writing a multithreaded application with the thread pool is extremely simple. The thread pool expects to execute a class that implements the interface Runnable. Listing 4.11 shows an example.

Listing 4.11

 class ExampleThread implements Runnable {
     public void run() {
         System.out.println("a task from a thread");
     }
 }

In Listing 4.11, the method run is executed by the thread pool and a simple output is generated. To execute this class, use Listing 4.12.

Listing 4.12

 DefaultThreadPool pool = new DefaultThreadPool();
 pool.invokeLater(new ExampleThread());

In Listing 4.12, the class DefaultThreadPool is the default implementation of the thread pool. To execute a task, the method invokeLater is called. The method adds the instance of the class ExampleThread to a queue. The thread pool then retrieves the task from the queue and executes it in the context of one of its threads.

The thread pool in Listing 4.12 is a pool of a single thread because, when the class DefaultThreadPool is instantiated without any parameters, only one thread is created. When the thread pool is created, a number of threads are started; the exact number depends on the constructor parameter. Once the individual threads have started, they begin polling the task queue. If the task queue is empty, a thread waits for a specific time period before polling the queue again. If the task queue holds a task, the method Runnable.run of that task is called. The advantage of using a thread pool is that the number of threads is limited and the threads execute tasks as they arrive. Of course, in the default case of a thread pool with one thread and two tasks, one task will be waiting while the other is being executed. The exact number of threads you should have in the thread pool depends on the situation, so feel free to experiment.
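To make the queue-and-worker mechanism described above concrete, here is a minimal sketch of a pool written with only JDK classes. It is not the source of DefaultThreadPool: the class name SketchThreadPool, the method shutdownAndWait, and the 100-millisecond wait are assumptions chosen for illustration, and only the method name invokeLater is borrowed from Listing 4.12.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical minimal pool; illustrates the mechanism, not the Commons source. */
final class SketchThreadPool {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Thread[] workers;
    private volatile boolean running = true;

    /** Starts a fixed number of worker threads when the pool is created. */
    SketchThreadPool(int threadCount) {
        workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(this::workerLoop, "sketch-worker-" + i);
            workers[i].start();
        }
    }

    /** Queues the task; a worker thread picks it up later. */
    void invokeLater(Runnable task) {
        queue.add(task);
    }

    /** Each worker polls the queue, waiting up to 100 ms when it is empty. */
    private void workerLoop() {
        while (running || !queue.isEmpty()) {
            try {
                Runnable task = queue.poll(100, TimeUnit.MILLISECONDS);
                if (task != null) {
                    task.run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    /** Lets the workers drain the queue, then waits for them to exit. */
    void shutdownAndWait() {
        running = false;
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

Creating the pool with new SketchThreadPool(1) mirrors the single-thread default discussed above: a second task simply stays in the queue until the only worker finishes the first. A production pool would also need to handle tasks that throw exceptions, which this sketch leaves out.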

Immutable Classes Are Scalable Classes

The quality of scalable applications depends on how well the application is written. That sentence sounds too simplistic to be useful, but there is quite a bit of truth in it. The classic illustration is pregnancy. Typically, it takes one woman, one man, and nine months to produce a baby. It is not possible under any circumstances to take nine women and produce a baby in one month. This is a fact of nature, and it shows that some tasks cannot be split into multiple parallel tasks. The problem, though, is that many people write applications as if every task were like a pregnancy: one long, indivisible job. Often, business object design is incorrect and objects are too complicated.

A scalable class is a class that uses the keyword synchronized as little as possible. The keyword synchronized prevents multiple threads from executing the same piece of code on the same object concurrently. Listing 4.13 uses a synchronized method.

Listing 4.13

 class ExampleSynchronized {
     public synchronized void onlyone() {
         System.out.println("Hello world");
     }
 }

In Listing 4.13, the method onlyone allows only one thread access at a time. In this example, the method does little more than output some text. Other methods are more complex and do more work. In those cases, a single call may take longer to return, and if many threads are waiting to enter the method, a bottleneck is created. Singletons can be bottlenecks because every thread must go through the same single object, and any synchronized method on that object restricts access. A factory, discussed in Chapter 3, could be a bottleneck because it may allow only one thread at a time to create objects. To be sure that a bottleneck is not created, you should inspect the source code of the classes involved.

The best type of class to write in a multithreaded and multiprocess scenario is an immutable class. An immutable class is a class that does not allow its data to be modified. The best example of an immutable class is String. The class String is immutable because its contents can never be changed once they are set. Consider, for example, Listing 4.14, which is the concatenation of two strings.
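The one-thread-at-a-time behavior of a synchronized method can be observed directly. The following sketch is an illustration only: the class SynchronizedProbe and its helper runWith are hypothetical names, and the 20-millisecond sleep stands in for real work. It counts how many threads are inside onlyone at the same moment and records the peak.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical probe; records how many threads are inside the method at once. */
final class SynchronizedProbe {
    private final AtomicInteger inside = new AtomicInteger();
    private final AtomicInteger maxInside = new AtomicInteger();

    /** Only one thread at a time can hold this object's lock and run the body. */
    public synchronized void onlyone() {
        int now = inside.incrementAndGet();
        maxInside.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(20); // stand-in for slow work done while holding the lock
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        inside.decrementAndGet();
    }

    int maxConcurrentCallers() {
        return maxInside.get();
    }

    /** Calls onlyone() from several threads and returns the peak concurrency. */
    static int runWith(int threadCount) {
        SynchronizedProbe probe = new SynchronizedProbe();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(probe::onlyone);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return probe.maxConcurrentCallers();
    }
}
```

Calling SynchronizedProbe.runWith(4) returns 1, because the four calls are forced to run one after another. The total time therefore grows with the number of waiting threads, which is the bottleneck described above.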

Listing 4.14

 a += c;

In Listing 4.14, the string buffer c is concatenated to the string buffer a. In programming terms, one would expect the contents of string buffer c to be appended to the already existing string buffer a, with string buffer a being expanded in place to include the contents of c. However, that is not what happens, as illustrated in Listing 4.15.

Listing 4.15

 String a = "Starting point ";
 String temp = a;
 String c = "ending point";
 a += c;
 System.out.println("before (" + temp + ") after (" + a + ")");

The string buffer a contains a specific buffer. The string buffer temp is assigned a reference to string buffer a. The purpose of string buffer temp is to be another reference to the same buffer that string buffer a points to. String buffer c is assigned some content, which is then appended to string buffer a. The class method System.out.println then outputs the content of the original buffer pointed to by string buffer a and the buffer after the concatenation of c. This is shown in Listing 4.16.

Listing 4.16

 before (Starting point ) after (Starting point ending point)

The output of Listing 4.16 is strange, because it indicates that string buffer a references one buffer before the concatenation and another buffer after it. What has occurred is that the appended buffer is a new buffer that contains the contents of string buffers a and c. The old string buffer referenced by a is kept unmodified. This is immutability in action, and it ends up being faster than expanding and appending the buffer.

Immutability is faster because of an optimization learned over years of computing experience. In the days of the original C and C++, a buffer was allocated from the heap, and a memory manager kept track of the various pieces of memory. In that memory model, whenever a piece of data was allocated, the memory manager had to search the heap for a piece of memory of an appropriate size. This searching and slicing of memory cost many CPU cycles. When an object is immutable, its memory buffer has a binary state: it is either used or not used. The case in which the buffer must be expanded or modified never arises. Therefore, the memory manager can optimize its operations; although this may waste resources at times, it is faster overall.
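The same effect can be checked with a reference comparison instead of printed output. This is a small sketch built on Listing 4.15; the class name StringIdentityDemo and the method originalSurvives are hypothetical names used only for this illustration.

```java
/** Hypothetical demo class built on the statements of Listing 4.15. */
final class StringIdentityDemo {
    /** Returns true if concatenation left the original String object untouched. */
    static boolean originalSurvives() {
        String a = "Starting point ";
        String temp = a;            // second reference to the same object
        String c = "ending point";
        a += c;                     // rebinds a to a newly created String
        // temp still refers to the old object, so the two references now differ.
        return temp != a && temp.equals("Starting point ");
    }
}
```

StringIdentityDemo.originalSurvives() returns true: after the concatenation, a refers to a new object, while temp still refers to the original, unchanged text.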

The other reason why immutable objects are faster is that there is no need for synchronization. When multiple threads access a piece of data, synchronization is needed so that each thread manipulates stable data. If the data is always stable, then synchronization is not necessary. Therefore, immutable objects have a definite speed advantage over objects that have synchronization requirements.

Immutable objects are not that difficult to write. Listing 4.17 shows a simple example.

Listing 4.17

 final class SomeData {
     private final int _value;
     public SomeData(int value) {
         _value = value;
     }
     public SomeData add(int value) {
         return new SomeData(_value + value);
     }
     public int getValue() {
         return _value;
     }
 }

In Listing 4.17, there are several indicators that the class SomeData is immutable:

The class is declared with the keyword final. This means that another class cannot subclass the class SomeData.
The data members are declared private and final. This means that data members can be assigned only once, within the constructor, ensuring that no other method can change the values. Because the data members are private and final, a subclass could not modify them either, so declaring the class final is not strictly required to keep these values read-only; it does, however, prevent a subclass from overriding methods such as getValue or adding mutable data of its own.
The method add, which adds two numbers together, will return a new instance of the class SomeData. Returning a new instance ensures that the original data members are not updated.
The getter getValue returns the value of the data member _value, but there is no associated setter. The lack of a setter ensures that no user of the class SomeData will unintentionally modify the value of the data members.
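The points above can be exercised with a short sketch in which several threads share one instance without any synchronization. The class SomeData is repeated from Listing 4.17 so that the sketch compiles on its own; the class SomeDataDemo and its method run are hypothetical names added for illustration.

```java
/** Repeated from Listing 4.17 so that this sketch compiles on its own. */
final class SomeData {
    private final int _value;

    public SomeData(int value) { _value = value; }

    public SomeData add(int value) { return new SomeData(_value + value); }

    public int getValue() { return _value; }
}

/** Hypothetical demo: several threads use one shared instance without locks. */
final class SomeDataDemo {
    static int[] run() {
        final SomeData shared = new SomeData(10);
        final int[] results = new int[4];
        Thread[] threads = new Thread[results.length];
        for (int i = 0; i < threads.length; i++) {
            final int index = i;
            // No synchronized block is needed: add() never changes 'shared'.
            threads[i] = new Thread(() -> results[index] = shared.add(index).getValue());
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        // Each thread wrote its own slot, computed from the unchanged value 10.
        return results;
    }
}
```

Each thread receives its own new SomeData from add, so the shared instance still holds 10 afterward and the results are 10, 11, 12, and 13 regardless of how the threads are scheduled.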

Ideally, an immutable class should be used as a data member of other classes rather than as a base class. For example, very few people would consider subclassing the class String. Granted, it is not possible because String is declared final, but even if it were, not many people would consider doing so. When the class String is used, other classes typically encapsulate it.

In addition, an immutable class would ideally be a data class. A data class stores data and provides operations on that data, but typically does not carry out application logic of its own. Going back to the class String example in Listing 4.15, there is an associated class StringBuffer. The class StringBuffer is used to modify the string data in place and is not an immutable class. The difference between using the String class and the StringBuffer class is that scalability may be an issue when you use the class StringBuffer, because its contents can change and its methods are synchronized. The separation of a scalable class from a less scalable one makes it easier for programmers to write efficient code.

There is a downside to immutable classes in that they can be resource intensive. For example, code that uses the operations of an immutable class may require constant object instantiation. As noted earlier, the memory manager will be able to optimize, but there are limitations. If a class allocates four megabytes with each instantiation, then that instantiation will become costly. In that situation, the solution would be to use pooled objects. However, using pooled objects requires that you rewrite Listing 4.17 so that it appears like Listing 4.18.
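For contrast with the String behavior shown in Listing 4.15, the following sketch performs the same steps with StringBuffer. The class name StringBufferDemo and the method appendThroughAlias are hypothetical names used only here.

```java
/** Hypothetical demo contrasting StringBuffer with the String example. */
final class StringBufferDemo {
    /** Returns what the second reference sees after the first one appends. */
    static String appendThroughAlias() {
        StringBuffer a = new StringBuffer("Starting point ");
        StringBuffer temp = a;      // second reference to the same buffer
        a.append("ending point");   // modifies the shared buffer in place
        return temp.toString();     // "Starting point ending point"
    }
}
```

Here temp sees the appended text, because both references point at one mutable object. Any thread holding such a reference can observe or cause changes, which is why a shared StringBuffer needs synchronization and a shared String does not.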

Listing 4.18

 import org.apache.commons.pool.ObjectPool;
 final class PooledSomeData {
     private int _value;
     private ObjectPool _pool;
     public PooledSomeData() {
     }
     // Called after the instance is borrowed, in place of a constructor.
     public void assign(ObjectPool pool, int value) {
         _pool = pool;
         _value = value;
     }
     // Clears the state so that the instance can be reused.
     public void reset() {
         _pool = null;
         _value = 0;
     }
     public PooledSomeData add(int value) throws Exception {
         PooledSomeData cls = (PooledSomeData) _pool.borrowObject();
         cls.assign(_pool, _value + value);
         return cls;
     }
     public int getValue() {
         return _value;
     }
 }

In Listing 4.18, the class PooledSomeData has some fundamental changes. The keyword final has been removed from the data member because the object instance needs to be reused, which is not possible when the data member is declared final. The exception, of course, is the final used in the class declaration. We've also changed the constructor from one with parameters to one with none. We need to do this because the pool, and not the method add as in Listing 4.17, allocates the class instance. We have replaced the constructor with the method assign so that we can initialize the class instance. The method assign has an extra parameter, pool, which is used to borrow the objects that will be assigned. We have also added the method reset to initialize the state of the object, which is required when the object pool activates an object.

The method add in Listing 4.18 has changed dramatically. The method add still leaves the original object unmodified, but instead of an object being allocated directly, the class method _pool.borrowObject is called to get a pooled instance. Then, the class method cls.assign is called to assign a state to the object instance. Finally, the class instance is returned.

In Listing 4.18, the rewritten class still behaves like an immutable class for its callers even though pooled objects are used: the method add never changes the object it is called on. The methods assign and reset do change state, so they should be called only when an object is borrowed from or returned to the pool. Pooled objects are the ideal solution for data objects that have multiple data members and more complex structures, because pooling makes larger, more complex immutable objects practical. Immutable objects have an additional advantage in that they represent a consistent state. The method add in Listings 4.17 and 4.18 is an operation that assigns a state that can be tested and validated. If the validation is not successful, it is very simple to pinpoint where the error occurred. This reduces debugging time and simplifies class maintenance.
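The borrow, assign, and return cycle can be tried without any external library. The following is a minimal sketch under stated assumptions: SketchPool is a hand-written stand-in and not the ObjectPool interface used in Listing 4.18, and the names SketchPool, PooledValue, borrow, giveBack, and createdCount are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for an object pool; not the Commons Pool API. */
final class SketchPool {
    private final Deque<PooledValue> idle = new ArrayDeque<>();
    private int created = 0;

    /** Hands out an idle instance, creating one only when none is available. */
    synchronized PooledValue borrow() {
        PooledValue value = idle.poll();
        if (value == null) {
            value = new PooledValue();
            created++;
        }
        return value;
    }

    /** Clears the instance and keeps it for reuse. */
    synchronized void giveBack(PooledValue value) {
        value.reset();
        idle.push(value);
    }

    synchronized int createdCount() {
        return created;
    }
}

/** Same shape as PooledSomeData in Listing 4.18, but using SketchPool. */
final class PooledValue {
    private int value;
    private SketchPool pool;

    void assign(SketchPool pool, int value) {
        this.pool = pool;
        this.value = value;
    }

    void reset() {
        pool = null;
        value = 0;
    }

    /** Borrows the result object from the pool instead of calling new. */
    PooledValue add(int delta) {
        PooledValue result = pool.borrow();
        result.assign(pool, value + delta);
        return result;
    }

    int getValue() {
        return value;
    }
}
```

A caller borrows an instance, calls assign on it, and uses add as before. Once a result is handed to giveBack, the next call to add reuses that same instance rather than allocating a new one, so the caller must not keep using an object after returning it. The pool methods in this sketch are synchronized, so the pool itself can become a point of contention under heavy load.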