Since I was laid-back recently, today only a small feature is discussed. Comparing the API Document of the utility class
Arrays in J2SE 7 and J2SE 8, several static overloading methods are added:
spliterator(T array) and
parallelSort(T, Comparator<T>). The
stream(T) method has been discussed in the previous article, and it wraps an array as a stream object.
Both the methods
setAll(T, IntFunction<T>) and
parallelSetAll(T, IntFunction<T>) provide the same functionality: using the return value of
IntFunction to initialize an array. The
IntFunction accepts one integer as parameter, i.e., the index in the array, and returns a value based on the index. These two methods are useful to generate a sequence of numbers, e.g., arithmetic sequence, geometric sequence, etc. The difference between these two methods is the later one processes the array elements in parallel that make the later one faster (but with more resources).
parallelPrefix(T, BinaryOperator<T>) uses the previous element (as the left value of the binary operator) and the current indexed element (as the right value of the binary operator) to calculate the return value for the indexed element (something interesting is that
Arrays does not provide non-parallel method). Since the return value is updated to the original array, the left value of the binary operator is the updated value. The example shown in the offical document is that given an array [2, 1, 0, 3] and a
BinaryOperator<T> to perform addition, the resulting array is [2, 3, 3, 6]. 6 is calculated from the third element (left value) and the forth element (right value) and replaced the forth element.
spliterator(T array) method is used to split the array. I guess that when
parallelSort(T, Comparator<T>) is called or the
parallel() method of the stream object returned by the
stream(T array)is called, the internal implementation uses
spliterator(T) method to obtain the
Spliterator object that manages of array separation for parallel processing.
The new added methods of
Arrays are almost discussed and the left one is the today's topic:
parallelSort(T, Comparator<T>). The
Arrays class provides methods for sorting for many years. These methods use merge sort algorithm to sort the given array, and the only needed is the
Comparator to determine the ordering of two objects. The new added methods
parallelSort(T, Comparator<T>) sort elements (still with merge sort algorithm) in parallel. Therefore, I want to know how much performance can the parallel version improve?
As shown in Code List 2, an interface
SortStrategy<T> is declared first. Then, in Code List 3 and Code List 4, the
SimpleSortStrategy implementation provides the non-parallel sorting, and the
ParallelSortStrategy implementation provides the parallel sorting, respectively. And in Code List 5, a method called
generateRandomValues(int) that can generate a given sized array full-filled with random values.
All preparation is completed to examine the performance difference between the non-parallel and parallel sorting. In Code List 6, through the argument setting, the program can run many times with the specified initial array size. Each run, the array size is doubled, so we can discover the relationship between the array size and improvement, e.g., the more elements to sort, the more improvement can make. Since the amount of comparison and swapping affects the sorting time, Code List 6 uses
generateRandomValues(int) to generate a list full-filled with random values, and before each strategy starting to sort, the list is transformed to an array (the transformation time is not included) to ensure that each strategy sorts the same random sequence. The environment is similar to the previous experiment Lazy Evaluation & Parallel Stream. The only difference is that the JVM is upgraded from 1.8.0-b132 to 1.8.0_05-b13.
The result is listed in Table 1. The test run seven times, the array size is from 1 M to 64 M, but the seventh test (64 M) failed because of the
OutOfMemoryError. Therefore, Table 1 only lists six records.
Table 1 - The experiment result
|Strategy||1 M||2 M||4 M||8 M||16 M||32 M|
From Table 1, it is obvious that
parallelSort(T, Comparator<T> can improve about 2.5 times to almost 3 times of performance. Therefore, to sort huge size of data, it is recommended to use