over 4 years ago

The Java Collections Framework is well designed even without the Stream API. Still, a simple task such as finding an object in a collection that satisfies some condition requires a for loop. Writing such a loop is easy, but writing nearly the same loop again and again is tedious. Before the for-each ("for in") syntactic sugar arrived, the index arithmetic in a for loop was also annoying, and with poorly named indices, debugging nested for loops was a miserable job. Ever since I started using Apache Commons Collections, it has been a required library in every one of my projects. Here is a simple example: to find the Person object in a List whose first name or last name matches a given value, the traditional way is Code List 1, a for loop that checks every person's first name and last name.

Code List 1 - Find an object in a list by using for loop
public Person findFirstPersonByForLoop(List<Person> persons, String firstOrLastName) {
    for (Person person : persons) {
        if (person.getFirstName().equalsIgnoreCase(firstOrLastName) ||
            person.getLastName().equalsIgnoreCase(firstOrLastName)) {
            return person;
        }
    }
    return null;
}

How about using Apache Commons Collections? The program is listed in Code List 2.a, and basically no for loop is needed: just call the find method with an object that implements the Predicate interface. Code List 2.b shows the implementation of that object. Only the evaluate method has to be implemented; it returns true if the given object matches the condition. That's all. What!? There are more lines of code than in Code List 1. Yes, there are, but PersonNamePredicate is reusable and easy to test. The CollectionUtils class has 12 methods that use a Predicate object to filter, select, or count objects in a collection. Therefore, I think the class is worth writing.

Code List 2.a - Find an object in a list by using Apache Commons Collections
public Person findFirstPersonByCommonsCollections(List<Person> persons, String firstOrLastName) {
    return CollectionUtils.find(persons, new PersonNamePredicate(firstOrLastName));
}

Code List 2.b - The implementation of PersonNamePredicate
public class PersonNamePredicate implements Predicate<Person> {

    private String _searchCondition;

    public PersonNamePredicate(String condition) {
        _searchCondition = condition;
    }

    @Override
    public boolean evaluate(Person person) {
        return  person.getFirstName().equalsIgnoreCase(_searchCondition) ||
                person.getLastName().equalsIgnoreCase(_searchCondition);
    }
}

Well, the topic of this article is the new Stream API in Java 8. So how do we find an object with the Stream API? The answer is shown in Code List 3. The line count is not reduced much. However, compared with Code List 1, Code List 3 can be read as "filter the objects by a condition and return the first match if one exists; otherwise return null," and the details of the loop are hidden. So in terms of semantics and readability, doesn't this approach raise the level of abstraction?

Code List 3 - Find an object in a list by using Java Stream API
public Person findFirstPersonByStream(List<Person> persons, String firstOrLastName) {
    Optional<Person> result = persons.stream()
            .filter(p -> p.getFirstName().equalsIgnoreCase(firstOrLastName) ||
                    p.getLastName().equalsIgnoreCase(firstOrLastName))
            .findFirst();
    return result.orElse(null);
}
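
As a variation, the method could hand the Optional back to the caller instead of unwrapping it to null. The sketch below is only an illustration: the class name OptionalFindSketch and the minimal nested Person class (with just the two getters used above) are assumptions, not part of the original program.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class OptionalFindSketch {

    // Minimal stand-in for the article's Person class (an assumption).
    static class Person {
        private final String firstName;
        private final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        String getFirstName() { return firstName; }
        String getLastName() { return lastName; }
    }

    // Same pipeline as Code List 3, but the Optional is returned as-is,
    // so the caller decides what "not found" means.
    static Optional<Person> findFirstPerson(List<Person> persons, String firstOrLastName) {
        return persons.stream()
                .filter(p -> p.getFirstName().equalsIgnoreCase(firstOrLastName)
                        || p.getLastName().equalsIgnoreCase(firstOrLastName))
                .findFirst();
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Ada", "Lovelace"),
                new Person("Alan", "Turing"));

        // Print the first name of the match, or a fallback text if there is none.
        String found = findFirstPerson(persons, "turing")
                .map(Person::getFirstName)
                .orElse("(not found)");
        System.out.println(found);
    }
}
```

Whether returning Optional or null is the better fit depends on the callers; the null-returning version keeps the same contract as Code List 1 and Code List 2.a.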

Is it possible to find an object as in Code List 2.a, but with the Stream API? Yes, it is. First, write a helper class StreamUtils, as in Code List 4.a, that provides a find(Collection, Predicate) method. Second, revise PersonNamePredicate as in Code List 4.b. Then a single line of code finds the object, as in Code List 4.c. Of course, if PersonNamePredicate does not need to support both Apache Commons Collections and the Java Stream API, the test method of java.util.function.Predicate is the only method that has to be implemented. What is the advantage of writing so much code? Apart from the possible benefit of parallel processing from parallelStream() in Code List 4.a, not much. The reason is that the task (finding one object) is very simple, and the Stream API is overkill for it.

Code List 4.a - The find method of StreamUtils
public static <T> T find(Collection<T> container, Predicate<T> predicate) {
    Optional<T> result = container.parallelStream().filter(predicate).findFirst();
    return result.orElse(null);
}

Code List 4.b - The revised PersonNamePredicate
public class PersonNamePredicate implements org.apache.commons.collections4.Predicate<Person>,
    java.util.function.Predicate<Person> {

    private String _searchCondition;

    public PersonNamePredicate(String condition) {
        _searchCondition = condition;
    }

    @Override
    public boolean evaluate(Person person) {
        return  person.getFirstName().equalsIgnoreCase(_searchCondition) ||
                person.getLastName().equalsIgnoreCase(_searchCondition);
    }

    @Override
    public boolean test(Person person) {
        return evaluate(person);
    }
}

Code List 4.c - Find an object in a list by using customized StreamUtils
public Person findFirstPersonByStreamUtils(List<Person> persons, String firstOrLastName) {
    return StreamUtils.find(persons, new PersonNamePredicate(firstOrLastName));
}
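
When only the Stream API has to be supported, a dedicated predicate class is not strictly necessary, because java.util.function.Predicate is a functional interface and its default or() method combines two predicates. The following is a minimal sketch of that idea; the class name PredicateCompositionSketch, the factory methods firstNameIs, lastNameIs and nameIs, and the nested Person class are all hypothetical, and find uses a sequential stream rather than parallelStream().

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.function.Predicate;

class PredicateCompositionSketch {

    // Minimal stand-in for the article's Person class (an assumption).
    static class Person {
        private final String firstName;
        private final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        String getFirstName() { return firstName; }
        String getLastName() { return lastName; }
    }

    // Small reusable predicates written as lambdas.
    static Predicate<Person> firstNameIs(String name) {
        return p -> p.getFirstName().equalsIgnoreCase(name);
    }

    static Predicate<Person> lastNameIs(String name) {
        return p -> p.getLastName().equalsIgnoreCase(name);
    }

    // Combined predicate: matches on the first name OR the last name.
    static Predicate<Person> nameIs(String name) {
        return firstNameIs(name).or(lastNameIs(name));
    }

    // Same idea as StreamUtils.find, but with a sequential stream.
    static <T> T find(Collection<T> container, Predicate<? super T> predicate) {
        return container.stream().filter(predicate).findFirst().orElse(null);
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Ada", "Lovelace"),
                new Person("Alan", "Turing"));
        Person hit = find(persons, nameIs("lovelace"));
        System.out.println(hit == null ? "(not found)" : hit.getFirstName());
    }
}
```

The small predicates can be tested one by one and recombined with and(), or() and negate(), which keeps the reusability argument made for PersonNamePredicate.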

The concept of the Java Stream API is similar to a Unix pipeline or the pipes-and-filters design pattern: several simple operations are chained together to complete a meaningful job. Since each operation is very simple, a lambda expression is usually concise and improves readability. As shown in Figure 1, a Java stream pipeline chains several intermediate operations and ends with exactly one terminal operation. An intermediate operation transforms the content of the stream, e.g., filtering (filter(Predicate)), mapping (map(Function)), or sorting (sorted(Comparator)). The terminal operation produces the final result from the content of the stream or performs a side effect on it, e.g., collecting (collect(Collector)), applying an action to each element (forEach(Consumer)), or reducing (reduce(BinaryOperator)).

Figure 1 - Stream Pipeline
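
To make the shape of such a pipeline concrete, here is a small sketch with three intermediate operations followed by one terminal operation. The class name PipelineSketch, the method name and the sample data are made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class PipelineSketch {

    static List<String> longNamesUpperCased(List<String> names) {
        return names.stream()
                .filter(n -> n.length() > 3)           // intermediate: keep names longer than 3 characters
                .map(String::toUpperCase)              // intermediate: transform each element
                .sorted(Comparator.naturalOrder())     // intermediate: sort alphabetically
                .collect(Collectors.toList());         // terminal: gather the result into a List
    }

    public static void main(String[] args) {
        // prints [ALAN, GRACE]
        System.out.println(longNamesUpperCased(Arrays.asList("Grace", "Ada", "Alan")));
    }
}
```

Nothing is processed until the terminal operation (collect here) is invoked; the intermediate operations only describe the pipeline.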

For example, a pipeline like Figure 2 can be used to sum the assets of the rich persons who have assets of over 1 billion dollars. First, filter(Predicate) keeps only the persons whose assets exceed 1 billion dollars. Then, map(Function) extracts the value of each person's assets. Finally, reduce(BinaryOperator) aggregates the values into the result. Combinations like this are used frequently, so the Collectors class in the Java Stream API provides ready-made collectors for them, e.g., summarizingDouble(ToDoubleFunction), which combines the map(Function) step and the reduce(BinaryOperator) step with a default implementation to simplify the composition of a pipeline.

Figure 2 - Stream Pipeline Example
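
The pipeline in Figure 2 might look like the sketch below, written once with explicit map and reduce steps and once with Collectors.summarizingDouble. The class name AssetSumSketch, the getAssets() accessor and the sample figures are assumptions made only for this illustration.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

class AssetSumSketch {

    static final double ONE_BILLION = 1_000_000_000d;

    // Minimal stand-in for Person; getAssets() is a hypothetical accessor.
    static class Person {
        private final String name;
        private final double assets;

        Person(String name, double assets) {
            this.name = name;
            this.assets = assets;
        }

        String getName() { return name; }
        double getAssets() { return assets; }
    }

    // filter -> map -> reduce, as in the figure.
    static double sumWithReduce(List<Person> persons) {
        return persons.stream()
                .filter(p -> p.getAssets() > ONE_BILLION)
                .map(Person::getAssets)          // Stream<Double>
                .reduce(0.0, Double::sum);       // add the values up, starting from 0
    }

    // The same aggregation with a ready-made collector, which also
    // reports count, min, max and average.
    static DoubleSummaryStatistics summarize(List<Person> persons) {
        return persons.stream()
                .filter(p -> p.getAssets() > ONE_BILLION)
                .collect(Collectors.summarizingDouble(Person::getAssets));
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("A", 2_500_000_000d),
                new Person("B", 500_000_000d),
                new Person("C", 1_500_000_000d));
        System.out.println(sumWithReduce(persons));
        System.out.println(summarize(persons).getSum());
    }
}
```

If only the total is needed, mapToDouble(Person::getAssets).sum() on the filtered stream would be another option that avoids boxing each value.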

Is that example not concrete enough? Here is a more concrete one. Assume that Exam represents an examination and that a person can take an examination many times. Thus, Person uses a List to keep all the examinations the examinee has taken. How do we get a ranking of the examinees who scored more than 700 in any examination they took? To eliminate duplicated code, a getHighestScore() method, shown in Code List 5, is added to Person to return the highest score among the taken examinations (using the Stream API, too).

Code List 5 - Aggregation with Java Stream API
public Double getHighestScore() {
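    // Compare the exams by score and keep the highest-scoring one.
    Optional<Exam> maxScoreExam = getExams().stream()
                                    .max((e1, e2) -> e1.getScore().compareTo(e2.getScore()));
    // Fall back to 0 when the person has not taken any exam.
    return maxScoreExam.isPresent()? maxScoreExam.get().getScore() : 0.0;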
}
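
The same aggregation can also be expressed on a primitive DoubleStream, which removes the need for a hand-written comparator. This is only a sketch: the class name HighestScoreSketch and the minimal Exam and Person classes are assumptions about how the real classes might look.

```java
import java.util.Arrays;
import java.util.List;

class HighestScoreSketch {

    // Minimal stand-in for the article's Exam class (an assumption).
    static class Exam {
        private final Double score;

        Exam(double score) { this.score = score; }

        Double getScore() { return score; }
    }

    // Minimal stand-in for the article's Person class (an assumption).
    static class Person {
        private final List<Exam> exams;

        Person(List<Exam> exams) { this.exams = exams; }

        List<Exam> getExams() { return exams; }

        // mapToDouble yields a DoubleStream, whose max() returns an OptionalDouble;
        // orElse(0.0) covers the case of a person without any exam.
        double getHighestScore() {
            return getExams().stream()
                    .mapToDouble(Exam::getScore)
                    .max()
                    .orElse(0.0);
        }
    }

    public static void main(String[] args) {
        Person p = new Person(Arrays.asList(new Exam(650.0), new Exam(840.0)));
        System.out.println(p.getHighestScore()); // prints 840.0
    }
}
```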

Then, a method showRank(List<Person>, double) can be written as in Code List 6. The first parameter is the list of all examinees, and the second is the score threshold an examinee must exceed to appear in the ranking. The program first calls stream() to obtain the Stream object, then calls its filter(Predicate) method to drop the examinees whose highest score does not exceed the threshold. Writing the predicate as a lambda expression here is intuitive and improves readability. Next, sorted(Comparator) sorts the examinees by score (in ascending order, as the comparator is written), and map(Function) combines each examinee's full name and score into a string such as "Spirit Tu: 840.0". Note that the elements of the stream returned by map(Function) are no longer Person objects; they are strings. Therefore, in the lambda expression passed to forEach(Consumer), s is a string and can be printed to the console directly. Finally, call showRank(persons, 700) to show the ranking of the examinees who have ever scored more than 700 in an examination. The entire process of Code List 6 can be illustrated as the pipeline in Figure 3.

Code List 6 - More interesting example of Stream API
public static void showRank(List<Person> persons, double threshold) {
    persons.stream()
        .filter(p -> p.getHighestScore() > threshold)
        .sorted((p1, p2) -> p1.getHighestScore().compareTo(p2.getHighestScore()))
        .map(p -> String.format("%s: %.1f", p.getFullName(), p.getHighestScore()))
        .forEach(s -> System.out.println(s));
}

Figure 3 - The pipeline of Code List 6
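
For a ranking that lists the highest score first, the comparator can be reversed. The sketch below is a variant of Code List 6 under a few assumptions: the class name RankSketch and the simplified Person class are hypothetical, the formatted lines are collected into a list instead of being printed so that they can be checked, and Locale.ROOT is passed to String.format so that the decimal separator does not depend on the machine's locale.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class RankSketch {

    // Simplified stand-in for Person: the highest score is stored directly (an assumption).
    static class Person {
        private final String fullName;
        private final double highestScore;

        Person(String fullName, double highestScore) {
            this.fullName = fullName;
            this.highestScore = highestScore;
        }

        String getFullName() { return fullName; }
        double getHighestScore() { return highestScore; }
    }

    static List<String> rankLines(List<Person> persons, double threshold) {
        return persons.stream()
                .filter(p -> p.getHighestScore() > threshold)
                // Highest score first: build an ascending comparator, then reverse it.
                .sorted(Comparator.comparingDouble(Person::getHighestScore).reversed())
                .map(p -> String.format(Locale.ROOT, "%s: %.1f",
                        p.getFullName(), p.getHighestScore()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("Spirit Tu", 840.0),
                new Person("Examinee B", 650.0),
                new Person("Examinee C", 910.0));
        rankLines(persons, 700).forEach(System.out::println);
    }
}
```

Collecting the lines first and printing them afterwards also keeps the side effect out of the pipeline, which makes the method easier to test.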

Honestly, the Java Stream API feels very familiar to me because I studied visual dataflow languages for many years in graduate school. In VisualTPL (the subject of my research), the loop is implicit: what to do matters more than how to do it. In the same way, the Java Stream API internalizes the loop, so what matters about an operation is what it does. Both raise the level of abstraction and improve readability considerably. The Java Stream API offers more features than this article describes; the next article will cover some of the others.
