over 4 years ago

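Something lighter today. These are thoughts I have been revisiting recently, years after reading 《超越Java -- 探討程式語言的未來》 ("Beyond Java: exploring the future of programming languages"). In recent years Java has put too much weight on libraries (J2EE has become a behemoth) and neglected the language itself. The auto-closable resources of try-with-resources in Java SE 7 and the lambdas in Java SE 8 are probably among the few language-level changes since generics and autoboxing in J2SE 5. Java SE 8 has been released, and less than a month later (2014/3/18 -> 2014/4/15) Java SE 8 Update 5 followed (version numbers have been jumping quickly since Oracle took over). The next version, Java SE 9, already seems to be stirring. After working with several other languages, though, there are a few language characteristics (not necessarily features) that I really wish Java would adopt.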

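The first thing I would like to see is the property. Quite a few languages have a similar feature, for example Objective C and C#. A property is used like a variable, as in for(Publication* publication in person.publications) in Code List 1, which fetches the publications, or page.Height = 400 in Code List 2, which sets the page height. It is not the same as declaring a member variable public, however. A property still carries access control, because it is made up of a getter and a setter that can have different access levels, which is why changing the height in Code List 2 notifies the handlers (observer pattern). Most languages simply provide a default implementation or a mechanism that generates one automatically, such as @synthesize firstName;. Java can use Lombok to generate getters and setters, but that is not part of the language, and I dislike using annotations for this (all right, my view of annotations is purely personal taste).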

Code List 1 - The property in Objective C
// In Person.h
@interface Person : NSObject

@property (nonatomic, readonly) NSInteger age;
@property (nonatomic, readonly) NSArray* publications;
@property (nonatomic) NSString* firstName;
@property (nonatomic) NSString* lastName;

@end

// In Person.m
@implementation Person {
    NSMutableArray* _publications;
}

@synthesize firstName;
@synthesize lastName;

- (NSInteger)age {
    // Assumes a birthday property (NSDate*) declared elsewhere on Person.
    NSDate* now = [NSDate date];
    NSCalendar* calendar = [NSCalendar currentCalendar];
    NSDateComponents* components = [calendar components:NSYearCalendarUnit fromDate:self.birthday toDate:now options:0];
    return [components year];
}

- (NSArray*)publications {
    return _publications;
}
@end

// Somewhere else
Person* person = [controller getPerson];
for(Publication* publication in person.publications) {
    // do something
}
Code List 2 - The property in C#
// In Page.cs
public class Page : Drawable {

    // Backing field for the Height property
    private int _height;

    /// <summary>
    /// Gets or sets the height of the page
    /// </summary>
    public int Height {
        get { return _height; }
        set {
            _height = value;
            NotifyContentUpdatedEventHandlers();
        }
    }
}

// Somewhere else
Page page = createEmptyPage();
page.Height = 400;
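
For comparison, here is a minimal sketch of what the C# property in Code List 2 looks like when written by hand in Java today. The ContentUpdatedListener interface and the method names are assumptions made for illustration; they are not taken from any real library.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written Java counterpart of the C# Height property (illustrative sketch).
class Page {

    /** Hypothetical listener type standing in for the C# event handlers. */
    interface ContentUpdatedListener {
        void contentUpdated(Page page);
    }

    private final List<ContentUpdatedListener> listeners = new ArrayList<>();
    private int height;

    void addContentUpdatedListener(ContentUpdatedListener listener) {
        listeners.add(listener);
    }

    int getHeight() {
        return height;
    }

    // The setter stores the value and then notifies every registered listener.
    void setHeight(int height) {
        this.height = height;
        for (ContentUpdatedListener listener : listeners) {
            listener.contentUpdated(this);
        }
    }

    public static void main(String[] args) {
        Page page = new Page();
        page.addContentUpdatedListener(p -> System.out.println("height is now " + p.getHeight()));
        page.setHeight(400); // the explicit call that page.Height = 400 hides in C#
    }
}
```

The behavior is the same, but the caller has to write setHeight(400) instead of an assignment, and the getter/setter pair is a naming convention rather than something the language knows about.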

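The second feature is the literal data structure declaration, which JavaScript, Objective C, and other languages provide. Sometimes, when processing data, all I want is a simple data object, yet I have to write a class and implement getters and setters, which is too much trouble. With Code List 3, a company data object can be put together on the spot. Yes, Code List 3 is something Objective C learned from JavaScript Object Notation (JSON) (Code List 4); it only adds the at sign @ as a marker. Maybe Java should consider it! A small complaint: why is the built-in JSON API only available in Java EE? Don't clients need to parse JSON too? That is really odd.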

Code List 3 - The literal data structure declaration
id company = @{
    @"name": @"Far Far AwayCompany",
    @"address": @"I don't want to know",
    @"foundedOn": @1999,
    @"employees": @[ @"Bill", @"Steve", @"John" ]
};
Code List 4 - JavaScript Object Notation
var company = {
    "name": "Far Far AwayCompany",
    "address": "I don't want to know",
    "foundedOn": 1999,
    "employees": [ "Bill", "Steve", "John" ]
};
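
To show the gap, the sketch below builds the same company data with the standard Java collections, assuming that a Map and a List are an acceptable stand-in for the literal. The class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Builds the company data object from Code List 3/4 without literal syntax (illustrative sketch).
class CompanyData {

    static Map<String, Object> createCompany() {
        // LinkedHashMap keeps the keys in insertion order, like the literal reads.
        Map<String, Object> company = new LinkedHashMap<>();
        company.put("name", "Far Far AwayCompany");
        company.put("address", "I don't want to know");
        company.put("foundedOn", 1999);
        company.put("employees", Arrays.asList("Bill", "Steve", "John"));
        return company;
    }

    public static void main(String[] args) {
        System.out.println(createCompany());
    }
}
```

Every entry needs its own put call, and the values come back as Object, so reading them requires casts.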

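The third feature I would like added to Java is the category, a language feature I came to love after learning Objective C. A category is a way to attach new implementation to an existing class. It is not inheritance, because the type does not change, and it is not the C# partial keyword either. This is convenient when designing a complex project: code can be split across files by function, and callers can pull in only the parts they need. For example, should conversion between a POJO (Plain Old Java Object) and JSON be the job of the object itself, or of some other object?

When I first learned OO languages I thought the former, but after seeing many more designs I lean toward the latter. An object can be output in several formats (JSON or XML), and having to modify the original class to add a new output format goes against the open-closed principle. Without the source code, modifying the original class is not possible at all; only inheritance is left, and inheritance changes the type. With a category mechanism, the implementation of a new output format can be attached to the existing class. In Code List 5, Person.h only needs to provide the logic the domain model requires, while the JSON-related logic lives in the JSON category. Code that needs the JSON logic imports Person+JSON.h; code that does not simply imports Person.h and never sees the JSON implementation. A category can also attach new implementation when the source code is not available; I often use categories to add things to NSString.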

Code List 5 - Category
// In Person.h
@interface Person : NSObject

// domain model logic

@end

// In Person+JSON.h
@interface Person (JSON)

+ (instancetype)fromJson:(id)json;

- (void)updateWithJson:(id)json;

- (id)toJson;

@end

// Somewhere else
#import "Person+JSON.h"

Person* person = [Person fromJson:json];
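
Java has no categories, so the closest plain-Java arrangement is a separate helper class, sketched below. Person and PersonJson are hypothetical names, and the JSON text is assembled by hand only to keep the example self-contained.

```java
// Domain model: knows nothing about JSON (illustrative sketch).
class Person {
    private final String firstName;
    private final String lastName;

    Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }
}

// JSON logic lives in its own class; only callers that need it reference it.
final class PersonJson {
    private PersonJson() { }

    static String toJson(Person person) {
        return "{\"firstName\":\"" + escape(person.getFirstName())
                + "\",\"lastName\":\"" + escape(person.getLastName()) + "\"}";
    }

    // Minimal escaping: handles backslashes and double quotes only.
    private static String escape(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

The call site becomes PersonJson.toJson(person) rather than a method on the object itself, which is the difference a category would remove.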

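The fourth feature is the extension, another Objective C language feature. The reason I want it in Java has to do with testing. Objective C has the @public, @protected, and @private keywords, but they apply to member variables, not to methods. Objective C does not require methods to be declared in the .h file, so in most cases methods that should stay hidden are simply written in the .m file. Sometimes, for the sake of testing, it would be handy if certain private methods could temporarily become public during tests, and this is where an extension comes in. In Code List 6, the methods meant to be public go in Person.h, while the private methods that should be visible during testing go in Person_Private.h. Only Person.h is published; Person_Private.h is not. A test imports Person_Private.h and can then see the private methods it wants to test. (In fact, an Objective C extension can be even more ruthless: it does not even need its own .h file, and the test code can declare it directly to force private methods into view XD.) Java has no split between .h header files and .m implementation files, but something like a private interface implementation might provide a similar feature.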

Code List 6 - Extension
// In Person.h
@interface Person : NSObject

// public domain model logic

@end

// In Person_Private.h
#import "Person.h"

@interface Person()

- (void)privateMethod;

@end

// In Person.m
#import "Person_Private.h"

@implementation Person

// Implement the methods that declared in Person.h and Person_Private.h

@end

// In PersonTests.m
#import "Person_Private.h"

@implementation PersonTests

- (void)testPrivateMethod {
    // Can see and test -(void)privateMethod method declared in Person_Private.h in test mode.
}

@end

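The fifth feature is preprocessor directives for IDE integration. Quite a few people have gone back to basics because IDEs keep getting heavier, and write code in a plain text editor (plain is not quite right either, since these editors still offer syntax highlighting and the like). Still, I think many people write code in an IDE (at one point I also found Eclipse extremely slow, and while looking for a replacement I discovered that 4.3 had become fast again, so I kept using it). Objective C and C# do well at integrating with the IDE. Java has quite a few good IDEs, but the language offers them no support at all. I like to organize code by function, so in Objective C I use #pragma mark - UITableViewDelegate Methods to group the methods that implement UITableViewDelegate into a section, and in C# I wrap all properties in a block between #region Properties and #endregion. When the IDE sees these directives it can optimize the display. In Xcode, for example, when #pragma mark is used, the function list in the navigation bar at the top (Figure 1) groups the methods by mark name, which makes them easier to find.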

Figure 1 - XCode provides integration with preprocessors

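Languages released in recent years all push writing less code and improving readability and, more importantly, making it easier to build a domain-specific language. On that front Java is indeed showing its age. Most of the features listed here are syntactic sugar, but they improve readability and the level of abstraction considerably, and I find them practical and well worth it.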

 
over 4 years ago

In addition to the pipeline design, two other features of Stream interest me: lazy evaluation and parallel streams. Lazy evaluation means that computation is deferred until its result is actually needed, and a parallel stream executes the whole pipeline in parallel. Most importantly, both features are optimized for the JVM and should be more efficient than anything we write ourselves.

When I first saw lazy evaluation, I thought it should help with loading large files, for example by reducing memory consumption. I was not sure, so I designed an experiment to test the idea. First, an interface FileSearchStrategy (Code List 1) is created; its method accepts a file (or folder), a keyword, and a result collector (SearchResultCollector). Each implementation can search the file for the keyword in its own way and puts each result, a tuple <file name, line number, line content>, into the collector.
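
As a small illustration of lazy evaluation, the sketch below counts how many elements a sequential pipeline actually inspects before findFirst() is satisfied. The class and method names are invented for the example and are not part of the experiment code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Demonstrates that a stream pipeline only pulls as many elements as it needs (illustrative sketch).
class LazyEvaluationDemo {

    /** Returns how many lines the pipeline looked at before the first match was found. */
    static int inspectedBeforeFirstMatch(List<String> lines, String keyword) {
        AtomicInteger inspected = new AtomicInteger();
        lines.stream()
                .peek(line -> inspected.incrementAndGet()) // runs once per element pulled
                .filter(line -> line.contains(keyword))
                .findFirst();                              // short-circuits on the first match
        return inspected.get();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("alpha", "beta", "gamma", "delta");
        // "beta" is the second element, so only two elements are inspected.
        System.out.println(inspectedBeforeFirstMatch(lines, "beta")); // prints 2
    }
}
```

Nothing in the pipeline runs until the terminal operation is called, and with a short-circuiting terminal operation the remaining elements are never touched.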

Code List 1 - FileSearchStrategy Interface
package java8.stream;

import java.io.File;

public interface FileSearchStrategy {

    public void search(File root, String keyword, SearchResultCollector collector);
}

A File object in Java can point to either a file or a folder. AbstractSearchStrategy (Code List 2) therefore implements the search(File, String, SearchResultCollector) method of FileSearchStrategy to traverse the folders recursively, and declares a hook method that concrete classes implement to scan the content of an actual file.

Code List 2 - AbstractSearchStrategy handles the directory traversal
package java8.stream;

import java.io.File;

public abstract class AbstractSearchStrategy implements FileSearchStrategy {

    @Override
    public void search(File root, String keyword, SearchResultCollector collector) {
        if(root.isDirectory()) {
            File[] files = root.listFiles();
            if(files != null) {
                for(File file : files) {
                    search(file, keyword, collector);
                }
            }
        }
        else {
            collector.increaseFileCount();
            scanKeyword(root, keyword, collector);
        }
    }

    protected abstract void scanKeyword(File file, String keyword, SearchResultCollector collector);
}

With the infrastructure in place, the first implementation, DefaultSearchStrategy (Code List 3), uses the approach that was common for text files before the Stream API: scanning line by line. Its result is the baseline of the experiment.

Code List 3 - The traditional text file reading
package java8.stream;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class DefaultSearchStrategy extends AbstractSearchStrategy {

    @Override
    protected void scanKeyword(File file, String keyword, SearchResultCollector collector) {
        String path = file.getName();
        try (FileReader fileReader = new FileReader(file);
            BufferedReader reader = new BufferedReader(fileReader)) {
            String line = null;
            long lineNumber = 1;
            while((line = reader.readLine()) != null) {
                if(line.contains(keyword)) {
                    collector.add(new KeywordSearchResult(path, line, lineNumber));
                }
                lineNumber++;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Next, AllLinesSearchStrategy (Code List 4) uses the readAllLines(Path) method of the Files class, which comes with the NIO.2 (New I/O 2) package introduced in Java 7. The Javadoc describes it as intended for simple cases where it is convenient to read all lines in a single operation, and not intended for reading large files. This strategy should therefore be the worst case in the experiment.

Code List 4 - The method that uses Files.readAllLines(Path)
package java8.stream;

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;

public class AllLinesSearchStrategy extends AbstractSearchStrategy {

    @Override
    protected void scanKeyword(File file, String keyword, SearchResultCollector collector) {
        String path = file.getName();
        try {
            List<String> lines = Files.readAllLines(file.toPath());
            int linesCount = lines.size();
            for(int index = 0; index < linesCount; index++) {
                String line = lines.get(index);
                if(line.contains(keyword)) {
                    // List indices are 0-based; reported line numbers are 1-based.
                    collector.add(new KeywordSearchResult(path, line, index + 1));
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

To make the later parallel stream experiment easier, StreamSearchStrategy (Code List 5) puts the pipeline into a separate method, scanStream(Stream, String, SearchResultCollector), and uses the lines() method of BufferedReader to obtain the Stream that serves as input. To obtain the line number, the first intermediate operation in the pipeline is map(Function), which turns each string into an object holding both a line number and the string (KeywordSearchResult is reused for simplicity). filter(Predicate) then drops the objects that do not match.

Code List 5 - The strategy that uses Stream to traverse directories and scan file
package java8.stream;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamSearchStrategy extends AbstractSearchStrategy
    implements Function<String, KeywordSearchResult> {

    protected long _lineCounting;
    protected String _scanningPath;

    @Override
    public void scanKeyword(File file, String keyword, SearchResultCollector collector) {
        _scanningPath = file.getName();
        try (FileReader fileReader = new FileReader(file);
            BufferedReader reader = new BufferedReader(fileReader)) {
            scanStream(reader.lines(), keyword, collector);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void scanStream(Stream<String> stream, String keyword, SearchResultCollector collector) {
        _lineCounting = 1;
        collector.addAll(stream
                            .map(this)
                            .filter(r -> r.getKeywordAppearedLine().contains(keyword))
                            .collect(Collectors.toList()));
    }

    @Override
    public KeywordSearchResult apply(String line) {
        // Stateful mapper: the counter is only reliable on a sequential stream.
        return new KeywordSearchResult(_scanningPath, line, _lineCounting++);
    }
}
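
The mapper in Code List 5 keeps the line number in a field, which is not safe once the stream runs in parallel. For data that is already in a random-access list, one alternative is to stream over the indices instead of the lines. The sketch below assumes an in-memory List and returns plain strings rather than KeywordSearchResult objects; NumberedLineSearch is a made-up name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Pairs each matching line with its 1-based line number without shared mutable state (illustrative sketch).
class NumberedLineSearch {

    static List<String> search(List<String> lines, String keyword) {
        return IntStream.range(0, lines.size())
                .parallel()
                .filter(i -> lines.get(i).contains(keyword))   // filter first, on the raw line
                .mapToObj(i -> (i + 1) + ": " + lines.get(i))  // build a result only for matches
                .collect(Collectors.toList());                 // encounter order is preserved
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("foo bar", "baz", "bar again");
        System.out.println(search(lines, "bar")); // prints [1: foo bar, 3: bar again]
    }
}
```

This does not apply to BufferedReader.lines(), where the lines have no index; there the line number has to come from a sequential pass.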

Because the line number is required, scanStream(Stream, String, SearchResultCollector) in StreamSearchStrategy applies map(Function) before filter(Predicate). If the line number is ignored, as in StreamSearchStrategyV2 (Code List 6), filter(Predicate) can run before map(Function). Does that affect the result?

Code List 6 - The strategy that ignores the line number
package java8.stream;

import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamSearchStrategyV2 extends StreamSearchStrategy {

    public void scanStream(Stream<String> stream, String keyword, SearchResultCollector collector) {
        collector.addAll(stream
                            .filter(s -> s.contains(keyword))
                            .map(s -> new KeywordSearchResult(_scanningPath, s, 0))
                            .collect(Collectors.toList()));
    }
}

System memory keeps growing, and the operating system usually caches file content, so reading the same file a second time is much faster. For this experiment that speed-up would distort the measurements. To avoid reading file content from the cache, the experiment prepares three folders, each with the same file structure and content as listed in Table 1. Sub-folder A, for example, holds 10 files of 100k records each (about 3.62 MB per file), and sub-folders B, C, and so on follow the same pattern. Sub-folder G contains a copy of sub-folders A to F, so each folder holds 120 files totaling 4.45 GB.

Table 1 - Test data

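| Sub Folder | Records per File | Files | Size per File (MB) |
|---|---|---|---|
| A | 100k | 10 | 3.62 |
| B | 200k | 10 | 7.24 |
| C | 400k | 10 | 14.4 |
| D | 800k | 10 | 28.9 |
| E | 1600k | 10 | 57.9 |
| F | 3200k | 10 | 115 |
| G | A + B + C + D + E + F | | |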

Whether this works depends on the size of the system memory. The test environment is listed in Table 2 and the test procedure is shown in Code List 7. Each strategy searches the three prepared folders in order. By the time the second strategy starts on the first folder, the previous strategy has just read the other two folders, 8.9 GB in total, which is more than the system memory, so the content of the first folder should no longer be in the cache. A MemoryUsageMonitor object samples the JVM memory usage periodically and records the peak value of each scan.

Table 2 - Test environment

| Component | Specification |
|---|---|
| CPU | Intel Core i5-2400 |
| Memory | 8GB |
| HDD | Seagate ST3160815AS 160GB |
| OS | Windows 7 SP1 |
| JVM | 1.8.0-b132 |

Code List 7 - Test jobs schedule
public void runJobs(File[] jobs, String keyword) {
    Set<String> keys = _strategies.keySet();
    System.out.println("Strategy, Time (ms), Files, Folder, Found, Memory");
    for(String key : keys) {
        FileSearchStrategy strategy = _strategies.get(key);
        for(File job : jobs) {
            runJob(job, strategy, keyword);
        }
    }
}

private void runJob(File job, FileSearchStrategy strategy, String keyword) {
    _results.clear();
    _largestMemoryUsage = 0;
    long startTime = System.currentTimeMillis();
    MemoryUsageMonitor.getInstance().startMonitor();
    strategy.search(job, keyword, _results);
    long time = System.currentTimeMillis() - startTime;
    MemoryUsageMonitor.getInstance().stopMonitor();
    int matchCount = _results.getResults().size();
    long filesCount = _results.getProcessedFileCount();
    String jobName = job.getName();
    String strategyName = strategy.getClass().getSimpleName();
    String result = String.format("%s, %d, %d, %s, %d, %d", strategyName, time, filesCount, jobName, matchCount, _largestMemoryUsage);
    System.out.println(result);
    System.gc();
}
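
The MemoryUsageMonitor used in Code List 7 is not shown in the post. The following is only a guess at how such a monitor could work: a hypothetical PeakMemorySampler that polls the used heap at a fixed interval and keeps the largest value seen. The class name, method names, and the sampling approach are all assumptions.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for MemoryUsageMonitor: samples used heap periodically and keeps the peak.
class PeakMemorySampler {
    private final AtomicLong peakBytes = new AtomicLong();
    private ScheduledExecutorService scheduler;

    synchronized void start(long periodMillis) {
        if (scheduler != null) {
            throw new IllegalStateException("sampler is already running");
        }
        peakBytes.set(0);
        sample();
        scheduler = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "peak-memory-sampler");
            thread.setDaemon(true); // do not keep the JVM alive for sampling
            return thread;
        });
        scheduler.scheduleAtFixedRate(this::sample, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops sampling and returns the peak used heap, in bytes, seen since start(). */
    synchronized long stop() {
        if (scheduler != null) {
            scheduler.shutdownNow();
            scheduler = null;
        }
        sample();
        return peakBytes.get();
    }

    private void sample() {
        Runtime runtime = Runtime.getRuntime();
        long usedBytes = runtime.totalMemory() - runtime.freeMemory();
        peakBytes.accumulateAndGet(usedBytes, Math::max);
    }
}
```

A sampler like this can miss short spikes between two samples, and the used heap it reads includes garbage that has not been collected yet, so the numbers are best read as rough upper bounds.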

Time to show the results (Table 3). As expected, the All Lines strategy used the most memory (over 1 GB) and took the longest, roughly 20 seconds more than the other strategies. The differences between Default, Stream, and Stream v2 are small: the three are within about 3 seconds of each other. Stream v2 and Default use almost the same amount of memory, whereas the leading map(Function) in Stream appears to be costly (57.4 MB against roughly 42 MB).

Table 3 - Test result of four strategies

| Strategy | Time (ms) | Memory (MB) |
|---|---|---|
| Default | 106788 | 42.8 |
| Stream | 109272 | 57.4 |
| Stream v2 | 109402 | 41.7 |
| All Lines | 128749 | 1140.9 |

AbstractSearchStrategy uses a traditional for-each loop to traverse the files. Would a parallel stream help, or make things worse? To find out, the search(File, String, SearchResultCollector) method of AbstractSearchStrategy is changed to Code List 8 and the experiment is run again.

Code List 8 - The directories traversal with Stream API
@Override
public void search(File root, String keyword, SearchResultCollector collector) {
    if(root.isDirectory()) {
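        // Files.list holds an open directory handle, so close the stream when done.
        // Requires java.nio.file.Files, java.nio.file.Path and java.util.stream.Stream.
        try (Stream<Path> entries = Files.list(root.toPath())) {
            entries.parallel()
                    .forEach(p -> search(p.toFile(), keyword, collector));
        } catch (IOException e) {
            e.printStackTrace();
        }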
    }
    else {
        collector.increaseFileCount();
        scanKeyword(root, keyword, collector);
    }
}

The expectation was that the parallel stream would speed up the search. According to Table 4, however, it brings almost no speed-up and has a side effect: memory consumption rises sharply. Surprisingly, the Stream v2 strategy uses more memory than the Stream strategy, and I do not know how to explain that.

Table 4 - Test result of four strategies with parallel directories traversal

| Strategy | Time (ms) | Memory (MB) |
|---|---|---|
| Default | 109386 | 281.7 |
| Stream | 109230 | 397.1 |
| Stream v2 | 108978 | 487.5 |
| All Lines | 122702 | 1441.9 |
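
One practical consequence of the parallel traversal in Code List 8 is that the collector is now called from several threads at once. The post does not show SearchResultCollector, so the sketch below is a hypothetical collector with the same method names, written under the assumption that results and the file count must both tolerate concurrent updates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.IntStream;

// Hypothetical thread-safe collector for use with a parallel traversal (illustrative sketch).
class ConcurrentCollector {
    private final List<String> results = Collections.synchronizedList(new ArrayList<>());
    private final AtomicLong fileCount = new AtomicLong();

    void increaseFileCount() {
        fileCount.incrementAndGet();
    }

    void add(String result) {
        results.add(result);
    }

    long getProcessedFileCount() {
        return fileCount.get();
    }

    int size() {
        return results.size();
    }

    public static void main(String[] args) {
        ConcurrentCollector collector = new ConcurrentCollector();
        // Simulate many worker threads reporting into the same collector.
        IntStream.range(0, 10_000).parallel().forEach(i -> {
            collector.increaseFileCount();
            collector.add("match-" + i);
        });
        System.out.println(collector.getProcessedFileCount() + " files, " + collector.size() + " results");
        // prints "10000 files, 10000 results"
    }
}
```

With a plain ArrayList and a plain long counter, the same loop could lose updates or throw, so a collector shared by a parallel forEach needs this kind of protection.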

It seems that once I/O is involved, a Stream pipeline does not run much faster or use less memory, even with a parallel stream, and sometimes it is slower. What if the data is already in memory rather than in files on disk? The third experiment addresses this: 100k to 6400k records are kept in an ArrayList, and depending on a parameter (Code List 9), either parallelStream() or stream() supplies the input of scanStream(Stream, String, SearchResultCollector).

Code List 9 - Testing the parallel stream
public void runTests(int times, String keyword) {
    _testTimes = times;
    StreamSearchStrategy strategy = new StreamSearchStrategy();
    StreamSearchStrategyV2 strategy2 = new StreamSearchStrategyV2();
    System.out.println("Strategy, Time, Parallel, Collection Size, Match, Memory");
    for(int index = 0; index < _testTimes; index++) {
        Collection<String> collection = createCollection(_collectionSize);
        runTest(new ArrayList<String>(collection), false, strategy, keyword);
        runTest(new ArrayList<String>(collection), true, strategy, keyword);
        runTest(new ArrayList<String>(collection), false, strategy2, keyword);
        runTest(new ArrayList<String>(collection), true, strategy2, keyword);
        _collectionSize = _collectionSize * 2;
    }
}
private void runTest(Collection<String> collection, boolean useParallel, StreamSearchStrategy strategy, String keyword) {
    SearchResultTableModel results = new SearchResultTableModel();
    Stream<String> stream = useParallel? collection.parallelStream() : collection.stream();
    long startTime = System.currentTimeMillis();
    strategy.scanStream(stream, keyword, results);
    long time = System.currentTimeMillis() - startTime;
    int collectionSize = collection.size();
    String strategyName = strategy.getClass().getSimpleName();
    String result = String.format("%s, %d, %s, %d", strategyName, time, String.valueOf(useParallel), collectionSize);
    System.out.println(result);
}

The results are listed in Table 5. In the 100k column, whichever of Stream and Stream v2 runs first always gets a poor result, probably because of the cold start of the program, so the 100k column is ignored. From 200k on, both strategies run clearly faster with parallelStream(). In the 6400k column, Stream v2 with a parallel stream saves 136 ms (192 ms down to 56 ms). Overall, Stream v2 also performs better than Stream.

Table 5 - The execution time (ms) with parallel stream

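| Strategy | Parallel | 100k* | 200k | 400k | 800k | 1600k | 3200k | 6400k |
|---|---|---|---|---|---|---|---|---|
| Stream | Off | 47 | 8 | 15 | 30 | 57 | 187 | 342 |
| Stream | On | 29 | 5 | 11 | 21 | 42 | 83 | 165 |
| Stream v2 | Off | 12 | 7 | 12 | 24 | 47 | 93 | 192 |
| Stream v2 | On | 2 | 3 | 5 | 9 | 15 | 29 | 56 |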
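
The cold-start effect in the 100k column can be reduced by running the task a few times before measuring. The sketch below is one simple way to do that, not the method used in the experiment; WarmedUpTimer and bestTimeMillis are hypothetical names, and a dedicated benchmark harness such as JMH would be more rigorous.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Times a task after a few unmeasured warm-up runs (illustrative sketch).
class WarmedUpTimer {

    /** Runs the task warmUpRuns times unmeasured, then returns the best of measuredRuns in ms. */
    static <T> long bestTimeMillis(Supplier<T> task, int warmUpRuns, int measuredRuns) {
        if (measuredRuns < 1) {
            throw new IllegalArgumentException("measuredRuns must be at least 1");
        }
        for (int i = 0; i < warmUpRuns; i++) {
            task.get(); // lets class loading and JIT compilation happen before timing
        }
        long bestNanos = Long.MAX_VALUE;
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.get();
            bestNanos = Math.min(bestNanos, System.nanoTime() - start);
        }
        return TimeUnit.NANOSECONDS.toMillis(bestNanos);
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) {
            records.add("record-" + i);
        }
        long sequential = bestTimeMillis(
                () -> records.stream().filter(s -> s.contains("99")).count(), 5, 5);
        long parallel = bestTimeMillis(
                () -> records.parallelStream().filter(s -> s.contains("99")).count(), 5, 5);
        System.out.println("sequential: " + sequential + " ms, parallel: " + parallel + " ms");
    }
}
```

System.nanoTime() is used instead of System.currentTimeMillis() because it is meant for measuring elapsed time and has finer resolution.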

Time for a conclusion. First, lazy evaluation only pays off when the pipeline is composed well; a poorly composed pipeline makes performance worse, and lazy evaluation does not seem to help much with I/O. Second, whether a parallel stream helps depends on the data source. Sources bound by I/O, or sources whose access is contended, gain little. When the data is already in memory and can be accessed without contention (no locks needed), a parallel stream speeds things up considerably. A parallel stream also increases memory usage, however, so it should be used with care.

P.S. The source code is still being tidied up. Once that is done, it will be published on GitHub.

 
over 4 years ago

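Besides the pipeline design, two other features of Stream made me curious: lazy evaluation and parallel streams. Lazy evaluation can be thought of as postponing computation until it is really necessary, while a parallel stream runs the pipeline in parallel. Most importantly, both features are optimized for the JVM and should be more efficient than what we would write ourselves.

When I saw lazy evaluation, my first thought was that it should help with loading large files, for example by reducing memory usage. I was not sure the idea was correct, so I designed an experiment to try it. First, a FileSearchStrategy interface (Code List 1) takes a file (or directory), a keyword, and a result collector (SearchResultCollector). Each implementation can search the file for the keyword in a different way and stores each result <file name, line number, line content> in the collector.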

Code List 1 - FileSearchStrategy Interface
package java8.stream;

import java.io.File;

public interface FileSearchStrategy {

    public void search(File root, String keyword, SearchResultCollector collector);
}

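Because a Java File may point to a file or a directory, AbstractSearchStrategy (Code List 2) implements the search(File, String, SearchResultCollector) method of FileSearchStrategy, walks every directory level recursively, and leaves a hook method for subclasses to provide the actual file scanning.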

Code List 2 - AbstractSearchStrategy handles the directory traversal
package java8.stream;

import java.io.File;

public abstract class AbstractSearchStrategy implements FileSearchStrategy {

    @Override
    public void search(File root, String keyword, SearchResultCollector collector) {
        if(root.isDirectory()) {
            File[] files = root.listFiles();
            if(files != null) {
                for(File file : files) {
                    search(file, keyword, collector);
                }
            }
        }
        else {
            collector.increaseFileCount();
            scanKeyword(root, keyword, collector);
        }
    }

    protected abstract void scanKeyword(File file, String keyword, SearchResultCollector collector);
}

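With the infrastructure done, the first implementation, DefaultSearchStrategy (Code List 3), uses the algorithm commonly applied to text files before the Stream API existed: scanning line by line. The result of this implementation is the baseline of the experiment.

Code List 3 - The traditional text file reading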
package java8.stream;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class DefaultSearchStrategy extends AbstractSearchStrategy {

    @Override
    protected void scanKeyword(File file, String keyword, SearchResultCollector collector) {
        String path = file.getName();
        try (FileReader fileReader = new FileReader(file);
            BufferedReader reader = new BufferedReader(fileReader)) {
            String line = null;
            long lineNumber = 1;
            while((line = reader.readLine()) != null) {
                if(line.contains(keyword)) {
                    collector.add(new KeywordSearchResult(path, line, lineNumber));
                }
                lineNumber++;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

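Next, AllLinesSearchStrategy (Code List 4) uses the readAllLines(Path) method of Files, provided by the NIO.2 (New I/O 2) package introduced in Java 7. According to the Javadoc, this method is only suited to simple cases and is not intended for large files, so this implementation should produce the worst result in the experiment.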

Code List 4 - The method that uses Files.readAllLines(Path)
package java8.stream;

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;

public class AllLinesSearchStrategy extends AbstractSearchStrategy {

    @Override
    protected void scanKeyword(File file, String keyword, SearchResultCollector collector) {
        String path = file.getName();
        try {
            List<String> lines = Files.readAllLines(file.toPath());
            int linesCount = lines.size();
            for(int index = 0; index < linesCount; index++) {
                String line = lines.get(index);
                if(line.contains(keyword)) {
                    // List indices are 0-based; reported line numbers are 1-based.
                    collector.add(new KeywordSearchResult(path, line, index + 1));
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

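To make the later parallel processing experiment easier, StreamSearchStrategy (Code List 5) moves the actual pipeline into another method, scanStream(Stream, String, SearchResultCollector), and uses the lines() method of BufferedReader to obtain the Stream to operate on. To get the line number, the first intermediate operation of the pipeline is map(Function), which converts each string into an object carrying both the line number and the string content (KeywordSearchResult is used only to keep the implementation simple). filter(Predicate) then removes the unwanted objects.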

Code List 5 - The strategy that uses Stream to traverse directories and scan file
package java8.stream;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamSearchStrategy extends AbstractSearchStrategy
    implements Function<String, KeywordSearchResult> {

    protected long _lineCounting;
    protected String _scanningPath;

    @Override
    public void scanKeyword(File file, String keyword, SearchResultCollector collector) {
        _scanningPath = file.getName();
        try (FileReader fileReader = new FileReader(file);
            BufferedReader reader = new BufferedReader(fileReader)) {
            scanStream(reader.lines(), keyword, collector);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void scanStream(Stream<String> stream, String keyword, SearchResultCollector collector) {
        _lineCounting = 1;
        collector.addAll(stream
                            .map(this)
                            .filter(r -> r.getKeywordAppearedLine().contains(keyword))
                            .collect(Collectors.toList()));
    }

    @Override
    public KeywordSearchResult apply(String line) {
        // Stateful mapper: the counter is only reliable on a sequential stream.
        return new KeywordSearchResult(_scanningPath, line, _lineCounting++);
    }
}

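To get the line number, the pipeline in scanStream(Stream, String, SearchResultCollector) of StreamSearchStrategy uses map(Function) first and filter(Predicate) second. If the line number is ignored and the code is changed to StreamSearchStrategyV2 in Code List 6, which uses filter(Predicate) first and map(Function) second, will the result be affected?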

Code List 6 - The strategy that ignores the line number
package java8.stream;

import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamSearchStrategyV2 extends StreamSearchStrategy {

    public void scanStream(Stream<String> stream, String keyword, SearchResultCollector collector) {
        collector.addAll(stream
                            .filter(s -> s.contains(keyword))
                            .map(s -> new KeywordSearchResult(_scanningPath, s, 0))
                            .collect(Collectors.toList()));
    }
}

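System memory keeps getting larger, and the operating system often caches file content in it, so a second read of the same file can be much faster. For the experiment this is a key factor that would skew the data. To avoid reading cached file content, the experiment prepares three folders, each holding the same file structure described in Table 1. Taking sub-folder A as an example, one file has 100k lines (records) and is about 3.62 MB, and there are 10 such files; sub-folders B, C, and so on follow the same pattern. Sub-folder G holds a copy of sub-folders A to F, so the 120 files add up to 4.45 GB.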

Table 1 - Test data

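| Sub Folder | Records per File | Files | Size per File (MB) |
|---|---|---|---|
| A | 100k | 10 | 3.62 |
| B | 200k | 10 | 7.24 |
| C | 400k | 10 | 14.4 |
| D | 800k | 10 | 28.9 |
| E | 1600k | 10 | 57.9 |
| F | 3200k | 10 | 115 |
| G | A + B + C + D + E + F | | |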

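Of course, whether this works depends on how much system memory there is. The test environment is listed in Table 2 and the test method is shown in Code List 7. Each strategy scans the three folders in order. When the second strategy starts scanning the first folder, the other two folders have just been loaded by the previous strategy; at 8.9 GB in total they exceed the system memory, so the content of the first folder should no longer be in the cache. The experiment uses a MemoryUsageMonitor object that periodically checks the JVM memory usage and records the peak of each scan.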

Table 2 - Test environment

| Component | Specification |
|---|---|
| CPU | Intel Core i5-2400 |
| Memory | 8GB |
| HDD | Seagate ST3160815AS 160GB |
| OS | Windows 7 SP1 |
| JVM | 1.8.0-b132 |

Code List 7 - Test jobs schedule
public void runJobs(File[] jobs, String keyword) {
    Set<String> keys = _strategies.keySet();
    System.out.println("Strategy, Time (ms), Files, Folder, Found, Memory");
    for(String key : keys) {
        FileSearchStrategy strategy = _strategies.get(key);
        for(File job : jobs) {
            runJob(job, strategy, keyword);
        }
    }
}

private void runJob(File job, FileSearchStrategy strategy, String keyword) {
    _results.clear();
    _largestMemoryUsage = 0;
    long startTime = System.currentTimeMillis();
    MemoryUsageMonitor.getInstance().startMonitor();
    strategy.search(job, keyword, _results);
    long time = System.currentTimeMillis() - startTime;
    MemoryUsageMonitor.getInstance().stopMonitor();
    int matchCount = _results.getResults().size();
    long filesCount = _results.getProcessedFileCount();
    String jobName = job.getName();
    String strategyName = strategy.getClass().getSimpleName();
    String result = String.format("%s, %d, %d, %s, %d, %d", strategyName, time, filesCount, jobName, matchCount, _largestMemoryUsage);
    System.out.println(result);
    System.gc();
}

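All right, time to publish the test results (Table 3). As expected, All Lines uses the most memory (over 1 GB) and also takes the longest, about 20 seconds more. The differences between Default, Stream, and Stream v2 are small: in execution time the three are about 3 seconds apart, and in memory usage Stream v2 and Default are almost the same, but the map(Function) at the start of Stream seems to be its weak point.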

Table 3 - Test result of four strategies

| Strategy | Time (ms) | Memory (MB) |
|---|---|---|
| Default | 106788 | 42.8 |
| Stream | 109272 | 57.4 |
| Stream v2 | 109402 | 41.7 |
| All Lines | 128749 | 1140.9 |

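So far AbstractSearchStrategy has visited the files with a traditional for-each loop. Would the parallel mode of Stream help, or make things worse? To find out, search(File, String, SearchResultCollector) in AbstractSearchStrategy is changed to Code List 8 and the test is run again.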

Code List 8 - The directories traversal with Stream API
@Override
public void search(File root, String keyword, SearchResultCollector collector) {
    if(root.isDirectory()) {
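        // Files.list holds an open directory handle, so close the stream when done.
        // Requires java.nio.file.Files, java.nio.file.Path and java.util.stream.Stream.
        try (Stream<Path> entries = Files.list(root.toPath())) {
            entries.parallel()
                    .forEach(p -> search(p.toFile(), keyword, collector));
        } catch (IOException e) {
            e.printStackTrace();
        }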
    }
    else {
        collector.increaseFileCount();
        scanKeyword(root, keyword, collector);
    }
}

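The hope was that a parallel stream would speed up execution, but Table 4 shows almost no speed-up and instead a side effect: memory usage soars. Unexpectedly, Stream v2 uses even more memory than Stream, and I do not know how to explain it.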

Table 4 - Test result of four strategies with parallel directories traversal

| Strategy | Time (ms) | Memory (MB) |
|---|---|---|
| Default | 109386 | 281.7 |
| Stream | 109230 | 397.1 |
| Stream v2 | 108978 | 487.5 |
| All Lines | 122702 | 1441.9 |

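It seems that as soon as I/O is involved, even parallel execution does not cut much execution time, and sometimes it is slower. What happens if the data is not in files on disk but already in memory? Hence the third experiment: 100k to 6400k records are stored in an ArrayList, and depending on a parameter either parallelStream() or stream() provides the input of scanStream(Stream, String, SearchResultCollector) (Code List 9).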

Code List 9 - Testing the parallel stream
public void runTests(int times, String keyword) {
    _testTimes = times;
    StreamSearchStrategy strategy = new StreamSearchStrategy();
    StreamSearchStrategyV2 strategy2 = new StreamSearchStrategyV2();
    System.out.println("Strategy, Time, Parallel, Collection Size, Match, Memory");
    for(int index = 0; index < _testTimes; index++) {
        Collection<String> collection = createCollection(_collectionSize);
        runTest(new ArrayList<String>(collection), false, strategy, keyword);
        runTest(new ArrayList<String>(collection), true, strategy, keyword);
        runTest(new ArrayList<String>(collection), false, strategy2, keyword);
        runTest(new ArrayList<String>(collection), true, strategy2, keyword);
        _collectionSize = _collectionSize * 2;
    }
}
private void runTest(Collection<String> collection, boolean useParallel, StreamSearchStrategy strategy, String keyword) {
    SearchResultTableModel results = new SearchResultTableModel();
    Stream<String> stream = useParallel? collection.parallelStream() : collection.stream();
    long startTime = System.currentTimeMillis();
    strategy.scanStream(stream, keyword, results);
    long time = System.currentTimeMillis() - startTime;
    int collectionSize = collection.size();
    String strategyName = strategy.getClass().getSimpleName();
    String result = String.format("%s, %d, %s, %d", strategyName, time, String.valueOf(useParallel), collectionSize);
    System.out.println(result);
}

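The results are listed in Table 5. In the 100k column, whichever of Stream or Stream v2 runs first produces poor numbers, probably because of the cold start of the program, so ignore that column. From 200k on, both Stream and Stream v2 run noticeably faster with parallelStream(); in the 6400k column of Stream v2, it saves 136 ms. Overall, Stream v2 is also clearly better than Stream.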

Table 5 - The execution time (ms) with parallel stream

Strategy    Parallel   100k*   200k   400k   800k   1600k   3200k   6400k
Stream      Off        47      8      15     30     57      187     342
Stream      On         29      5      11     21     42      83      165
Stream v2   Off        12      7      12     24     47      93      192
Stream v2   On         2       3      5      9      15      29      56

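Time for the conclusions. First, lazy evaluation pays off only when the composition of the pipeline has been optimized; a poorly composed pipeline makes things worse, and lazy evaluation does not seem to help much with I/O either. Whether a parallel stream pays off depends on the type of data source: with I/O-bound data, or data whose access involves contention, it is hard to gain anything, but for in-memory data with no access contention (no locks needed), the benefit of a parallel stream is quite evident. Note, however, that a parallel stream also increases memory usage, so use it with care.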
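
The in-memory case can be sketched as follows. This is not the test program behind Table 5; ParallelCountSketch, createLines, and countMatches are hypothetical names, and the single timing printed by main is only a rough indication, not a benchmark. The point is that the filter is stateless and shares nothing, so switching between stream() and parallelStream() changes only how the work is scheduled, not the result.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not the author's test program.
final class ParallelCountSketch {

    // Builds an in-memory list of lines; every 10th line contains the keyword.
    static List<String> createLines(int size, String keyword) {
        List<String> lines = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            lines.add(i % 10 == 0 ? "line " + i + " " + keyword : "line " + i);
        }
        return lines;
    }

    // Counts the lines that contain the keyword. No lock is needed because
    // the lambda only reads its argument and the (unchanging) keyword.
    static long countMatches(List<String> lines, String keyword, boolean parallel) {
        return (parallel ? lines.parallelStream() : lines.stream())
                .filter(line -> line.contains(keyword))
                .count();
    }

    public static void main(String[] args) {
        List<String> lines = createLines(1_000_000, "stream");
        for (boolean parallel : new boolean[] {false, true}) {
            long start = System.nanoTime();
            long matches = countMatches(lines, "stream", parallel);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("parallel=" + parallel
                    + ", matches=" + matches + ", ms=" + elapsedMs);
        }
    }
}
```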

P.S. The test code is still being cleaned up; once that is done, I will publish it on GitHub.

 
over 4 years ago

The Java Collection framework is well designed even without the Stream API. Sometimes, however, a simple function, such as finding an object in a collection based on some conditions, needs a for loop. Writing such a loop is easy, but writing similar loops over and over is boring. Before the "for in" syntactic sugar existed, managing the index in a for loop was annoying. In addition, if the indices are poorly named, debugging nested for loops is a terrible job. Since I started using Apache Commons Collections, it has become a required library in every project of mine. Here is a simple example: to find a Person object in a List whose first name or last name matches a given value, the traditional way is Code List 1 -- writing a for loop that checks every person's first name and last name.

Code List 1 - Find an object in a list by using for loop
public Person findFirstPersonByForLoop(List<Person> persons, String firstOrLastName) {
    for (Person person : persons) {
        if (person.getFirstName().equalsIgnoreCase(firstOrLastName) ||
            person.getLastName().equalsIgnoreCase(firstOrLastName)) {
            return person;
        }
    }
    return null;
}

How about using Apache Commons Collections? The program is listed in Code List 2.a, and basically no for loop is needed. Just call the find method with an object that implements the Predicate interface. Code List 2.b shows the implementation of that object. Only the evaluate method needs to be implemented -- it returns true if the given object matches the condition. That's all. What!? There are more lines of code than in Code List 1. Yes, there are, but PersonNamePredicate is reusable and easy to test. The CollectionUtils class has 12 methods that use a Predicate object to filter, select, or count objects in a collection. Therefore, I think the class is worth writing.

Code List 2.a - Find an object in a list by using Apache Commons Collections
public Person findFirstPersonByCommonsCollections(List<Person> persons, String firstOrLastName) {
    return CollectionUtils.find(persons, new PersonNamePredicate(firstOrLastName));
}
Code List 2.b - The implementation of PersonNamePredicate
public class PersonNamePredicate implements Predicate<Person> {

    private String _searchCondition;

    public PersonNamePredicate(String condition) {
        _searchCondition = condition;
    }

    @Override
    public boolean evaluate(Person person) {
        return  person.getFirstName().equalsIgnoreCase(_searchCondition) ||
                person.getLastName().equalsIgnoreCase(_searchCondition);
    }
}

Well, the topic of this article is the new Stream API in Java 8. So how do we find an object with the Stream API? The answer is shown in Code List 3. The number of lines is not reduced much. However, compared with Code List 1, Code List 3 can be read as "filter the objects by a condition and return the first one if it exists; otherwise return null", and the details of the loop are hidden. So, in terms of semantics and readability, doesn't this raise the level of abstraction?

Code List 3 - Find an object in a list by using Java Stream API
public Person findFirstPersonByStream(List<Person> persons, String firstOrLastName) {
    Optional<Person> result = persons.stream()
            .filter(p -> p.getFirstName().equalsIgnoreCase(firstOrLastName) ||
                    p.getLastName().equalsIgnoreCase(firstOrLastName))
            .findFirst();
    return result.isPresent()? result.get() : null;
}

Is it possible to find an object as in Code List 2.a, but with the Stream API? Yes, it is. First, write a helper class StreamUtils, as in Code List 4.a, that provides a find(Collection, Predicate) method. Second, revise PersonNamePredicate as in Code List 4.b. Then a single line of code finds the object, as in Code List 4.c. Of course, if you do not need PersonNamePredicate to support both Apache Commons Collections and the Java Stream API, the test method of java.util.function.Predicate is the only method you have to implement. What is the point of writing so much code? Apart from the possible benefit of parallel processing from parallelStream() in Code List 4.a, this version does not bring much advantage. The reason is that the application (finding an object) is very simple, so using the Stream API is overkill.

Code List 4.a - The find method of StreamUtils
public static <T> T find(Collection<T> container, Predicate<T> predicate) {
    Optional<T> result = container.parallelStream().filter(predicate).findFirst();
    return result.isPresent()? result.get() : null;
}
Code List 4.b - The revised PersonNamePredicate
public class PersonNamePredicate implements org.apache.commons.collections4.Predicate<Person>,
    java.util.function.Predicate<Person> {

    private String _searchCondition;

    public PersonNamePredicate(String condition) {
        _searchCondition = condition;
    }

    @Override
    public boolean evaluate(Person person) {
        return  person.getFirstName().equalsIgnoreCase(_searchCondition) ||
                person.getLastName().equalsIgnoreCase(_searchCondition);
    }

    @Override
    public boolean test(Person person) {
        return evaluate(person);
    }
}
Code List 4.c - Find an object in a list by using customized StreamUtils
public Person findFirstPersonByStreamUtils(List<Person> persons, String firstOrLastName) {
    return StreamUtils.find(persons, new PersonNamePredicate(firstOrLastName));
}
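
As a side note, the isPresent()/get() pair in Code List 3 and Code List 4.a can be folded into Optional.orElse(null). The following is a minimal sketch, assuming a hypothetical StreamUtilsSketch class rather than the StreamUtils above. It uses stream() instead of parallelStream() for simplicity; for an ordered source such as a List, findFirst() returns the first match in encounter order either way. The parameter is widened to Predicate<? super T>, which is what filter itself accepts.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical variant of the helper, for illustration only.
final class StreamUtilsSketch {

    // Returns the first element that satisfies the predicate, or null if none does.
    static <T> T find(Collection<T> container, Predicate<? super T> predicate) {
        return container.stream()
                .filter(predicate)
                .findFirst()
                .orElse(null);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ada", "Grace", "Alan");
        String found = find(names, name -> name.startsWith("G"));
        String missing = find(names, name -> name.startsWith("Z"));
        System.out.println(found);   // prints Grace
        System.out.println(missing); // prints null
    }
}
```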

The concept of the Java Stream API is similar to the Unix pipeline or the pipes and filters design pattern -- concatenating several simple operations to complete a meaningful job. Since each operation is usually very simple, a Lambda expression is concise and improves readability. As shown in Figure 1, a Java Stream can chain several intermediate operations but only one terminal operation, at the end, to form a pipeline. An intermediate operation transforms the content of the stream, e.g., filtering (filter(Predicate)), mapping (map(Function)), or sorting (sorted(Comparator)). The terminal operation produces the final result from the content of the stream or performs a side effect on it, e.g., collecting (collect(Collector)), applying an action to each element (forEach(Consumer)), or reducing (reduce(BinaryOperator)).

Figure 1 - Stream Pipeline

For example, a pipeline like Figure 2 can be used to sum the assets of the rich people who own more than 1 billion dollars. First, filter(Predicate) keeps only the persons whose assets exceed 1 billion dollars. Then, map(Function) extracts the value of the assets. Finally, reduce(BinaryOperator) aggregates the values into the result. In fact, such operations are used so frequently that the Collectors class in the Java Stream API provides common reductions for the terminal collect(Collector) operation, e.g., summarizingDouble(ToDoubleFunction), which combines the map(Function) intermediate operation and a default reduce(BinaryOperator) implementation to simplify the composition of a pipeline.

Figure 2 - Stream Pipeline Example
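
The pipeline of Figure 2 could look roughly like this in code. It is a sketch under assumptions: the Person class here is a hypothetical stand-in with a single getAssets() accessor, and RichAssetsSketch and its method names are invented for the illustration. The first method spells out filter, map, and reduce; the second lets Collectors.summarizingDouble do the mapping and the aggregation, and reads the sum from the returned DoubleSummaryStatistics.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class RichAssetsSketch {

    // Hypothetical value type; only the assets matter for this example.
    static final class Person {
        private final String name;
        private final double assets;

        Person(String name, double assets) {
            this.name = name;
            this.assets = assets;
        }

        double getAssets() {
            return assets;
        }
    }

    static final double ONE_BILLION = 1_000_000_000.0;

    // filter -> map -> reduce, as drawn in the pipeline figure.
    static double totalWithReduce(List<Person> persons) {
        return persons.stream()
                .filter(p -> p.getAssets() > ONE_BILLION)
                .map(Person::getAssets)
                .reduce(0.0, Double::sum);
    }

    // The collector extracts the value and aggregates it in one step.
    static double totalWithCollector(List<Person> persons) {
        return persons.stream()
                .filter(p -> p.getAssets() > ONE_BILLION)
                .collect(Collectors.summarizingDouble(Person::getAssets))
                .getSum();
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("A", 2_500_000_000.0),
                new Person("B", 500_000_000.0),
                new Person("C", 1_500_000_000.0));
        System.out.println(totalWithReduce(persons));
        System.out.println(totalWithCollector(persons));
    }
}
```

When only the sum is needed, Collectors.summingDouble(ToDoubleFunction) is a lighter alternative; summarizingDouble additionally tracks count, min, max, and average.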

Is the example not concrete enough? Here is a more concrete one. Assume that Exam represents a kind of examination and that a person can take an examination many times. Thus, Person keeps a List of all the examinations taken by the examinee. How do we rank the examinees who scored more than 700 in any examination they took? To avoid duplicated code, the getHighestScore() method in Code List 5 is added to Person to get the highest score among the examinations taken (using the Stream API, too).

Code List 5 - Aggregation with Java Stream API
public Double getHighestScore() {
    Optional<Exam> maxScoreExam = getExams().stream()
                                    .max((e1, e2) -> e1.getScore().compareTo(e2.getScore())); 
    return maxScoreExam.isPresent()? maxScoreExam.get().getScore() : 0.0;
}

Then, a method showRank(List<Person>, double) can be written as in Code List 6. The first parameter is the list of all examinees, and the second parameter is the score threshold required to appear in the ranking. The program first calls the stream() method to obtain the Stream object and then calls the filter(Predicate) method of the Stream object to filter out the examinees whose highest score does not exceed the threshold. Here, writing the predicate as a Lambda Expression is intuitive and improves readability. Next, the sorted(Comparator) method sorts the examinees by score (in ascending order, as the comparator is written), and the map(Function) method combines each examinee's full name and score, e.g., "Spirit Tu: 840.0", as the result. Note that the elements in the stream returned by the map(Function) method are no longer Person objects -- they are strings. Therefore, in the Lambda Expression of forEach(Consumer), s represents a string and can be printed on the console directly. Finally, call showRank(persons, 700) to show the ranking of the examinees who have ever scored more than 700 in an examination. The entire process of Code List 6 can be illustrated as the pipeline in Figure 3.

Code List 6 - More interesting example of Stream API
public static void showRank(List<Person> persons, double threshold) {
    persons.stream()
        .filter(p -> p.getHighestScore() > threshold)
        .sorted((p1, p2) -> p1.getHighestScore().compareTo(p2.getHighestScore()))
        .map(p -> String.format("%s: %.1f", p.getFullName(), p.getHighestScore()))
        .forEach(s -> System.out.println(s));
}

Figure 3 - The pipeline of Code List 6
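
If the ranking should list the highest score first, the comparator can be built with Comparator.comparingDouble and reversed(). The sketch below is a variation on Code List 6, not a replacement for it: Examinee is a hypothetical stand-in for Person that stores the highest score directly, and the names other than "Spirit Tu" are made up. It collects the formatted lines into a List instead of printing them, which makes the result easy to check in a unit test, and it passes Locale.ROOT so that the decimal separator does not depend on the default locale.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class RankSketch {

    // Hypothetical stand-in for Person; it stores the highest score directly.
    static final class Examinee {
        private final String fullName;
        private final double highestScore;

        Examinee(String fullName, double highestScore) {
            this.fullName = fullName;
            this.highestScore = highestScore;
        }

        String getFullName() {
            return fullName;
        }

        double getHighestScore() {
            return highestScore;
        }
    }

    // Keeps the examinees above the threshold, highest score first.
    static List<String> rank(List<Examinee> examinees, double threshold) {
        return examinees.stream()
                .filter(e -> e.getHighestScore() > threshold)
                .sorted(Comparator.comparingDouble(Examinee::getHighestScore).reversed())
                .map(e -> String.format(Locale.ROOT, "%s: %.1f",
                        e.getFullName(), e.getHighestScore()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Examinee> examinees = Arrays.asList(
                new Examinee("Spirit Tu", 840.0),
                new Examinee("Ann Lee", 650.0),
                new Examinee("Bob Wu", 910.0));
        rank(examinees, 700).forEach(System.out::println);
    }
}
```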

Honestly, the Java Stream API feels very familiar to me because I studied a visual dataflow language for many years in graduate school. In VisualTPL (my research), the concept of the loop is implicit: what to do is more important than how to do it. In the same way, the Java Stream API internalizes the loop, and what matters about an operation is what it does. Both greatly improve the level of abstraction and readability. The Java Stream API offers more features than those described in this article; the next article will describe the others.

 
over 4 years ago

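Even without Stream, the Java Collection framework is quite well designed. Still, sometimes you need a simple function, for example finding an object in a container by some condition, and you always end up writing a for loop. The code is not hard, but it gets tedious after a while. Before syntactic sugar like for in existed, managing the index was annoying. With nested loops, poorly named indices make debugging even more of a headache. Ever since I started using Apache Commons Collections, it has been an almost mandatory package in my projects. Let's start with a simple example: suppose we want to find, in a List of Person objects, a person whose first or last name matches a specific value. The traditional way, shown in Code List 1, is to write a for loop and check the last name and first name of each person object one by one.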

Code List 1 - Find an object in a list by using for loop
public Person findFirstPersonByForLoop(List<Person> persons, String firstOrLastName) {
    for (Person person : persons) {
        if (person.getFirstName().equalsIgnoreCase(firstOrLastName) ||
            person.getLastName().equalsIgnoreCase(firstOrLastName)) {
            return person;
        }
    }
    return null;
}

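What about Apache Commons Collections? See Code List 2.a. Basically no for loop is needed; just pass an object that implements the Predicate interface when calling find. The implementation of that object is in Code List 2.b: it only has to implement the evaluate method, which decides whether the condition is met and returns true if so. That's it. What! More lines of code? Yes, indeed, but the PersonNamePredicate object is reusable. If you look at the CollectionUtils class, you will find 12 methods that use a Predicate object to filter, select, or count the contents of a container. Add the fact that a Predicate implementation is easy to test, and writing it this way is well worth it.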

Code List 2.a - Find an object in a list by using Apache Commons Collections
public Person findFirstPersonByCommonsCollections(List<Person> persons, String firstOrLastName) {
    return CollectionUtils.find(persons, new PersonNamePredicate(firstOrLastName));
}
Code List 2.b - The implementation of PersonNamePredicate
public class PersonNamePredicate implements Predicate<Person> {

    private String _searchCondition;

    public PersonNamePredicate(String condition) {
        _searchCondition = condition;
    }

    @Override
    public boolean evaluate(Person person) {
        return  person.getFirstName().equalsIgnoreCase(_searchCondition) ||
                person.getLastName().equalsIgnoreCase(_searchCondition);
    }
}

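All right! Since the topic is the new Stream API in Java 8, how would this be written with Stream? The Stream version looks like Code List 3. It does not seem to save many lines, but compared with the original Code List 1, in terms of semantics and readability, Code List 3 can be read as "filter by some condition, then take the first one; if a result exists, return that object, otherwise return null". The mechanics of the loop are left out. Doesn't it feel like the level of abstraction has been raised?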

Code List 3 - Find an object in a list by using Java Stream API
public Person findFirstPersonByStream(List<Person> persons, String firstOrLastName) {
    Optional<Person> result = persons.stream()
            .filter(p -> p.getFirstName().equalsIgnoreCase(firstOrLastName) ||
                    p.getLastName().equalsIgnoreCase(firstOrLastName))
            .findFirst();
    return result.isPresent()? result.get() : null;
}

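Can we do it the way Code List 2.a does? Yes! First, as in Code List 4.a, write a StreamUtils helper class that provides a find(Collection, Predicate) function. Then rewrite PersonNamePredicate as in Code List 4.b, and it takes just one line, as in Code List 4.c. Of course, if you do not intend PersonNamePredicate to support both Apache Commons Collections and Java Stream, implementing only the test function of the java.util.function.Predicate interface is enough. But what was the point of this detour? Apart from the possible benefit of parallel processing from parallelStream() in Code List 4.a, this version does not bring much. The main reason is that the application (find) is too simple; using Stream here feels like using a sledgehammer to crack a nut.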

Code List 4.a - The find method of StreamUtils
public static <T> T find(Collection<T> container, Predicate<T> predicate) {
    Optional<T> result = container.parallelStream().filter(predicate).findFirst();
    return result.isPresent()? result.get() : null;
}
Code List 4.b - The revised PersonNamePredicate
public class PersonNamePredicate implements org.apache.commons.collections4.Predicate<Person>,
    java.util.function.Predicate<Person> {

    private String _searchCondition;

    public PersonNamePredicate(String condition) {
        _searchCondition = condition;
    }

    @Override
    public boolean evaluate(Person person) {
        return  person.getFirstName().equalsIgnoreCase(_searchCondition) ||
                person.getLastName().equalsIgnoreCase(_searchCondition);
    }

    @Override
    public boolean test(Person person) {
        return evaluate(person);
    }
}
Code List 4.c - Find an object in a list by using customized StreamUtils
public Person findFirstPersonByStreamUtils(List<Person> persons, String firstOrLastName) {
    return StreamUtils.find(persons, new PersonNamePredicate(firstOrLastName));
}

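The concept of the Java Stream API is similar to the Unix pipeline or the pipes and filters design pattern: chaining several simple operations to accomplish meaningful work. Since an operation is usually very simple, using a Lambda expression is in most cases concise and improves readability. As Figure 1 shows, a Java Stream can chain multiple intermediate operations, but only one terminal operation at the end, to form a pipeline. Intermediate operations transform the content of the stream, for example filtering (filter(Predicate)), mapping (map(Function)), and sorting (sorted(Comparator)); a terminal operation then computes the final result from the data in the stream or produces a side effect, for example collecting (collect(Collector)), acting on each element (forEach(Consumer)), or reducing (reduce(BinaryOperator)).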

Figure 1 - Stream Pipeline

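For example, the pipeline in Figure 2 can compute the total assets of the tycoons whose assets exceed 1 billion dollars. First, filter(Predicate) keeps the records with assets over 1 billion; next, map(Function) extracts the asset value; finally, reduce(BinaryOperator) aggregates the values. In fact, such computations are so common that the Java Stream API has a Collectors class providing frequently used reductions for the terminal collect(Collector) operation. For example, summarizingDouble(ToDoubleFunction) combines map(Function) with a default reduce(BinaryOperator) implementation and simplifies the composition of a pipeline.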

Figure 2 - Stream Pipeline Example

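Does the example feel a bit abstract? Here is a more concrete one. Suppose Exam represents a test and each person can take many tests, so Person has a List holding all the tests the examinee has taken. Suppose we want a ranking of all examinees who have ever scored more than 700. To avoid duplicated code, as in Code List 5, the highest score among the tests an examinee has taken is first written as the getHighestScore() function of Person (again using the Stream API).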

Code List 5 - Aggregation with Java Stream API
public Double getHighestScore() {
    Optional<Exam> maxScoreExam = getExams().stream()
                                    .max((e1, e2) -> e1.getScore().compareTo(e2.getScore())); 
    return maxScoreExam.isPresent()? maxScoreExam.get().getScore() : 0.0;
}

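Then we can write a showRank(List<Person>, double) function as in Code List 6. The first parameter is the list of all examinees, and the second is the threshold for appearing in the ranking. The program first calls stream() to obtain a Stream object and then calls filter(Predicate) on it; a Lambda Expression feels natural here and improves readability. After the examinees who do not exceed the threshold are filtered out, sorted(Comparator) sorts the rest, and map(Function) then combines each examinee's full name and score into a string (for example, "Spirit Tu: 840.0") as the result. Be careful here: the Stream returned by map(Function) no longer holds Person objects but strings, so in the Lambda Expression of forEach(Consumer), s is a string and can be shown on the console directly. Finally, call showRank(persons, 700) to see the ranking of the examinees who have ever scored more than 700. The whole flow of Code List 6 can be drawn as the pipeline in Figure 3.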

Code List 6 - More interesting example of Stream API
public static void showRank(List<Person> persons, double threshold) {
    persons.stream()
        .filter(p -> p.getHighestScore() > threshold)
        .sorted((p1, p2) -> p1.getHighestScore().compareTo(p2.getHighestScore()))
        .map(p -> String.format("%s: %.1f", p.getFullName(), p.getHighestScore()))
        .forEach(s -> System.out.println(s));
}

Figure 3 - The pipeline of Code List 6

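Honestly, the Java Stream API feels quite familiar to me, probably because my research topic over many years of graduate school was a visual dataflow language. In VisualTPL, the concept of the loop is internalized; the focus is on what computation to perform (what), not on how to run the loop (how). Likewise, the Java Stream API internalizes the loop, and the point of each operation is what it does, which greatly raises the level of abstraction and the readability of the program. The Java Stream API has more to offer than this, though; the rest is left for the next article.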

 
over 4 years ago

As I recall, the official bimonthly "Java Magazine" has run many articles on the Lambda expression coming with Java 8, but closures have not been discussed much. With a quick search, I only found closures mentioned in an article on the Lambda expression in the July-August 2013 issue. I think there are two reasons: (1) even without Lambda, an anonymous class can already capture (effectively final) variables; (2) the variables captured by a Lambda expression must still be effectively final. Since free variables are not fully supported, the closure is not a selling point to advertise with the Java Lambda expression. According to the description of closures on Wikipedia (not formal enough, but easier to access than books on programming languages), a closure can manipulate the captured free variables just like normal variables. However, support for free variables is not the same in all languages.

Take Objective C, the language I use most at work, as an example. Apple's official documentation discusses the variables captured by code blocks through the concept of shared storage and gives some examples. The concept of memory storage is an unavoidable sore point for a language that is sensitive to memory addresses. In this article, however, I try to describe the variables captured by code blocks more abstractly. By default, a code block captures only the value of a variable, so any change made to the variable after the capture is not seen by the block. Thus, in Code List 1, NSLog prints 42, not 84.

Code List 1 - The code block that captures the "anInteger" variable without __block
int anInteger = 42;
SimpleCallback callback = ^{
    NSLog(@"Integer is: %i", anInteger);
};
anInteger = 84;
callback();

To capture the variable itself rather than its value, the __block modifier must be placed on the variable declaration, as in Code List 2. With the modifier, NSLog prints 84 because by the time the code block is executed (line 6), the captured variable anInteger has already been changed to 84 (line 5).

Code List 2 - The code block that captures the "anInteger" variable with __block
__block int anInteger = 42;
SimpleCallback callback = ^{
    NSLog(@"Integer is: %i", anInteger);
};
anInteger = 84;
callback();

Because the __block modifier makes the code block capture the variable rather than its value, the value of the variable can also be modified inside the code block. Therefore, in Code List 3, where NSLog is called after callback() has executed, the displayed value of anInteger is 100. I think the design has both advantages and disadvantages. In terms of language abstraction, the need for the __block modifier forces programmers to be aware of memory addresses, which lowers the level of abstraction (it is not intuitive). On the other hand, Objective C can use the modifier to optimize the compiled code for performance. It can provide both read-only and read-write captured variables, as well as compile-time checks: e.g., without the __block modifier, any assignment to the captured variable inside the block is a compile error. Therefore, I think the __block modifier is not a bad design.

Code List 3 - The code block that changes the value of the captured variable
__block int anInteger = 42;
SimpleCallback callback = ^{
    anInteger = 100;
};
callback();
NSLog(@"Value of original variable is now: %i", anInteger);

Well, it is time to get back to Java. As mentioned before, even without Lambda, an anonymous class can capture a variable x that is visible in its scope, as in Code List 4, but the problem is that the captured variable x must be effectively final. Therefore, uncommenting the line x = 48; causes a compile error.

Code List 4 - The captured variable in the anonymous class
int x = 24;
Runnable runnable = new Runnable() {

    @Override
    public void run() {
        // x = 48;

        System.out.println(String.format("captured x: %s", x));
    }
};
runnable.run();

The same holds with Lambda. In Code List 5, for example, the variable x inside the Lambda is still effectively final, and its value cannot be modified. Uncommenting the line x = 48; still causes a compile error.

Code List 5 - The captured variable in the Lambda expression is still effectively final
int x = 24;
Runnable runnable = () -> {
    // x = 48;

    System.out.println(String.format("captured x: %s", x));
};
runnable.run();
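
A common consequence of this rule shows up in loops: the loop counter is reassigned on every iteration, so it is not effectively final and cannot be captured directly. A minimal sketch of the usual workaround, assuming a hypothetical CaptureByValueSketch class, copies the counter into a fresh local variable that is never reassigned:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical example class for illustration.
final class CaptureByValueSketch {

    // Creates one supplier per index; each supplier returns the square of its index.
    static List<Supplier<Integer>> createSuppliers(int count) {
        List<Supplier<Integer>> suppliers = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // Capturing i itself would be a compile error because i++ modifies it.
            // A new variable is declared in every iteration and never reassigned,
            // so it is effectively final and may be captured.
            int captured = i;
            suppliers.add(() -> captured * captured);
        }
        return suppliers;
    }

    public static void main(String[] args) {
        for (Supplier<Integer> supplier : createSuppliers(4)) {
            System.out.println(supplier.get());
        }
    }
}
```

Each Lambda keeps its own copy of the value, which is the same capture-by-value behavior as the Objective C block without __block.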

However, the final modifier in Java only prevents reassignment of the variable. If the variable is an object reference, calling methods on the object is allowed, even if the method changes the state of the object. Thus, Code List 4 can be rewritten as Code List 6. As before, uncommenting the line x = new AtomicInteger(48); causes a compile error, but x.set(48); does change the value held by x. (The same applies to Objective C.)

Code List 6 - The captured object in the anonymous class
AtomicInteger x = new AtomicInteger(24);
Runnable runnable = new Runnable() {

    @Override
    public void run() {
        System.out.println(String.format("original x: %s", x));
        // x = new AtomicInteger(48);

        x.set(48);
    }
};
runnable.run();
System.out.println(String.format("x after run(): %s", x));

Likewise, Code List 5 can be rewritten as Code List 7. Unfortunately, the data types that support autoboxing and unboxing, e.g., Integer, are immutable. Using captured variables is therefore not as convenient as in JavaScript or other languages that treat primitive types as objects. Still, this approach provides something similar to free variables.

Code List 7 - The captured object in the Lambda expression
AtomicInteger x = new AtomicInteger(24);
Runnable runnable = () -> {
    System.out.println(String.format("original x: %s", x));
    // x = new AtomicInteger(48);

    x.set(48);
};
runnable.run();
System.out.println(String.format("x after run(): %s", x));
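
The same trick is handy when a Lambda has to accumulate something. The sketch below, with the hypothetical names CapturedCounterSketch and countLongWords, counts elements inside forEach: the reference counter is never reassigned, so it is effectively final, while the object it points to is mutated.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class for illustration.
final class CapturedCounterSketch {

    // Counts the words whose length is at least minLength.
    static int countLongWords(List<String> words, int minLength) {
        AtomicInteger counter = new AtomicInteger(); // starts at 0
        words.forEach(word -> {
            if (word.length() >= minLength) {
                counter.incrementAndGet(); // changes the object, not the variable
            }
        });
        return counter.get();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("lambda", "closure", "x", "java");
        System.out.println(countLongWords(words, 5)); // prints 2
    }
}
```

For a plain count, words.stream().filter(...).count() avoids the side effect altogether; the counter version is shown only to illustrate the capture.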

Although Java 8 supports the Lambda expression, I think the closure is still not a first-class citizen of Java. And as mentioned in the previous article, for testability, apart from one-line Lambdas or code that is too simple to break, I still prefer to write small, easy-to-test classes rather than Lambda expressions. And capturing variables? Injecting them through the constructor is a good alternative. As for when one would often write one-line or too-simple-to-break Lambdas, I think Stream, the new Collection API, opens up a lot of possibilities.
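
To illustrate the constructor-injection alternative, here is a minimal sketch with hypothetical names (MinLengthPredicate, InjectionSketch). The value that a Lambda would capture from the enclosing scope is passed in through the constructor instead, and the class can be unit-tested on its own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical small class: the "captured" value arrives through the constructor.
final class MinLengthPredicate implements Predicate<String> {

    private final int minLength;

    MinLengthPredicate(int minLength) {
        this.minLength = minLength;
    }

    @Override
    public boolean test(String value) {
        return value != null && value.length() >= minLength;
    }
}

final class InjectionSketch {

    // The named predicate plugs into a stream exactly where a Lambda would go.
    static List<String> keepLongWords(List<String> words, int minLength) {
        return words.stream()
                .filter(new MinLengthPredicate(minLength))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("lambda", "closure", "x", "java");
        System.out.println(keepLongWords(words, 5)); // prints [lambda, closure]
    }
}
```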

 
over 4 years ago

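As I recall, the official bimonthly Java Magazine has covered Java 8 and Lambda several times but seems to have rarely discussed Closure. Flipping through it quickly, I found Closure discussed in an article introducing Lambda in the July/August 2013 issue. I think the main reasons are that (1) before Lambda, an anonymous class could already capture variables, except that the captured variables are automatically treated as final; and (2) the variables captured by a Java Lambda are, to some degree, still not free enough. Because free variables cannot be fully reproduced, the relationship between Java Lambda and Closure has not been particularly advertised. I looked up the description of Closure on Wikipedia (which I figure is more convenient than consulting certain books devoted to programming languages): inside a closure, a free variable can be manipulated like any other variable, including changing its value, and support for this is not necessarily identical across languages.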

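Take Objective C, which I have been writing a lot at work lately. Apple's official documentation gives a few examples and discusses variable capture from the viewpoint of storage, which is something of a sore point for a language that is particularly sensitive to memory locations. Here, though, I will explain how a code block handles captured variables in a more abstract way. By default, a code block captures the value of a variable; that is, changes made to the variable after the capture are not visible inside the code block. So the result shown by NSLog in Code List 1 is 42, not 84.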

Code List 1 - The code block that captures the "anInteger" variable without __block
int anInteger = 42;
SimpleCallback callback = ^{
    NSLog(@"Integer is: %i", anInteger);
};
anInteger = 84;
callback();

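If you want the code block to capture the variable rather than just its value, add the __block modifier to the declaration of the captured variable, as in Code List 2. NSLog then shows 84, because by the time the code block executes (line 6), the captured variable anInteger has already become 84 (line 5).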

Code List 2 - The code block that captures the "anInteger" variable with __block
__block int anInteger = 42;
SimpleCallback callback = ^{
    NSLog(@"Integer is: %i", anInteger);
};
anInteger = 84;
callback();

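The __block modifier makes the code block capture the variable itself, so the value of the variable can also be changed. Thus, in Code List 3, NSLog shows the value of anInteger only after callback() has executed, and the result is 100. Personally, I think this approach has pros and cons. In terms of abstraction, needing the extra __block modifier keeps engineers aware of memory locations, which lowers the abstraction level of the language (it is unintuitive). On the other hand, Objective C can use these modifiers to optimize the compiled output for performance; it not only offers both read-only and read-write captured variables but also provides compile-time checks, for example treating a change to a variable inside a code block without __block as an error. To some extent, I think it is a rather good design.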

Code List 3 - The code block that changes the value of the captured variable
__block int anInteger = 42;
SimpleCallback callback = ^{
    anInteger = 100;
};
callback();
NSLog(@"Value of original variable is now: %i", anInteger);

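All right, back to Java itself. Before Lambda, an anonymous class could capture a variable x visible in scope, as in Code List 4, but the biggest problem is that the captured variable x is in effect a final variable (effectively final), so uncommenting the commented-out x = 48; is treated as a compile error.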

Code List 4 - The captured variable in the anonymous class
int x = 24;
Runnable runnable = new Runnable() {

    @Override
    public void run() {
        // x = 48;

        System.out.println(String.format("captured x: %s", x));
    }
};
runnable.run();

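Switching to Lambda makes no difference. As in Code List 5, x is still a final variable inside the Lambda and its value cannot be changed; uncommenting x = 48; is still a compile error.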

Code List 5 - The captured variable in the Lambda expression is still effectively final
int x = 24;
Runnable runnable = () -> {
    // x = 48;

    System.out.println(String.format("captured x: %s", x));
};
runnable.run();

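Java's final modifier only prevents the variable itself from being changed. If the variable is an object, calling the object's methods is allowed, even when the method changes the state inside the object. So Code List 4 can be rewritten as Code List 6. Likewise, uncommenting x = new AtomicInteger(48); gives a compile error, but x.set(48); actually changes the value of x. (Objective C behaves the same way in this respect.)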

Code List 6 - The captured object in the anonymous class
AtomicInteger x = new AtomicInteger(24);
Runnable runnable = new Runnable() {

    @Override
    public void run() {
        System.out.println(String.format("original x: %s", x));
        // x = new AtomicInteger(48);

        x.set(48);
    }
};
runnable.run();
System.out.println(String.format("x after run(): %s", x));

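So Code List 5 can also be rewritten as Code List 7. Unfortunately, the data types that support autoboxing and unboxing, such as Integer, are all immutable, so they are not as convenient to use as in languages like JavaScript that treat all primitive types as objects. Still, to some extent this is a bit like free variables.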

Code List 7 - The captured object in the Lambda expression
AtomicInteger x = new AtomicInteger(24);
Runnable runnable = () -> {
    System.out.println(String.format("original x: %s", x));
    // x = new AtomicInteger(48);

    x.set(48);
};
runnable.run();
System.out.println(String.format("x after run(): %s", x));

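Finally, although Java 8 supports Lambda, I feel that Closure still does not quite qualify as a first-class citizen of Java. Moreover, as the previous article said, for ease of testing, unless the code is a single line or very simple (too simple to break) so that testing is not a concern, I still prefer writing small, easily tested classes over using Lambda. As for capturing variables, passing them into the object through the constructor is another way. And when would one often write one-line or very simple Lambdas? I think Stream, the new Collection API, opens up considerable possibilities.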

 
over 4 years ago

Finally, Java 8 was released on March 18, 2014. However, since I started developing iOS apps at my company, I have not used Java often and had no time to try the beta version of Java 8. Recently, I have become interested in Java 8 and have been studying its new features. One of the highlighted features of Java 8 is the Lambda expression. Lambda expressions can be seen in many languages. I have written Lambda-style code mostly in JavaScript and Objective C (where it is known as the code block). Honestly, I don't like Lambda very much. I feel okay with Lambda expressions in JavaScript (where I pass named functions), but the in-place code blocks of Objective C feel like spaghetti to me. In most cases, I use methods that return a code block to keep the program well structured.

Well, Lambda is now part of Java 8, so what will change in the way we write Java programs? Let's look at the pre-Java 8 style, using sorting a list as an example. To sort a List, the programmer needs to write a class that implements the Comparator interface (the IntegerComparator in Code List 1) and then create an instance, ascendingComparator, as the argument of List.sort() (see the sortWithComparator method in Code List 2; List.sort() itself was added in Java 8, and earlier code would call Collections.sort(List, Comparator) instead). This is why many programmers say that Java is less productive. However, I like to program this way -- writing a lot of small classes -- because a small class is easy to write, easy to test with JUnit, and easy to reuse.

If the programmer does not want to write a separate class, he or she can write an anonymous class, as was already possible before Java 8, like the sortWithAnonymousClass method in Code List 2. I wrote programs this way when I first learned Java with an IDE like JBuilder -- drag and drop to design a UI, double-click a control, and write some code at the place the IDE auto-generated (yes, I learned Visual Basic 6 in a similar way). Since I learned formal object-oriented analysis and design, I have used this approach only when the anonymous class does not affect the overall design. However, many programmers still consider this way unproductive.

Code List 1 - IntegerComparator
package java8.lambdas;

import java.util.Comparator;

public class IntegerComparator implements Comparator<Integer> {

    private boolean _ascending;

    public IntegerComparator() {
        this(true);
    }

    public IntegerComparator(boolean ascending) {
        _ascending = ascending;
    }

    @Override
    public int compare(Integer number1, Integer number2) {
        return _ascending? number1.compareTo(number2) : number2.compareTo(number1);
    }
}

With the Lambda expression in Java 8, the sorting program can be written like the sortWithLambdaExpression method in Code List 2. The number of lines is reduced, and many programmers think that readability also improves. The reason is that the sorting implementation is right there; you don't have to jump to another class (file) to read it. I think readability improves only when the logic wrapped by the anonymous Lambda is very simple. If the logic is complicated, putting two different pieces of logic (the logic outside the Lambda and the logic inside it) together only increases the amount of code to read and reduces readability. In that case, a better way is to extract the logic in the Lambda into a meaningful class. I am used to writing an anonymous Lambda in the form (arguments) -> {implementation} -- I think that is because I have seen a lot of code blocks in Objective C.

Besides anonymous Lambdas, Java 8 also supports passing an existing method as the Lambda, which is called a method reference; see, for example, the sortWithMethodReference and sortWithStaticMethodReference methods in Code List 2. Comparing the sortWithLambdaExpression and sortWithMethodReference methods, which one is more readable? In my opinion, sortWithMethodReference is more readable than sortWithLambdaExpression because the method name tells me that the numbers are sorted from small to large. I usually write Lambdas in JavaScript this way -- passing a meaningfully named function.
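
The JDK also ships named comparators that read just as well as a hand-written method reference. The sketch below is separate from Code List 2: SortVariantsSketch, sortedCopy, and the static compareAscending are hypothetical names chosen for the illustration. It sorts copies of the same list with a Lambda, a method reference, Comparator.naturalOrder(), and Comparator.reverseOrder().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical example class; not the LambdaExample of Code List 2.
final class SortVariantsSketch {

    // A named method whose name documents the ordering.
    static int compareAscending(Integer number1, Integer number2) {
        return number1.compareTo(number2);
    }

    // Sorts a copy so that the caller's list is left untouched.
    static List<Integer> sortedCopy(List<Integer> numbers, Comparator<Integer> comparator) {
        List<Integer> copy = new ArrayList<>(numbers);
        copy.sort(comparator);
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(3, 1, 2);
        // Anonymous Lambda: the ordering has to be read from the body.
        System.out.println(sortedCopy(numbers, (a, b) -> a.compareTo(b)));
        // Method reference: the method name states the ordering.
        System.out.println(sortedCopy(numbers, SortVariantsSketch::compareAscending));
        // Comparators provided by the JDK.
        System.out.println(sortedCopy(numbers, Comparator.naturalOrder()));
        System.out.println(sortedCopy(numbers, Comparator.reverseOrder()));
    }
}
```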

Code List 2 - LambdaExample
package java8.lambdas;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LambdaExample {

    public static void main(String[] arguments) {
        if(arguments.length == 0) {
            System.err.print("Nothing to sort. Abort!");
        }
        else
        {
            LambdaExample lambdas = new LambdaExample();
            List<Integer> numbers = new ArrayList<Integer>();
            for(String argument : arguments) {
                numbers.add(Integer.parseInt(argument));
            }
            // Each approach gets its own copy so they all sort the same input.
            lambdas.sortWithComparator(new ArrayList<>(numbers));
            lambdas.sortWithAnonymousClass(new ArrayList<>(numbers));
            lambdas.sortWithLambdaExpression(new ArrayList<>(numbers));
            lambdas.sortWithMethodReference(new ArrayList<>(numbers));
            lambdas.sortWithStaticMethodReference(new ArrayList<>(numbers));
        }
    }

    public void sortWithComparator(List<Integer> numbers) {
        IntegerComparator ascendingComparator = new IntegerComparator();
        numbers.sort(ascendingComparator);
        print("sort with comparator", numbers);
    }

    public void sortWithAnonymousClass(List<Integer> numbers) {
        numbers.sort(new Comparator<Integer>() {

            @Override
            public int compare(Integer number1, Integer number2) {
                return number1.compareTo(number2);
            }
        });
        print("sort with inner class", numbers);
    }

    public void sortWithLambdaExpression(List<Integer> numbers) {
        numbers.sort((number1, number2) -> number1.compareTo(number2));
        print("sort with Lambda", numbers);
    }

    public void sortWithMethodReference(List<Integer> numbers) {
        numbers.sort(this::compareAscending);
        print("sort with method reference", numbers);
    }

    public void sortWithStaticMethodReference(List<Integer> numbers) {
        numbers.sort(Integer::compare);
        print("sort with static method reference", numbers);
    }

    public int compareAscending(Integer number1, Integer number2) {
        return number1.compareTo(number2);
    }

    private void print(String approach, List<Integer> numbers) {
        System.out.println(approach);
        for(Integer number : numbers) {
            System.out.print(String.format("%d ", number));
        }
        System.out.println();
    }
}

Someone may ask: what about closures? I want to discuss closures in the next article; this article only discusses how Lambda affects readability and productivity. For readability, as mentioned before, wrapping simple logic in a Lambda does improve readability, but mixing complicated Lambda-wrapped logic with the surrounding logic reduces it. For productivity, Lambda lets you write less code (you skip the minimum boilerplate of a class declaration). Therefore, when the wrapped logic is simple, the productivity improvement is significant. For example, in Code List 1 the compare method has only one line, so declaring a whole class for that line costs a lot. However, if the wrapped logic is complicated, the gain from skipping that boilerplate is limited.

Productivity is also not just a matter of how many lines you save. In the long term, code that is easy to test improves productivity more than writing less code does. This is also the reason I use methods that return code blocks when I develop iOS Apps. I believe that increasing the testability of a program improves productivity much more than writing less code, because the time spent maintaining an existing program is longer than the time spent writing new code. As shown in Code List 3, doSomething may be an integration method with logic wrapped in an anonymous Lambda inside it. The only way to test the wrapped logic is to test the entire doSomething method. If the logic is wrapped in an independent class like IntegerComparator, it is easy to test it by testing IntegerComparator directly, without going through the integration method doSomething. For testability, using a method reference is also a better way than using an anonymous Lambda.

Code List 3 - Testability
public void doSomething() {
    List<Integer> numbers = complexPreparation();
    numbers.sort((number1, number2) -> number1.compareTo(number2));
    complexPostProcess(numbers);
}

Someone may think the sample code in Code List 3 is not a good example. There are indeed other ways to improve the testability of Code List 3, but I still think the anonymous Lambda offers poor testability. Therefore, I prefer the method reference style when writing Lambdas in Java 8. Besides testability, I can follow OO principles, e.g., the single responsibility principle, to put the methods in the appropriate classes, and then use method references to use those methods as Lambdas. As a result, the method reference approach increases readability (through meaningful method names), maintainability (good OO design and easy testing), and also reusability (an anonymous Lambda cannot be reused, but a method reference can reuse methods).
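As a rough sketch of that preference, the ordering rule from Code List 3 can be moved into a named method and tested on its own. The names NumberOrdering and TestabilityExample are hypothetical, and the complexPreparation and complexPostProcess steps are left out because the original does not show them; the method here simply takes the prepared list as a parameter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical home for the ordering rule that Code List 3 wraps in a Lambda.
class NumberOrdering {

    static int compareAscending(Integer number1, Integer number2) {
        return number1.compareTo(number2);
    }
}

class TestabilityExample {

    // Stand-in for doSomething(): the preparation and post-processing steps
    // of Code List 3 are omitted here, so the input is passed in directly.
    static List<Integer> doSomething(List<Integer> prepared) {
        List<Integer> numbers = new ArrayList<>(prepared);
        numbers.sort(NumberOrdering::compareAscending);
        return numbers;
    }
}
```

A unit test can now call NumberOrdering.compareAscending(1, 2) directly and check the sign of the result, without running the integration method at all.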

 
over 4 years ago

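Java 8 was finally released on March 18, 2014. However, since I started developing iOS Apps with Objective C, I have not touched Java for quite a while. I wrote a little Java in between, but had no time to try the beta of Java 8, and only recently started playing with it again. One of the most eye-catching features of Java 8 is the Lambda expression. Lambda expressions are built into many languages, and the ones I have used most are in JavaScript and Objective C (code block). Honestly, though, I am not that fond of Lambda. I find the JavaScript style acceptable, but in-place code blocks in Objective C look messy to me, so in most cases I ended up using methods that return code blocks.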

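Now that Java 8 supports Lambda expressions, how does the way we write programs change? Take sorting as an example and first go back to the Java world without Lambda: to sort a List, you first write a custom Comparator (IntegerComparator in Code List 1) and then create a comparator object as the argument of the List.sort() method (like the sortWithComparator method in Code List 2). Because of this, many people think Java has poor productivity. Personally, I quite like this style. I like writing many small, simple classes, because they are easy to test with JUnit (the logic of a small class is relatively simple and easy to test). Another benefit is that the smaller a class is, the easier it is to reuse.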

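If you don't want to write a separate class, there is another way in the Java world without Lambda: the anonymous class, as in the sortWithAnonymousClass method in Code List 2. This style reminds me of when I first learned Java and used IDEs such as JBuilder to drag and drop UI: double-click a button and the IDE generates a piece of code and takes you to the place where you start writing (hmm... Visual Basic 6 seemed to work this way too). Most of the code generated by the IDE was this kind of anonymous class. After learning more formal object-oriented design, I almost stopped writing code this way; only occasionally, when it has little effect on the overall design, do I get lazy and use it. Still, some people find this style unproductive as well.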

Code List 1 - IntegerComparator
package java8.lambdas;

import java.util.Comparator;

public class IntegerComparator implements Comparator<Integer> {

    private boolean _ascending;

    public IntegerComparator() {
        this(true);
    }

    public IntegerComparator(boolean ascending) {
        _ascending = ascending;
    }

    @Override
    public int compare(Integer number1, Integer number2) {
        return _ascending? number1.compareTo(number2) : number2.compareTo(number1);
    }
}

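With Lambda, the program becomes what the sortWithLambdaExpression method shows. The number of lines does drop considerably, and some people think readability improves a lot as well, because the code is right there and you no longer have to jump to another class to see the implementation. As for readability, I think this holds only when the logic inside the Lambda is very simple. If the logic is complicated, putting two different pieces of logic together not only makes the code longer but also reduces readability; it would be better to extract the logic into a class and give it a meaningful class name. As for the (arguments) -> {implementation} syntax, I am somewhat more used to it now than when I first saw a Java Lambda expression, probably because I have seen a lot of Objective C code blocks.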

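Besides this anonymous Lambda expression, Java 8 also supports using an existing method as a Lambda, for example sortWithMethodReference and sortWithStaticMethodReference in Code List 2. Comparing sortWithLambdaExpression and sortWithMethodReference, which one do you find more readable? Personally, I think sortWithMethodReference is more readable, because the method name tells you that the numbers are sorted in ascending order. I also prefer this style when writing JavaScript.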

Code List 2 - LambdaExample
package java8.lambdas;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LambdaExample {

    public static void main(String[] arguments) {
        if(arguments.length == 0) {
            System.err.print("Nothing to sort. Abort!");
        }
        else
        {
            LambdaExample lambdas = new LambdaExample();
            List<Integer> numbers = new ArrayList<Integer>();
            for(String argument : arguments) {
                numbers.add(Integer.parseInt(argument));
            }
            // Each approach gets its own copy so they all sort the same input.
            lambdas.sortWithComparator(new ArrayList<>(numbers));
            lambdas.sortWithAnonymousClass(new ArrayList<>(numbers));
            lambdas.sortWithLambdaExpression(new ArrayList<>(numbers));
            lambdas.sortWithMethodReference(new ArrayList<>(numbers));
            lambdas.sortWithStaticMethodReference(new ArrayList<>(numbers));
        }
    }

    public void sortWithComparator(List<Integer> numbers) {
        IntegerComparator ascendingComparator = new IntegerComparator();
        numbers.sort(ascendingComparator);
        print("sort with comparator", numbers);
    }

    public void sortWithAnonymousClass(List<Integer> numbers) {
        numbers.sort(new Comparator<Integer>() {

            @Override
            public int compare(Integer number1, Integer number2) {
                return number1.compareTo(number2);
            }
        });
        print("sort with inner class", numbers);
    }

    public void sortWithLambdaExpression(List<Integer> numbers) {
        numbers.sort((number1, number2) -> number1.compareTo(number2));
        print("sort with Lambda", numbers);
    }

    public void sortWithMethodReference(List<Integer> numbers) {
        numbers.sort(this::compareAscending);
        print("sort with method reference", numbers);
    }

    public void sortWithStaticMethodReference(List<Integer> numbers) {
        numbers.sort(Integer::compare);
        print("sort with static method reference", numbers);
    }

    public int compareAscending(Integer number1, Integer number2) {
        return number1.compareTo(number2);
    }

    private void print(String approach, List<Integer> numbers) {
        System.out.println(approach);
        for(Integer number : numbers) {
            System.out.print(String.format("%d ", number));
        }
        System.out.println();
    }
}
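
Code List 2 uses two forms of method reference: this::compareAscending (bound to a particular object) and Integer::compare (a static method). The following sketch puts those two forms next to a third one, an unbound instance method reference, where the first argument becomes the receiver. The class name MethodReferenceKinds and the helper methods are assumptions made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MethodReferenceKinds {

    // Instance method, used below through a bound reference (helper::compareDescending).
    int compareDescending(Integer number1, Integer number2) {
        return number2.compareTo(number1);
    }

    // Sorts a copy so the caller's list is left untouched.
    static List<Integer> sorted(List<Integer> input, Comparator<Integer> comparator) {
        List<Integer> numbers = new ArrayList<>(input);
        numbers.sort(comparator);
        return numbers;
    }

    static List<Integer> withStaticReference(List<Integer> input) {
        // Static method: both arguments are passed to Integer.compare(int, int).
        return sorted(input, Integer::compare);
    }

    static List<Integer> withUnboundReference(List<Integer> input) {
        // Unbound instance method: the first argument becomes the receiver of compareTo.
        return sorted(input, Integer::compareTo);
    }

    static List<Integer> withBoundReference(List<Integer> input) {
        // Bound instance method: the receiver is fixed to the helper object.
        MethodReferenceKinds helper = new MethodReferenceKinds();
        return sorted(input, helper::compareDescending);
    }
}
```

The first two methods sort in ascending order, and the third sorts in descending order because of how compareDescending is written.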

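Someone may point out that closure is the real essence here. I want to leave that for the next article; this one only discusses how Lambda affects readability and productivity. As for readability, as mentioned above, wrapping simple logic in a Lambda does improve readability to some extent, but if the logic is complicated, I think mixing it with the surrounding code reduces readability instead. As for productivity, Lambda lets you write somewhat less code (the minimum cost of a class, such as its declaration). As before, if the logic is simple, for example the compare method in Code List 1 has only one line, then the cost paid for that one line is high. But if the logic is complicated, the productivity gained by skipping that minimum cost is limited.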

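Moreover, productivity cannot be judged only by how many lines of code you save. In the long run, code that is easy to test brings higher productivity. This is also why I have been using methods that return code blocks so heavily in my recent iOS App development. I believe the productivity gained from better testability is much higher than that gained from simply writing less code, because the time spent maintaining code is far longer than the time spent writing it. Take Code List 3 as an example: doSomething may be an integration method with some Lambda logic mixed into it. To test that Lambda logic, you have to test doSomething. If the logic were a separate class like IntegerComparator, you could test IntegerComparator directly instead of going through doSomething. Using a method reference also makes the logic easy to test.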

Code List 3 - Testability
public void doSomething() {
    List<Integer> numbers = complexPreparation();
    numbers.sort((number1, number2) -> number1.compareTo(number2));
    complexPostProcess(numbers);
}

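Someone may think the example in Code List 3 is not a good one. There are indeed many other ways to improve the testability of Code List 3, but I think the testability of an anonymous Lambda expression is very low. So if I really have to use Lambda expressions in Java 8, I will lean toward the method reference style. Besides testability, another reason is that with method references I can follow OO principles, such as the single responsibility principle, put each method in the appropriate class, and then refer to it directly through a method reference to use it as a Lambda. This improves readability (the method name itself helps), maintainability (easy testing and good OO design), and reusability (an anonymous Lambda cannot be reused, but a method reference can).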
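
To illustrate the reusability point, here is a minimal sketch in which one named method serves three different callers through method references. The class name ReuseExample and its methods are hypothetical; the standard library calls used are List.sort, Collections.max, and Stream.sorted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class ReuseExample {

    // The ordering rule is written once, with a name that says what it does.
    static int compareAscending(Integer number1, Integer number2) {
        return number1.compareTo(number2);
    }

    // Reuse 1: sort a copy of the list.
    static List<Integer> sortedCopy(List<Integer> input) {
        List<Integer> numbers = new ArrayList<>(input);
        numbers.sort(ReuseExample::compareAscending);
        return numbers;
    }

    // Reuse 2: find the largest element under the same ordering.
    static Integer largest(List<Integer> input) {
        return Collections.max(input, ReuseExample::compareAscending);
    }

    // Reuse 3: sort through the Stream API.
    static List<Integer> sortedViaStream(List<Integer> input) {
        return input.stream()
                .sorted(ReuseExample::compareAscending)
                .collect(Collectors.toList());
    }
}
```

With an anonymous Lambda, the same comparison would have to be written out again at each of the three call sites.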