HDFS is built around the idea that the most efficient data processing pattern is a write-once, read-many-times pattern. Each analysis will involve a large proportion, if not all, of the dataset, so the time to read the whole dataset is more important than the latency in reading the first...
3.5.7.1. Anatomy of a File Read 1) 客户端通过调用FileSystem对象的open()方法打开需要读取的文件,对HDFS来说是调用DistributedFileSystem