This article explains what Hadoop WritableSerialization is and how it fits into Hadoop's serialization framework, then moves on to Avro. The material is hands-on and practical, so feel free to follow along.
Hadoop has a pluggable serialization framework API. A serialization framework is represented by an implementation of Serialization.
WritableSerialization is the Serialization implementation for Writable types.
package: org.apache.hadoop.io.serializer

// Simplified sketch of the class; method bodies are elided.
public class WritableSerialization extends Configured implements Serialization<Writable> {

    static class WritableSerializer extends Configured implements Serializer<Writable> {
        @Override
        public void serialize(Writable w) throws IOException { /* writes w via w.write(DataOutput) */ }
    }

    static class WritableDeserializer extends Configured implements Deserializer<Writable> {
        @Override
        public Writable deserialize(Writable w) throws IOException { /* fills w via w.readFields(DataInput) */ return w; }
    }

    @Override
    public Serializer<Writable> getSerializer(Class<Writable> c) {
        return new WritableSerializer();
    }

    @InterfaceAudience.Private
    @Override
    public Deserializer<Writable> getDeserializer(Class<Writable> c) {
        return new WritableDeserializer(getConf(), c);
    }
}
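As a quick illustration of how a registered Serialization is actually used, here is a minimal sketch of my own (not part of the Hadoop source): it asks a SerializationFactory, which is driven by the io.serializations configuration property, for the serializer registered for IntWritable and writes one value to a byte stream.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.serializer.SerializationFactory;
import org.apache.hadoop.io.serializer.Serializer;

public class SerializationFactoryDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // SerializationFactory walks the io.serializations property;
        // WritableSerialization is registered there by default.
        SerializationFactory factory = new SerializationFactory(conf);
        Serializer<IntWritable> serializer = factory.getSerializer(IntWritable.class);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        serializer.open(out);
        serializer.serialize(new IntWritable(163));
        serializer.close();

        System.out.println("bytes written: " + out.size()); // 4 -- a Writable int is compact
    }
}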
JavaSerialization is the Serialization implementation for Serializable types. It uses standard Java Object Serialization. Although it makes it convenient to use standard Java types, standard Java serialization is comparatively inefficient.
package: org.apache.hadoop.io.serializer
public class JavaSerialization implements Serialization<Serializable>
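JavaSerialization is not in the default io.serializations list, so it has to be enabled explicitly. Below is a minimal sketch of doing that in a job configuration (the class names are the real ones from this package; whether you should do this is another matter):

import org.apache.hadoop.conf.Configuration;

public class EnableJavaSerialization {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Keep WritableSerialization and append JavaSerialization after it.
        conf.setStrings("io.serializations",
                "org.apache.hadoop.io.serializer.WritableSerialization",
                "org.apache.hadoop.io.serializer.JavaSerialization");
        System.out.println(conf.get("io.serializations"));
    }
}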
Why not use Java Object Serialization?
1. Not compact. Java serialization writes the class name with each object the first time it appears, and later instances of the same class refer back to that first occurrence by handle. This format is a poor fit for random access, sorting, and splitting (the sketch after this list makes the size difference concrete).
2. Not fast. Deserialization creates a new instance for every object read, which is wasteful.
3. Extensible. Java serialization can in principle support evolving new types, but at present it cannot be mixed with Writable.
4. Interoperable. Possible in theory, but in practice only a Java implementation exists; the same is true of Writable.
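To make the compactness point concrete, this illustrative sketch serializes the same int both ways and prints the sizes: the Writable needs exactly 4 bytes, while ObjectOutputStream also writes a stream header and a class descriptor and ends up at several dozen bytes.

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import org.apache.hadoop.io.IntWritable;

public class CompactnessDemo {
    public static void main(String[] args) throws IOException {
        // Writable: just the 4 raw bytes of the int.
        ByteArrayOutputStream writableBytes = new ByteArrayOutputStream();
        new IntWritable(163).write(new DataOutputStream(writableBytes));

        // Java Object Serialization: stream header + class descriptor + value.
        ByteArrayOutputStream javaBytes = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(javaBytes);
        oos.writeObject(Integer.valueOf(163));
        oos.close();

        System.out.println("IntWritable: " + writableBytes.size() + " bytes");      // 4
        System.out.println("Java serialization: " + javaBytes.size() + " bytes");   // ~80
    }
}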
Avro is a language-neutral data serialization system. Schemas are defined with an interface definition language (IDL), from which native code for other languages can be generated. Avro schemas are usually written in JSON, and the data is usually encoded in a binary format.
Avro has strong schema resolution: the schema used to read data does not have to be identical to the schema it was written with, which is how Avro supports schema evolution (sketched below).
Compared with other serialization systems such as Thrift and Google Protocol Buffers, Avro's performance is better.
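As a taste of that schema resolution, the sketch below uses two hypothetical schemas built on the common StringPair example: data is written with the old schema and read back with a newer reader schema that adds a field carrying a default value.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

public class ResolutionDemo {
    public static void main(String[] args) throws IOException {
        // Writer schema: two string fields.
        Schema writerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"StringPair\",\"fields\":["
          + "{\"name\":\"left\",\"type\":\"string\"},{\"name\":\"right\",\"type\":\"string\"}]}");
        // Reader schema: adds a field with a default, so old data can still be read.
        Schema readerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"StringPair\",\"fields\":["
          + "{\"name\":\"left\",\"type\":\"string\"},{\"name\":\"right\",\"type\":\"string\"},"
          + "{\"name\":\"description\",\"type\":\"string\",\"default\":\"\"}]}");

        // Write a record with the old (writer) schema.
        GenericRecord datum = new GenericData.Record(writerSchema);
        datum.put("left", "L");
        datum.put("right", "R");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writerSchema).write(datum, encoder);
        encoder.flush();

        // Read it back with the new (reader) schema; "description" falls back to its default.
        Decoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord result =
            new GenericDatumReader<GenericRecord>(writerSchema, readerSchema).read(null, decoder);
        System.out.println(result); // {"left": "L", "right": "R", "description": ""}
    }
}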
Primitive Datatype
null, boolean,int,long,float,double,bytes,string
Complex Datatype
array: an ordered collection of objects of the same type.
{ "name":"myarray","type":"array", "items":"long" }
map: an unordered set of key-value pairs; keys must be strings, so the schema only declares the value type.
{ "name":"mymap","type":"map", "values":"string" }
record: similar to a struct; very commonly used in data formats.
{ "type":"record","name":"weather-record","doc":"a weather reading.",
"fields":[
{"name":"myint","type":"int"},
{"name":"mynull","type":"null"}
]
}
enum: a named set of symbols.
{ "type":"enum",
"name":"title",
"symbols":["engineer","Manager","vp"]
}
fixed: a fixed number of 8-bit unsigned bytes.
{ "type":"fixed","name":"md5","size":16 }
union: a union of schemas, written as a JSON array. The data must match exactly one of the types in the union.
[ "int", "long", {"type":"array", "items":"long"} ]
This means the data must be an int, a long, or an array of longs. A common use of unions is an optional field such as ["null","string"].
Avro schema evolution is not covered here.
Open questions:
How does Avro sort its data?
How is an Avro file splittable? (The sync markers in the file layout below, and the sketch after it, are the key.)
The structure of an Avro data file is as follows. The file header consists of:
Four bytes, ASCII 'O', 'b', 'j', followed by 1 (the format version).
File metadata.
The 16-byte, randomly-generated sync marker for this file.
All metadata properties that start with "avro." are reserved.
avro.schema contains the schema of objects stored in the file, as JSON data (required).
avro.codec is the name of the compression codec used to compress blocks, as a string. Implementations are required to support the following codecs: "null" and "deflate". If codec is absent, it is assumed to be "null". The codecs are described with more detail below.
Required Codecs
null
The "null" codec simply passes through data uncompressed.
deflate
The "deflate" codec writes the data block using the deflate algorithm as specified in RFC 1951, and typically implemented using the zlib library. Note that this format (unlike the "zlib format" in RFC 1950) does not have a checksum.
Optional Codecs
snappy
The "snappy" codec uses Google's Snappy compression library. Each compressed block is followed by the 4-byte, big-endian CRC32
checksum of the uncompressed data in the block.
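When writing a data file from Java, the codec is picked through CodecFactory and recorded in the avro.codec metadata. A minimal sketch, with a hypothetical schema and output path:

import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.file.CodecFactory;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class CodecDemo {
    public static void main(String[] args) throws IOException {
        // Hypothetical schema and output path, for illustration only.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Line\",\"fields\":[{\"name\":\"text\",\"type\":\"string\"}]}");
        DataFileWriter<GenericRecord> writer =
                new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
        // The codec chosen here is stored as the file's avro.codec metadata.
        writer.setCodec(CodecFactory.deflateCodec(6));
        writer.create(schema, new File("/tmp/deflate-demo.avro"));

        GenericRecord rec = new GenericData.Record(schema);
        rec.put("text", "hello avro");
        writer.append(rec);
        writer.close();
    }
}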
Each data block in the file consists of:
A long indicating the count of objects in this block.
A long indicating the size in bytes of the serialized objects in the current block, after any codec is applied.
The serialized objects. If a codec is specified, this is compressed by that codec.
The file's 16-byte sync marker.
Thus, each block's binary data can be efficiently extracted or skipped without deserializing the contents. The combination of block size, object counts, and sync markers enable detection of corrupt blocks and help ensure data integrity.
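That is also what answers the splittability question raised earlier. The sketch below (the file path reuses the one from the example further down; the split boundaries are hypothetical) jumps to the first sync marker after an offset with sync(), stops at the end of the split with pastSync(), and reads the avro.codec metadata mentioned above.

import java.io.File;
import java.io.IOException;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class AvroSplitDemo {
    public static void main(String[] args) throws IOException {
        File file = new File("/home/cloudera/pair.avro");   // path taken from the example below
        DataFileReader<GenericRecord> reader =
                new DataFileReader<GenericRecord>(file, new GenericDatumReader<GenericRecord>());

        System.out.println("schema: " + reader.getSchema());
        System.out.println("codec:  " + reader.getMetaString("avro.codec"));

        long splitStart = 0, splitEnd = 1024 * 1024;   // hypothetical split boundaries
        reader.sync(splitStart);                        // skip to the next sync marker
        while (reader.hasNext() && !reader.pastSync(splitEnd)) {
            System.out.println(reader.next());
        }
        reader.close();
    }
}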
The schema and the data can be accessed through GenericRecord, or through a SpecificRecord, in which case avro-tools is needed to generate the record class:
% hadoop jar /usr/lib/avro/avro-tools.jar compile schema /pair.avsc /home/cloudera/workspace/
The schema's namespace is injected into the generated class.
{ "namespace":"com.jinbao.hadoop.hdfs.avro.compile", "type":"record", "name":"MyAvro", "fields":[ { "name":"name","type":"string" }, { "name":"age","type":"int" }, { "name":"isman","type":"boolean" } ] }
The code is as follows:
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.Schema.Parser;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.specific.SpecificData;

import com.jinbao.hadoop.hdfs.avro.compile.MyAvro;

public class AvroTest {

    private static String avscfile = "/home/cloudera/pair.avsc";
    private static String avrofile = "/home/cloudera/pair.avro";

    public static void main(String[] args) throws IOException {
        // schemaReadWrite();
        // WriteData();
        ReadData();
    }

    /** Parse the schema file, then serialize and deserialize one record in memory. */
    private static void schemaReadWrite() throws IOException {
        // Read the schema from the .avsc file.
        Parser ps = new Schema.Parser();
        Schema schema = ps.parse(new File(avscfile));
        if (schema != null) {
            System.out.println(schema.getName());
            System.out.println(schema.getType());
            System.out.println(schema.getDoc());
            System.out.println(schema.getFields());
        }

        // Construct a record.
        GenericRecord datum = new GenericData.Record(schema);
        datum.put("left", new String("mother"));
        datum.put("right", new String("father"));

        // Write it to an output stream.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
        Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(datum, encoder);
        encoder.flush();
        out.close();

        // Read it back from the bytes.
        DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
        Decoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord record = reader.read(null, decoder);
        System.out.print(record.get("left"));
        System.out.print(record.get("right"));
    }

    /** Write a few MyAvro records to an Avro data file. */
    public static void WriteData() throws IOException {
        Parser ps = new Schema.Parser();
        Schema schema = ps.parse(new File(avscfile));
        File file = new File(avrofile);
        DatumWriter<MyAvro> writer = new GenericDatumWriter<MyAvro>(schema);
        DataFileWriter<MyAvro> fileWriter = new DataFileWriter<MyAvro>(writer);
        fileWriter.create(schema, file);

        MyAvro datum = new MyAvro();
        for (int i = 0; i < 5; i++) {
            datum.setName("name1" + i);
            datum.setAge(10 + i);
            datum.setIsman(i % 2 == 0);
            fileWriter.append(datum);
        }
        fileWriter.close();
    }

    /** Read the file back as GenericRecords, then convert them to the generated type. */
    public static void ReadData() throws IOException {
        File file = new File(avrofile);
        DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>();
        DataFileReader<GenericRecord> fileReader = new DataFileReader<GenericRecord>(file, reader);

        Schema schema = fileReader.getSchema();
        System.out.println(fileReader.getSchema());

        GenericRecord record = null;
        MyAvro datum = null;
        while (fileReader.hasNext()) {
            record = fileReader.next();
            System.out.println(record.toString());
            // Convert the GenericRecord to the generated SpecificRecord class.
            datum = (MyAvro) SpecificData.get().deepCopy(schema, record);
            System.out.println(datum.toString());
        }
        fileReader.seek(0);   // jump back to an absolute byte position
        fileReader.sync(0);   // then advance to the next sync marker
        fileReader.close();
    }
}
That covers Hadoop WritableSerialization and the Avro basics; the best way to make it stick is to try the examples above yourself.