MapReduce: Setting the Number of Maps and Setting Map Memory

Summary

Before the map phase reads any data, FileInputFormat divides the input files into splits. The number of splits determines the number of map tasks.

The factors that affect the number of maps (that is, the number of splits) are:

1) The HDFS block size, i.e., the value of dfs.block.size. For a 1024 MB input file, a 256 MB block size yields 4 splits, while a 128 MB block size yields 8 splits.

2) The file size. With a 128 MB block size, a 128 MB input file is divided into 1 split, while a 256 MB input file is divided into 2 splits.

3) The number of files. FileInputFormat creates splits per file and only splits large files, i.e., files whose size exceeds the HDFS block size. If dfs.block.size is set to 64 MB and there are 100 files in the input directory, the number of splits after division is at least 100.

4) The split size. Splits are cut according to splitsize; if it is not set, the split size defaults to the HDFS block size. An application can adjust the split size through two parameters, minimumsize and maximumsize, as shown in the sketch below.
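For illustration, here is a minimal sketch of how a driver could set those two parameters with Hadoop's new MapReduce API; the job name and values are hypothetical examples rather than recommendations:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "split-size-demo"); // hypothetical job name

        // minimumsize: raising it above the block size produces fewer, larger splits.
        FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);  // 64 MB

        // maximumsize: lowering it below the block size produces more, smaller splits.
        FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024); // 256 MB
    }
}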

The formula for calculating the number of maps is as follows:

1. splitsize = max(minimumsize, min(maximumsize, blocksize)).

If minimumsize and maximumsize are not set, the size of splitsize is equal to blocksize by default.

2. The number of splits. The calculation process can be simplified to the pseudocode below; for the detailed algorithm, refer to the getSplits method of the FileInputFormat class.

total_split = 0;
for (file : each file in the input directory)
{
    file_split = 1;
    if (file.size > splitsize)
    {
        file_split = file.size / splitsize;
    }
    total_split += file_split;
}
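As a standalone illustration (not Hadoop's exact getSplits code, which also handles the trailing remainder of a file slightly differently), the following Java sketch applies formulas 1 and 2 to a few hypothetical file sizes:

public class SplitCountDemo {
    // Formula 1: splitsize = max(minimumsize, min(maximumsize, blocksize)).
    static long splitSize(long minimumSize, long maximumSize, long blockSize) {
        return Math.max(minimumSize, Math.min(maximumSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long blockSize = 128 * mb;
        // With minimumsize and maximumsize effectively unset, splitsize falls back to blocksize.
        long split = splitSize(1L, Long.MAX_VALUE, blockSize);

        long[] fileSizes = {1024 * mb, 128 * mb, 50 * mb}; // hypothetical input files
        long totalSplits = 0;
        for (long size : fileSizes) {
            long fileSplits = 1;
            if (size > split) {
                fileSplits = size / split; // simplified, as in the pseudocode above
            }
            totalSplits += fileSplits;
        }
        System.out.println("total splits: " + totalSplits); // 8 + 1 + 1 = 10
    }
}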


Configuring MapReduce memory

1. Monitoring container memory usage

The NodeManager is a daemon in the YARN runtime. One of its duties is to monitor the containers running on its node, which includes tracking the memory usage of each container.

To monitor container memory usage, configure the yarn.nodemanager.container-monitor.interval-ms property in the YARN configuration file yarn-site.xml. At each interval, the NodeManager walks the currently running containers, builds each container's process tree (the container process and all of its child processes), and reads the /proc/<pid>/stat file of each process (where pid is the process ID) to extract its physical memory usage (also known as RSS) and its virtual memory usage (also known as VSZ or VSIZE).

The yarn.nodemanager.vmem-check-enabled property controls whether the virtual memory check is enabled. When it is, YARN compares the extracted VSIZE of the container and its child processes with the container's maximum allowed virtual memory, which is the container's maximum usable physical memory × yarn.nodemanager.vmem-pmem-ratio (default 2.1). So if a container is configured with a maximum of 2 GB of physical memory, multiplying by 2.1 gives its maximum usable virtual memory, 4.2 GB.

The yarn.nodemanager.pmem-check-enabled property controls whether the physical memory check is enabled. When it is, YARN compares the extracted RSS of the container and its child processes with the container's maximum allowed physical memory.
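Taken together, a yarn-site.xml fragment for these monitoring and enforcement settings might look like the sketch below; the values shown are the commonly cited defaults, so treat them as a starting point rather than tuned recommendations:

<property>
  <name>yarn.nodemanager.container-monitor.interval-ms</name>
  <value>3000</value>
</property>

<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>true</value>
</property>

<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>true</value>
</property>

<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>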

If either the physical or the virtual memory usage exceeds the maximum allowed, YARN kills the container and prints a log entry like the following:

Application application_1409135750325_48141 failed 2 times due to AM Container for
appattempt_1409135750325_48141_000002 exited with exitCode: 143 due to: Container
[pid=4733, containerID=container_1409135750325_48141_02_000001] is running beyond physical memory limits.
Current usage: 2.0 GB of 2 GB physical memory used; 6.0 GB of 4.2 GB virtual memory used. Killing container.


2. Increasing the memory available to MapReduce jobs

For MapReduce jobs, there are two ways to configure the memory a job runs with:

1) Set the physical memory of the Map and Reduce processes;

2) Set the JVM heap size of the Map and Reduce processes.

1) Setting the physical memory of the Map and Reduce processes

Configure the following properties in the mapred-site.xml file to limit the memory size of the Map and Reduce processes:

"property"

"name" mapreduce.map.memory.mb "/name"

"value" 2048 "/value"

"/property"

"property"

"name" mapreduce.reduce.memory.mb "/name"

"value" 4096 "/value"

"/property"

The example above gives each Map task 2 GB and each Reduce task 4 GB of physical memory.

Note: the physical memory configured for a job must fall between the minimum and maximum container allocations allowed in the cluster; see yarn.scheduler.minimum-allocation-mb and yarn.scheduler.maximum-allocation-mb.

2) Setting the JVM heap size of the Map and Reduce processes

The JVM heap size needs to be smaller than the physical memory of the process, and is generally set to 80% of the physical memory.

"property"

"name" mapreduce.map.java.opts "/name"

"value"-Xmx1638m "/value"

"/property"

"property"

"name" mapreduce.reduce.java.opts "/name"

"value"-Xmx3278m "/value"

"/property"

The example above sets the Map heap to roughly 1.6 GB and the Reduce heap to roughly 3.2 GB, i.e., about 80% of the 2 GB and 4 GB of physical memory configured earlier.
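To keep the heap settings consistent with the container sizes, a job driver can derive -Xmx from the configured physical memory. Here is a minimal sketch using Hadoop's Configuration API and the 80% rule of thumb; integer arithmetic gives 1638m and 3276m, close to the values above:

import org.apache.hadoop.conf.Configuration;

public class MemoryConfigDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        int mapMb = 2048;    // physical memory per Map task, in MB
        int reduceMb = 4096; // physical memory per Reduce task, in MB

        conf.setInt("mapreduce.map.memory.mb", mapMb);
        conf.setInt("mapreduce.reduce.memory.mb", reduceMb);

        // Heap = ~80% of the container, leaving headroom for non-heap JVM memory.
        conf.set("mapreduce.map.java.opts", "-Xmx" + (mapMb * 8 / 10) + "m");
        conf.set("mapreduce.reduce.java.opts", "-Xmx" + (reduceMb * 8 / 10) + "m");
    }
}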

