[Hive] Features of LLAP (Live Long And Process)

Sencia 2021. 4. 16. 13:16

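Introduced in Hive 2.0.
A technique that speeds up queries by caching frequently used data.
When Hive runs with LLAP, a dedicated YARN queue is allocated for it, and the configured resources stay occupied at all times.

Source: https://cwiki.apache.org/confluence/display/Hive/LLAP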

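LLAP components

Component	Description
Hive Interactive Server	Thrift server that provides the JDBC interface for connecting to Hive LLAP
Slider AM	Slider application that creates, monitors, and maintains the LLAP daemons
Tez AM query coordinator	The Tez AM accepts user requests and runs them on the executors available inside the LLAP daemons (JVMs)
LLAP daemon	Runs on the cluster's worker nodes as a long-lived process, which makes caching and JIT optimization easier and lets it be invoked immediately; it does the actual I/O, caching, and query execution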

LLAP configuration

Item	Parameter	Hive config section	Role or component
Slider size	slider_am_container_mb	hive-interactive-env	= yarn.scheduler.minimum-allocation-mb
Tez AM coordinator size	tez.am.resource.memory.mb	tez-interactive-site	= yarn.scheduler.minimum-allocation-mb
Number of coordinators	hive.server2.tez.sessions.per.default.queue	Settings	Number of concurrent queries LLAP can handle; one Tez AM is created per configured session
LLAP daemon size	hive.llap.daemon.yarn.container.mb	hive-interactive-site	yarn.scheduler.minimum-allocation-mb <= daemon size <= yarn.scheduler.maximum-allocation-mb. Rule of thumb: always set it to yarn.scheduler.maximum-allocation-mb.
Number of daemons	num_llap_nodes_for_llap_daemons	hive-interactive-env	Number of LLAP daemons to run
Executor size	hive.tez.container.size	hive-interactive-site	4 to 6 GB recommended; each executor should be assigned one vcore
Number of executors	hive.llap.daemon.num.executors		Set up to the maximum vcore count configured in YARN
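
The sizing rules in the table can be checked mechanically. The following is a minimal sketch, not part of Hive or Ambari: `YarnLimits` and `check_llap_sizing` are hypothetical names, and the sample values in the usage section are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class YarnLimits:
    min_allocation_mb: int  # yarn.scheduler.minimum-allocation-mb
    max_allocation_mb: int  # yarn.scheduler.maximum-allocation-mb


def check_llap_sizing(settings, yarn):
    """Return a list of rule violations from the table (empty if none)."""
    problems = []
    # Slider AM and Tez AM are both expected to equal the YARN minimum allocation.
    for key in ("slider_am_container_mb", "tez.am.resource.memory.mb"):
        if settings[key] != yarn.min_allocation_mb:
            problems.append(f"{key} should equal {yarn.min_allocation_mb} MB")
    # The daemon must fit between the YARN minimum and maximum container size.
    daemon_mb = settings["hive.llap.daemon.yarn.container.mb"]
    if not yarn.min_allocation_mb <= daemon_mb <= yarn.max_allocation_mb:
        problems.append("daemon size is outside the YARN container limits")
    # The table recommends 4 to 6 GB per executor.
    executor_mb = settings["hive.tez.container.size"]
    if not 4096 <= executor_mb <= 6144:
        problems.append("hive.tez.container.size is outside the suggested 4 to 6 GB")
    return problems


if __name__ == "__main__":
    # Assumed example values, not taken from a real cluster.
    yarn = YarnLimits(min_allocation_mb=1024, max_allocation_mb=98304)
    settings = {
        "slider_am_container_mb": 1024,
        "tez.am.resource.memory.mb": 1024,
        "hive.llap.daemon.yarn.container.mb": 98304,
        "hive.tez.container.size": 4096,
    }
    print(check_llap_sizing(settings, yarn))
```

An empty list means the settings agree with the table; each string in a non-empty list names one rule that was broken.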

LLAP daemon configuration

Item	Parameter	Section	Role or component
Maximum YARN container size	yarn.scheduler.maximum-allocation-mb	YARN	Maximum memory that can be allocated to a container; it is best to run the LLAP daemon as one large container per node
Daemon size	hive.llap.daemon.yarn.container.mb	hive-interactive-site	yarn.scheduler.minimum-allocation-mb <= daemon size <= yarn.scheduler.maximum-allocation-mb. Rule of thumb: always set it to yarn.scheduler.maximum-allocation-mb.
Headroom	llap_headroom_space	hive-interactive-env	6 GB or 5% of the daemon size. Off-heap, but part of the LLAP daemon
Heap size	llap_heap_size	hive-interactive-env	Number of executors × hive.tez.container.size
Cache size	hive.llap.io.memory.size	hive-interactive-site	Daemon size - heap size - headroom. Off-heap, but part of the LLAP daemon
LLAP queue size			Slider AM size + number of Tez containers × (hive.tez.container.size + LLAP daemon size)
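
The headroom, heap, and cache rows above are simple arithmetic on the daemon size. Here is a minimal sketch of that arithmetic; `llap_memory_layout` is a hypothetical helper, and it assumes that "6 GB or 5%" means the smaller of the two values, which the table does not state explicitly.

```python
def llap_memory_layout(daemon_mb, num_executors, executor_mb):
    """Split one LLAP daemon's memory into headroom, heap, and cache (all in MB)."""
    # Assumption: headroom is the smaller of 6 GB and 5% of the daemon size.
    headroom_mb = min(6 * 1024, daemon_mb * 5 // 100)
    # Heap size = number of executors x hive.tez.container.size.
    heap_mb = num_executors * executor_mb
    # Cache size = daemon size - heap size - headroom.
    cache_mb = daemon_mb - heap_mb - headroom_mb
    if cache_mb < 0:
        raise ValueError("executors and headroom do not fit in the daemon size")
    return {"headroom_mb": headroom_mb, "heap_mb": heap_mb, "cache_mb": cache_mb}


if __name__ == "__main__":
    # Assumed example: a 96 GB daemon with 12 executors of 4 GB each.
    print(llap_memory_layout(daemon_mb=98304, num_executors=12, executor_mb=4096))
```

For the assumed 96 GB daemon this gives 4915 MB of headroom, 49152 MB of heap, and 44237 MB of cache.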

How does LLAP run on YARN?

Figure: LLAP Interactive Query Configuration. / Source: https://community.cloudera.com/t5/Community-Articles/Hive-LLAP-deep-dive/ta-p/248893

Figure: LLAP YARN Queue Configuration / Source: https://community.cloudera.com/t5/Community-Articles/Hive-LLAP-deep-dive/ta-p/248893

LLAP Daemon

yarn.scheduler.minimum-allocation-mb <= Hive LLAP daemon size <= yarn.scheduler.maximum-allocation-mb
Daemon size = headroom + executors + cache
The executors run in the JVM heap; the rest of the daemon (headroom and cache) is off-heap.
Xmx is the memory allocated for running the executors.
The number of executors is either computed with the formula below or chosen from experience, based on the vcores each executor will use:
min(maximum container vcores, (LLAP daemon size - headroom - cache) / hive.tez.container.size)
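
The executor formula above can be written out as a small function. This is only a sketch: `llap_num_executors` is a hypothetical name, all sizes are assumed to be in MB, and the division is rounded down because a fraction of an executor cannot be run.

```python
def llap_num_executors(max_container_vcores, daemon_mb, headroom_mb, cache_mb, executor_mb):
    """min(max container vcores, (daemon - headroom - cache) / hive.tez.container.size)."""
    # Memory left for the executor heap after headroom and cache are set aside.
    heap_mb = daemon_mb - headroom_mb - cache_mb
    # How many executors of the configured size fit into that heap.
    by_memory = heap_mb // executor_mb
    # Each executor needs one vcore, so the vcore limit caps the result.
    return min(max_container_vcores, by_memory)


if __name__ == "__main__":
    # Assumed example: 96 GB daemon, 4915 MB headroom, 44237 MB cache, 4 GB executors.
    print(llap_num_executors(16, 98304, 4915, 44237, 4096))
```

With these assumed numbers memory is the limiting factor (12 executors fit); lowering the vcore limit to 8 would make CPU the limiting factor instead.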

Caching

LLAP caches both metadata and data.
- Metadata and indexes: stored in-process as Java objects
- Data: stored off-heap

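Caching policy

Policy	Description
Eviction policy	Tuned for analytical workloads with frequent table scans; default: LRFU
Caching granularity	Data is cached in units of column chunks. The chunk granularity depends on the file format and the execution engine, and is used to balance processing overhead against storage efficiency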
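
To make the LRFU eviction policy concrete, here is a minimal, generic sketch of the idea in Python. It is not Hive's implementation: the class `LRFUCache`, its logical clock, and the parameter `lam` are assumptions for illustration. Each entry keeps a score that grows on every access and decays with age; `lam = 0` behaves like LFU (pure frequency), and larger values behave more like LRU (recency dominates).

```python
class LRFUCache:
    """Toy LRFU cache: evicts the entry with the lowest decayed access score."""

    def __init__(self, capacity, lam=0.5):
        self.capacity = capacity
        self.lam = lam      # 0 -> LFU-like, larger -> LRU-like
        self.clock = 0      # logical time, advanced on every get/put
        self.entries = {}   # key -> [value, score, last_access_time]

    def _decay(self, age):
        # Weight of an access that happened `age` ticks ago.
        return 0.5 ** (self.lam * age)

    def _current_score(self, key):
        _, score, last = self.entries[key]
        return score * self._decay(self.clock - last)

    def _touch(self, entry):
        # New score = 1 for this access + the old score decayed to now.
        entry[1] = 1.0 + entry[1] * self._decay(self.clock - entry[2])
        entry[2] = self.clock

    def get(self, key):
        self.clock += 1
        entry = self.entries.get(key)
        if entry is None:
            return None
        self._touch(entry)
        return entry[0]

    def put(self, key, value):
        self.clock += 1
        if key in self.entries:
            entry = self.entries[key]
            entry[0] = value
            self._touch(entry)
            return
        if len(self.entries) >= self.capacity:
            victim = min(self.entries, key=self._current_score)
            del self.entries[victim]
        self.entries[key] = [value, 1.0, self.clock]


if __name__ == "__main__":
    # Keys stand in for column chunks; the names are made up.
    cache = LRFUCache(capacity=2, lam=0.0)
    cache.put(("orders", "price", 0), b"chunk-a")
    cache.get(("orders", "price", 0))
    cache.put(("orders", "qty", 0), b"chunk-b")
    cache.put(("orders", "date", 0), b"chunk-c")  # evicts the less-used "qty" chunk
    print(sorted(cache.entries))
```

Keying the cache by (table, column, chunk index) mirrors the column-chunk granularity described in the table, although the real cache key layout in LLAP is not shown in this post.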

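Security

By design, the LLAP server can enforce access control at a finer granularity than the file level.
Because the daemons know which columns and records they are processing, policies can be applied to those objects.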

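Memory

The memory allocated to LLAP should be decided based on the memory available in YARN.
Tables served through LLAP are fast to access, but LLAP holds its memory permanently, so if it is set too large, other jobs cannot use those resources.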

VCore

Like memory, the cores LLAP uses are allocated from YARN.
So when configuring LLAP vcores, check that the value is set equal to 80% of YARN's total CPU count.

On a YARN cluster with CPU isolation enabled, getting this vcore setting right matters even more.
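
The 80% rule is easy to check with a few lines of code. This is a minimal sketch under one reading of the rule, namely that the configured vcore count should equal 80% of the total CPU count, rounded down; `expected_vcores` and `vcores_match` are hypothetical helper names.

```python
def expected_vcores(total_cpus, percent=80):
    """Vcores to configure: the given percentage of the CPUs, rounded down."""
    return total_cpus * percent // 100


def vcores_match(configured_vcores, total_cpus):
    """True if the configured vcore count follows the 80% rule."""
    return configured_vcores == expected_vcores(total_cpus)


if __name__ == "__main__":
    # Assumed example: a 32-CPU node configured with 25 vcores.
    print(expected_vcores(32), vcores_match(25, 32))
```

Integer arithmetic is used on purpose so that the result does not depend on floating-point rounding.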

References

Hive LLAP deep dive: https://community.cloudera.com/t5/Community-Articles/Hive-LLAP-deep-dive/ta-p/248893

LLAP - Apache Hive - Apache Software Foundation: https://cwiki.apache.org/confluence/display/Hive/LLAP
