Spark Tuning

Data Serialization, Memory Tuning, Garbage Collection Tuning

Data Serialization

If object serialization is too slow, overall application performance can degrade significantly. In a distributed environment such as Spark, data serialization has a large impact on performance.

Spark serializes data when it shuffles records between worker nodes and when it writes RDDs to disk. To speed this up, Spark can use the Kryo serializer, which is often as much as 10x faster than the default Java serializer. To enable Kryo, set the serializer in the Spark configuration file.

Enabling Kryo

spark-defaults.conf
spark.serializer org.apache.spark.serializer.KryoSerializer

Registering user classes with Kryo

For classes that are not registered, Kryo stores the full class name with every object, which wastes space. Register the classes you serialize so that Kryo does not have to store their names.

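import org.apache.spark.{SparkConf, SparkContext}

// Register application classes so Kryo does not store their full class names.
val conf = new SparkConf().setMaster(...).setAppName(...)
conf.registerKryoClasses(Array(classOf[MyClass1], classOf[MyClass2]))
val sc = new SparkContext(conf)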
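The registration snippet above can be expanded into a small program. The following is only a sketch: `Click` and `Purchase` are hypothetical record types standing in for `MyClass1` and `MyClass2`, and the `local[2]` master and application name are assumptions chosen for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical record types standing in for MyClass1 / MyClass2.
case class Click(userId: Long, url: String)
case class Purchase(userId: Long, amount: Double)

object KryoRegistrationExample {
  // Build a configuration that uses Kryo and registers the application classes.
  def buildConf(): SparkConf =
    new SparkConf()
      .setMaster("local[2]") // local mode, for illustration only
      .setAppName("kryo-registration-example")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array(classOf[Click], classOf[Purchase]))

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(buildConf())
    try {
      val clicks = sc.parallelize(Seq(Click(1L, "/a"), Click(1L, "/b"), Click(2L, "/c")))
      // groupByKey shuffles Click values, so they pass through the configured serializer.
      val clicksPerUser = clicks.keyBy(_.userId).groupByKey().mapValues(_.size).collect()
      clicksPerUser.foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Keeping the configuration in a separate `buildConf` method makes it possible to inspect the settings without starting a SparkContext.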

Memory Tuning

Good serialization also helps with memory optimization. Setting the Spark storage level to StorageLevel.MEMORY_ONLY_SER stores data in memory in serialized form. Each RDD partition is stored as a single byte array, which reduces memory usage.

The trade-off is that serialized data must be deserialized every time it is read, so although memory usage goes down, data access can take longer.

The default storage level is StorageLevel.MEMORY_ONLY, which keeps the RDD in the JVM as deserialized objects.
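As a rough illustration of choosing the serialized storage level, the sketch below persists an RDD with `StorageLevel.MEMORY_ONLY_SER`. The object name, the `cachedDoubles` helper, the data, and the `local[2]` master are all assumptions made for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object SerializedCacheExample {
  // Mark an RDD to be cached as serialized bytes (one byte array per partition).
  def cachedDoubles(sc: SparkContext): RDD[Long] =
    sc.parallelize(1 to 100000)
      .map(_.toLong * 2)
      .persist(StorageLevel.MEMORY_ONLY_SER)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("serialized-cache-example")
    val sc = new SparkContext(conf)
    try {
      val cached = cachedDoubles(sc)
      // The first action materializes the cache.
      println(cached.count())
      // Later actions read the cached bytes and deserialize them on access.
      println(cached.reduce(_ + _))
    } finally {
      sc.stop()
    }
  }
}
```

Calling `cache()` or `persist()` without arguments would use `MEMORY_ONLY` instead, trading more memory for faster access.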

Garbage Collection Tuning

To reclaim memory, the JVM periodically runs garbage collection (GC), which tracks down old objects and removes them. This work incurs GC overhead.

There are two main ways to lower the cost of GC.

First, the cost of GC is proportional to the number of Java objects, so reducing the number of objects lowers the cost. Second, storing objects in serialized form leaves only one object, a single byte array, per RDD partition.
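The effect of the second approach can be seen without Spark. The sketch below is a stand-in that uses plain `java.io` streams rather than Spark's own serializer: it packs a partition's worth of hypothetical `(id, score)` records into one byte array, so the garbage collector has a single object to trace instead of one tuple and two boxed values per record.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

object PartitionAsBytes {
  // Pack all records into a single byte array: a 4-byte count, then 16 bytes per record.
  def pack(records: Seq[(Long, Double)]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new DataOutputStream(buffer)
    out.writeInt(records.size)
    records.foreach { case (id, score) =>
      out.writeLong(id)
      out.writeDouble(score)
    }
    out.flush()
    buffer.toByteArray
  }

  // Rebuild the records; this is the deserialization cost paid on every access.
  def unpack(bytes: Array[Byte]): Seq[(Long, Double)] = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    val count = in.readInt()
    Vector.fill(count)((in.readLong(), in.readDouble()))
  }
}
```

For example, `PartitionAsBytes.unpack(PartitionAsBytes.pack(records))` returns the original records, at the price of rebuilding every object on read.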

To find out whether GC is a problem, add -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps to the Java options when running the Spark application. A log entry is then printed every time a GC occurs.

GC debugging options

spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
spark.executor.extraJavaOptions -Dhdp.version=2.2.0.0-2041 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
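The executor options can also be set from application code. This is a minimal sketch: the object name and application name are hypothetical, and the flags are the JDK 8 style flags used above. Only the executor setting is shown, because in client mode the driver JVM has already started by the time a SparkConf is built, so driver options have to come from the configuration file or the command line.

```scala
import org.apache.spark.SparkConf

object GcLoggingConf {
  // JDK 8 style GC logging flags, as used in this document.
  val gcFlags: String = "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"

  // Pass the flags to every executor JVM launched for this application.
  def build(): SparkConf =
    new SparkConf()
      .setAppName("gc-logging-example")
      .set("spark.executor.extraJavaOptions", gcFlags)
}
```

With this setting, the GC log lines are written by the executors on the worker nodes, not by the driver.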

There are two main things to check when diagnosing and tuning GC.

  1. If a full GC occurs several times before a batch job finishes, there is not enough memory available to run the job.

  2. If the OldGen region of the heap is nearly full, reduce the amount of memory used for caching. Caching fewer objects is better than slowing the job down.
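Both adjustments can be expressed as configuration. The sketch below assumes Spark 1.6 or later, where `spark.memory.fraction` controls the share of the heap used for execution and caching (older releases used `spark.storage.memoryFraction` instead). The object name and the values `4g` and `0.5` are illustrative assumptions, not recommendations.

```scala
import org.apache.spark.SparkConf

object GcFriendlyMemoryConf {
  def build(): SparkConf =
    new SparkConf()
      .setAppName("gc-friendly-memory-example")
      // Point 1: give each executor more heap if full GCs happen repeatedly.
      .set("spark.executor.memory", "4g")
      // Point 2: shrink the share of the heap used for execution and caching
      // (the default is 0.6) when OldGen is close to full.
      .set("spark.memory.fraction", "0.5")
}
```

After changing these values, rerunning the job with GC logging enabled shows whether full GCs have become less frequent.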

References

http://spark.apache.org/docs/latest/tuning.html
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/SparkConfSuite.scala
https://ogirardot.wordpress.com/2015/01/09/changing-sparks-default-java-serialization-to-kryo/
http://helloworld.naver.com/helloworld/textyle/1329
