Structured Streaming

Structured Streaming ์€ ๋ฐ์ดํ„ฐ ์ŠคํŠธ๋ฆผ์„ ํ…Œ์ด๋ธ”๋กœ ๊ด€๋ฆฌํ•œ๋‹ค. ๋งค ๋ฏธ๋‹ˆ๋งค์น˜ ๋งˆ๋‹ค ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ ์ŠคํŠธ๋ฆผ ์œ ์ž…๋˜๋ฉด ํ…Œ์ด๋ธ”์— ์ถ”๊ฐ€๋œ๋‹ค.

Basic Concepts

์ถœ์ฒ˜: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
์ถœ์ฒ˜: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html

Types of time windows

์ถœ์ฒ˜: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html

Handling Late Data and Watermarking

์ถœ์ฒ˜: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
import spark.implicits._

val words = ... // streaming DataFrame of schema { timestamp: Timestamp, word: String }

// Group the data by window and word and compute the count of each group
val windowedCounts = words
    .withWatermark("timestamp", "10 minutes")
    .groupBy(
        window($"timestamp", "10 minutes", "5 minutes"),
        $"word")
    .count()

์ฐธ๊ณ ์ž๋ฃŒ

Spark Documentation, https://docs.microsoft.com/ko-kr/azure/databricks/getting-started/spark/streaming

Last updated

Was this helpful?