Time Series Analysis

Prophet

Time Series Analysis and Forecasting with Prophet

Time series analysis is in heavy demand in practice, for example in production management and demand forecasting, yet the quality of the resulting forecasts tends to be poor. To forecast demand for products and services better, it is important to improve both the speed and the accuracy of time series analysis. For example, if too little of a product is stocked in stores, customers cannot buy it when they need it; the retailer loses revenue, and dissatisfied customers may move to a competitor. Accurate and timely forecasts therefore matter.

Prophet

Prophet is a time series analysis and forecasting algorithm released by Facebook. Its goal is to provide a time series tool that as many people as possible can use easily. It is accurate, and the model can be tuned by adjusting intuitive parameters. It works well on series that have seasonality and several seasons of historical data, and it can be used for KPI forecasting.

Prophet is particularly useful for data with the following characteristics.

The series is observed over time, ideally spanning a year or more. It shows seasonal patterns. It includes events that occur irregularly but whose timing is known in advance, such as Black Friday. Its long-term trend can shift because of specific events such as a new product launch or a design change. It follows a non-linear growth trend, meaning the metric has a maximum it can reach and that maximum is known. It contains missing values or many outliers.

The main components of Prophet are Growth, Seasonality, Holidays (Events), and an Error Term, and the formula is as follows.
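Using the component symbols from the headings that follow, the decomposition can be written as an additive model. This form is taken from the Prophet paper by Taylor and Letham and is given here as a reading aid:

```latex
y(t) = g(t) + s(t) + h(t) + \epsilon_t
```

Here g(t) is the growth (trend) term, s(t) the seasonality, h(t) the holiday or event effect, and \epsilon_t the error term.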

Growth: g(t)

Growth represents the trend over time. The options are Linear Growth (with change points), Non-Linear Growth (Logistic Growth), and Flat Growth.

Logistic Growth Model

The Logistic Growth model is used when a natural upper limit exists.

C (Capacity) is the upper limit, k is the growth rate, which sets the steepness of the curve, and m is the offset parameter.
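With these three symbols, the basic logistic curve has the form g(t) = C / (1 + exp(-k (t - m))). The sketch below evaluates it in plain Python. The function name and the numeric values are illustrative assumptions, and it covers only the case where C and k stay constant.

```python
import math

def logistic_growth(t, cap, k, m):
    """Basic logistic trend: cap / (1 + exp(-k * (t - m)))."""
    return cap / (1.0 + math.exp(-k * (t - m)))

# Illustrative values (assumptions, not taken from the text).
cap, k, m = 100.0, 0.5, 10.0

midpoint = logistic_growth(m, cap, k, m)   # at t == m the curve is at half of cap
late = logistic_growth(60.0, cap, k, m)    # far past m the curve approaches cap
```

The offset m is the time at which the curve reaches half of the capacity, and a larger k makes the transition around m steeper.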

An exponential model can be useful in the short term, but because growth cannot continue forever, it tends to break down over longer horizons. The capacity C and the growth rate k may also change over time.

Linear Growth Model (Piecewise)

The Linear Growth Model is used when the growth rate is constant. Change points are detected automatically, and when forecasting, whether a given point is a change point is decided probabilistically. A change point is a moment at which the trend can shift because of a specific event, such as a new product launch or a design change. Users can also add change points themselves.
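A piecewise linear trend can be sketched as a base slope that is adjusted at each change point, with the offset corrected so the line stays continuous. This follows the formulation in the Prophet paper. The function name and the sample numbers are assumptions made for illustration.

```python
def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Trend at time t with base slope k and offset m.

    deltas[j] is the change in slope that takes effect at changepoints[j].
    """
    slope, offset = k, m
    for s_j, delta_j in zip(changepoints, deltas):
        if t >= s_j:
            slope += delta_j
            offset -= s_j * delta_j  # keeps the trend continuous at s_j
    return slope * t + offset

# Illustrative values: slope 1.0 until t = 10, then slope 0.5.
before = piecewise_linear_trend(5.0, 1.0, 0.0, [10.0], [-0.5])
at_change = piecewise_linear_trend(10.0, 1.0, 0.0, [10.0], [-0.5])
after = piecewise_linear_trend(20.0, 1.0, 0.0, [10.0], [-0.5])
```

Setting a delta to zero removes the effect of that change point, which is why a sparsity-inducing prior on the deltas acts like automatic change point selection.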

Seasonality: s(t)

Seasonality is a pattern that recurs periodically over time. Depending on the trend, the amplitude of the pattern may stay constant, grow, or shrink. The pattern is approximated with a Fourier series, because a Fourier series can express an arbitrary periodic function as a sum of trigonometric functions.

P is the period, in days: 365.25 for yearly seasonality and 7 for weekly seasonality. N is the number of trigonometric terms that are combined; N=10 for yearly and N=3 for weekly seasonality are reported to work well. A large N lets the pattern change quickly, while a small N makes it change slowly.
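The role of P and N can be illustrated with a small Fourier feature generator. This is a minimal sketch in plain Python, not Prophet's own fourier_series() implementation. The function names, the column order, and the coefficient values are assumptions.

```python
import math

def fourier_features(t, period, order):
    """Return [cos(2*pi*1*t/P), sin(2*pi*1*t/P), ..., cos(2*pi*N*t/P), sin(2*pi*N*t/P)]."""
    features = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / period
        features.append(math.cos(angle))
        features.append(math.sin(angle))
    return features

def seasonality(t, period, coefficients):
    """Weighted sum of the Fourier features; coefficients = [a_1, b_1, ..., a_N, b_N]."""
    order = len(coefficients) // 2
    feats = fourier_features(t, period, order)
    return sum(c * f for c, f in zip(coefficients, feats))

weekly = fourier_features(t=2.0, period=7.0, order=3)        # 2 * 3 features
yearly = fourier_features(t=100.0, period=365.25, order=10)  # 2 * 10 features

# Hypothetical coefficients; in a fitted model these are learned from data.
coeffs = [0.5, -0.2, 0.1, 0.3, 0.0, 0.05]
```

Each additional order adds one cosine and one sine column, so N controls how fine-grained the seasonal shape can be.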

Holidays: h(t)

Holidays are events that have no regular period but strongly affect the overall series. A window can be specified before and after an event to set how far its effect extends. Events such as Black Friday or lunar-calendar holidays affect a company's production management and profit, yet they can fall on different dates from year to year.
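The window idea can be sketched as an indicator feature that switches on for the days around each listed event date. The parameter names lower_window and upper_window mirror the column names of Prophet's holidays table. The function itself is a hypothetical illustration, not the library's code.

```python
from datetime import date

def holiday_indicator(day, holiday_dates, lower_window=0, upper_window=0):
    """Return 1 if `day` lies within the window around any holiday, else 0.

    lower_window is <= 0 (days before) and upper_window is >= 0 (days after).
    """
    for holiday in holiday_dates:
        offset = (day - holiday).days
        if lower_window <= offset <= upper_window:
            return 1
    return 0

# Black Friday falls on a different date each year, so dates are listed explicitly.
black_fridays = [date(2019, 11, 29), date(2020, 11, 27)]

on_the_day = holiday_indicator(date(2020, 11, 27), black_fridays)
day_after = holiday_indicator(date(2020, 11, 28), black_fridays, upper_window=1)
```

Because the dates are supplied explicitly, the same mechanism covers events that move around the calendar.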

Error Term (ϵ)

Noise in the data, lost observations, and other factors make it impossible to build a perfect regression equation, so the regression includes a residual term that follows a normal distribution.

Model Training

Prophet trains the model using Stan, a statistical modeling tool, as its backend. The model is tuned by adjusting intuitive parameters.

To see how the model is fitted, consider what happens when fit() is called. It takes variables and parameters such as the following (parameter name: the value it holds):

y: history
seasonality_prior_scale: prior_scale
holidays_prior_scale: prior_scale
seasonality_mode: mode
period: period

It then calls make_all_seasonality_features(). Within that call, make_seasonality_features() uses fourier_series() to compute the seasonal pattern features (seasonal_features) and derives prior_scales from the seasonality_prior_scale parameter. make_holiday_features() computes the holiday features and derives their prior_scales from the holidays_prior_scale parameter, and both are appended to seasonal_features.

The parameters and variables are then mapped to the names used in the Stan model (Python name: Stan name):

seasonal_features: X
prior_scales: sigmas
changepoint_prior_scale: tau
changepoints_t: t_change
ds -> t: t
additive_terms: s_a
multiplicative_terms: s_m

These are passed to Stan to train the model. The optimization algorithm is either Newton's method (Newton) or L-BFGS (LBFGS).

transformed parameters {
   vector[T] trend;
   if (trend_indicator == 0) {
      trend = linear_trend(k, m, delta, t, A, t_change);
   } else if (trend_indicator == 1) {
      trend = logistic_trend(k, m, delta, t, cap, A, t_change, S);
   } else if (trend_indicator == 2) {
      trend = flat_trend(m, T);
   }
}

model {
   //priors
   k ~ normal(0, 5);
   m ~ normal(0, 5);
   delta ~ double_exponential(0, tau);
   sigma_obs ~ normal(0, 0.5);
   beta ~ normal(0, sigmas);

   // Likelihood
   y ~ normal(
      trend .* (1 + X * (beta .* s_m))   // multiplicative terms scale the trend
      + X * (beta .* s_a),               // additive terms are added on top
      sigma_obs
   );
}
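The mean of the likelihood above combines the trend with multiplicative and additive feature effects. The following plain-Python sketch computes that mean for a single time step so the roles of s_m and s_a are easier to see. The function name and the numbers are assumptions for illustration.

```python
def expected_value(trend, features, beta, s_a, s_m):
    """Likelihood mean for one time step:
    trend * (1 + X.(beta * s_m)) + X.(beta * s_a).

    s_a and s_m are 0/1 flags marking each feature as additive or multiplicative.
    """
    multiplicative = sum(x * b * flag for x, b, flag in zip(features, beta, s_m))
    additive = sum(x * b * flag for x, b, flag in zip(features, beta, s_a))
    return trend * (1.0 + multiplicative) + additive

# Two features: the first is multiplicative, the second additive.
mean = expected_value(
    trend=100.0,
    features=[1.0, 0.5],
    beta=[0.1, 4.0],
    s_a=[0, 1],
    s_m=[1, 0],
)
```

In this example the multiplicative feature raises the trend by 10 percent and the additive feature contributes 2.0, giving a mean of about 112.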

Capacities: the maximum value for the time series as a whole. Example: total market demand.

Change points: the moments at which the trend changes. Example: when a product changes or a new product is launched.

Holidays & Seasonality: time-related factors that affect the trend. Example: holidays that strongly affect sales volume.

Smoothing Parameter: how strongly each component influences the overall series. Example: how much variation should be expressed in each period.
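These inputs correspond roughly to arguments of the Prophet constructor and to the cap column used for logistic growth. The sketch below assumes the prophet Python package and a pandas DataFrame with ds and y columns. The helper name, the change point date, and all values are illustrative assumptions; the prior scales shown are the library defaults.

```python
# Keyword arguments for prophet.Prophet; the values are illustrative.
prophet_kwargs = {
    "growth": "logistic",             # logistic growth needs a `cap` column in the data
    "changepoints": ["2020-03-01"],   # user-supplied date; must fall inside the training data
    "changepoint_prior_scale": 0.05,  # how flexible the trend is at change points
    "seasonality_mode": "additive",
    "seasonality_prior_scale": 10.0,  # how strongly seasonal terms can influence the fit
    "holidays_prior_scale": 10.0,     # how strongly holiday terms can influence the fit
}

def fit_and_forecast(history, capacity, periods):
    """Fit on `history` (columns `ds`, `y`) and forecast `periods` days ahead."""
    from prophet import Prophet  # imported lazily; requires the prophet package

    history = history.assign(cap=capacity)
    model = Prophet(**prophet_kwargs)
    model.fit(history)
    future = model.make_future_dataframe(periods=periods)
    future["cap"] = capacity          # the capacity must also be set for future dates
    return model.predict(future)
```

Smaller prior scales damp the corresponding component, while larger values let it follow the data more closely.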

Model Evaluation

Given data up to time T, the model forecasts a further horizon h. The distance between the forecasts and the actual values over that horizon is computed, and the MAPE (Mean Absolute Percentage Error) of the errors is used as the evaluation metric.
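MAPE averages the absolute errors after scaling each one by the actual value. A minimal sketch follows; the function name and sample values are assumptions, and it presumes that no actual value is zero.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no actual value is 0."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)

# Errors of 10%, 10%, and 0% average to roughly 6.67%.
score = mape([100.0, 200.0, 400.0], [110.0, 180.0, 400.0])
```

Because each error is relative to the actual value, MAPE lets series of different scales be compared, but it becomes unstable when actual values are close to zero.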

Because the data form a time series and cannot be shuffled, the model is evaluated on windows taken from roughly half of the total period. If the window is too large, the evaluation data all look alike; if it is too small, the error can grow.
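One way to evaluate without shuffling is to pick several cutoff points, train on everything before each cutoff, and test on the horizon that follows. The sketch below only generates the cutoffs. The function name, parameters, and values are assumptions. Prophet's diagnostics module offers a cross_validation function built on a similar idea, with initial, period, and horizon arguments.

```python
def rolling_cutoffs(n_obs, initial, horizon, step):
    """Cutoff indices T: train on [0, T), evaluate on [T, T + horizon)."""
    cutoffs = []
    t = initial
    while t + horizon <= n_obs:
        cutoffs.append(t)
        t += step
    return cutoffs

# Two years of daily data: start after the first year, test 30 days, move 90 days at a time.
folds = rolling_cutoffs(n_obs=730, initial=365, horizon=30, step=90)
```

Each fold respects time order, so the model never sees observations from the period it is being tested on.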

The model with the lowest error rate is the better model.

Tuning can proceed as follows. If the model performs worse than the baseline model, revise the trend, seasonality, and related settings. If forecast accuracy drops on particular dates, remove outliers. If forecast accuracy drops at particular cutoffs, such as the end of the year, add change points.

References

https://github.com/facebook/prophet/blob/master/python/prophet/forecaster.py
https://databricks.com/blog/2020/01/27/time-series-forecasting-prophet-spark.html
https://databricks.com/blog/2021/04/06/fine-grained-time-series-forecasting-at-scale-with-
https://www.slideshare.net/lumiamitie/facebook-prophet
https://brunch.co.kr/@gimmesilver/17
https://m-insideout.tistory.com/m/13
https://peerj.com/preprints/3190.pdf
https://github.com/facebook/prophet/blob/master/python/prophet/forecaster.py#L1136
Time series forecasting with Spark, https://towardsdatascience.com/pyspark-forecasting-with-pandas-udf-and-fb-prophet-e9d70f86d802
