In times like these, FinOps suddenly becomes a priority and you want to catch sudden growth in cost early on, not wait until the end of the month. Google Cloud Platform offers budget alerts that can be set at different levels and based on actual or forecasted amounts. That is good, but changes in spending patterns can hide in the totals, and it can take days or weeks until you are alerted. Hence, my awesome colleague Johan Gunnarsson and I wanted to try out a simple idea during one of our hackathons: combine data from the GCP billing export in BigQuery with BigQuery ML and dbt tests to detect cost anomalies. It turned out to be really good and cheap.
Walkthrough
If you haven’t already, first enable Cloud Billing export to BigQuery.
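If you want to confirm that the export is populated before building anything on top of it, a quick query like the sketch below works. The project, dataset and table suffix are placeholders for wherever your billing export lands.
SELECT
    project.name AS project_name,
    service.description AS service,
    currency,
    SUM(cost) AS cost
FROM `your-project.admin.gcp_billing_export_v1_XXXXXX_XXXXXX_XXXXXX`  -- placeholder table name
WHERE _PARTITIONTIME >= TIMESTAMP(DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY))
GROUP BY project_name, service, currency
ORDER BY cost DESC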
Create a dbt model that aggregates cost per project, service and day. We limited the time horizon to 90 days, as it becomes a bit slow otherwise and that much history is enough for pretty good results.
gcp_billing_project_service.sql
{{ config(
    materialized = 'view',
) }}

WITH source AS (
    SELECT *
    FROM {{ source('admin', 'gcp_billing_export_v1') }}
    WHERE _PARTITIONTIME > TIMESTAMP(DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY))
)

SELECT
    CONCAT(project.name, "__", service.description) AS project_service,
    DATE(usage_start_time) AS usage_date,
    SUM(cost) AS cost_eur
FROM source
GROUP BY 1, 2
Then a dbt model to train the BigQuery ML model, which uses ARIMA_PLUS to detect cost anomalies at the project and service level. We use the Swedish holiday region.
gcp_billing_cost_model.sql
{{
    config(
        materialized='model',
        ml_config={
            'model_type': 'ARIMA_PLUS',
            'TIME_SERIES_TIMESTAMP_COL': 'usage_date',
            'TIME_SERIES_DATA_COL': 'cost_eur',
            'TIME_SERIES_ID_COL': 'project_service',
            'HOLIDAY_REGION': 'SE'
        }
    )
}}

SELECT *
FROM {{ ref('gcp_billing_project_service') }}
WHERE usage_date < current_date() - 1
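Before relying on the model it can be worth a quick sanity check. A minimal sketch using BigQuery's ML.ARIMA_EVALUATE, assuming the model ends up as gcp_billing_cost_model in your dbt dataset (project and dataset names are placeholders):
-- One row per project_service with the fitted ARIMA parameters, detected
-- seasonal periods and whether a holiday effect was found.
SELECT *
FROM ML.ARIMA_EVALUATE(MODEL `your-project.your_dataset.gcp_billing_cost_model`)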
We then use the model in the dbt_ml detect_anomalies macro with an anomaly probability threshold of 0.98, together with some logic so that we only keep anomalies where the actual cost is higher than the upper_bound (an increase in cost) and above €30 (we don’t care about the smaller and more variable costs).
gcp_billing_anomaly_detect.sql
{{
    config(
        materialized='table'
    )
}}

WITH eval_data AS (
    SELECT *
    FROM {{ ref("gcp_billing_project_service") }}
    WHERE usage_date = current_date() - 1
)

SELECT *
FROM {{ dbt_ml.detect_anomalies(ref('gcp_billing_cost_model'), 'eval_data', 0.98) }}
WHERE
    cost_eur > upper_bound
    AND cost_eur > 30
    AND is_anomaly IS TRUE
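For reference, the macro wraps BigQuery's ML.DETECT_ANOMALIES, so the compiled SQL looks roughly like the sketch below (the model path is a placeholder for wherever dbt materializes it):
SELECT *
FROM ML.DETECT_ANOMALIES(
    MODEL `your-project.your_dataset.gcp_billing_cost_model`,
    STRUCT(0.98 AS anomaly_prob_threshold),
    (SELECT * FROM eval_data)
)
The output keeps the input columns and adds is_anomaly, lower_bound, upper_bound and anomaly_probability, which is what the WHERE clause above filters on.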
We then set up an ordinary dbt test that fails if the model above returns more than 0 rows.
gcp_billing.yml
version: 2

models:
  - name: gcp_billing_anomaly_detect
    description: 'GCP cost anomalies for today'
    tests:
      - dbt_expectations.expect_table_row_count_to_equal:
          value: 0
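If you want to inspect what the test sees, you can query the anomaly table directly; any rows returned are the anomalies that make the test fail (project and dataset names are placeholders):
SELECT project_service, usage_date, cost_eur, upper_bound, anomaly_probability
FROM `your-project.your_dataset.gcp_billing_anomaly_detect`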
Unfortunately I don’t have an example of how a cost anomaly looks when reported in Elementary, but below is an illustration of the records that the test is based on (scrubbed, and without the logic that filters down to the anomalies we really care about).
We hope this is useful for others who want to spot sudden increases in cost in an easy and cost-efficient way. Also note that the granularity here is project, service and day; this can easily be adjusted to a more aggregated or more detailed level, as sketched below.
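As one hypothetical example of a finer grain, the final SELECT in gcp_billing_project_service.sql could also group on SKU (the TIME_SERIES_ID_COL in the model config would then need to point at the new id column):
SELECT
    CONCAT(project.name, "__", service.description, "__", sku.description) AS project_service_sku,
    DATE(usage_start_time) AS usage_date,
    SUM(cost) AS cost_eur
FROM source
GROUP BY 1, 2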