In today's data-driven world, cloud providers are essential for efficiently managing, processing, and analyzing vast amounts of data. Among them, Google Cloud Platform (GCP) stands out as a leading choice, and BigQuery is one of its most important services, offering powerful data warehousing and analytics solutions.
As businesses increasingly depend on cloud services, monitoring and managing costs effectively becomes crucial. BigQuery, while immensely valuable, can account for a substantial share of overall GCP expenses. By closely monitoring cloud costs, particularly those associated with BigQuery, organizations can optimize resource usage, get the best return on their investment, and maintain financial control.
BigQuery costs are split between compute and storage; this article covers storage only. We provide a hands-on solution to monitor and reduce BigQuery storage costs, so that you can fully harness the advantages of cloud computing while keeping expenditures in check.
Understanding BigQuery Storage
Long-term Data vs. Active Data
The first thing to know is that storage pricing in BigQuery involves costs for both active and long-term storage.
- Active storage includes any table or table partition modified in the last 90 days.
- Long-term storage includes any table or table partition that has not been modified for 90 consecutive days.
After 90 days of inactivity, the storage cost for a table automatically drops by approximately 50%. There is no difference in performance, durability, or availability between active and long-term storage.
The switch is automatic and requires no action on your part, but knowing about it helps you organize your data warehouse and monitor your costs.
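To make the 90-day rule concrete, here is a minimal Python sketch that classifies a table or partition as active or long-term from its last-modification time and estimates its monthly cost. It assumes long-term storage costs exactly half the active price; storage_tier, monthly_storage_cost and the active_price_per_gib argument are hypothetical names, and the price is a value you supply for your own region.

```python
from datetime import datetime, timedelta, timezone

# From the article: 90 consecutive days without modification moves data to long-term storage.
LONG_TERM_AFTER = timedelta(days=90)
# Assumption for this sketch: long-term storage costs exactly half the active price.
LONG_TERM_DISCOUNT = 0.5


def storage_tier(last_modified: datetime, now: datetime) -> str:
    """Classify a table or partition as 'active' or 'long-term'."""
    if now - last_modified >= LONG_TERM_AFTER:
        return "long-term"
    return "active"


def monthly_storage_cost(gib: float, tier: str, active_price_per_gib: float) -> float:
    """Estimate the monthly storage cost of `gib` GiB stored in the given tier."""
    factor = LONG_TERM_DISCOUNT if tier == "long-term" else 1.0
    return gib * active_price_per_gib * factor


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    last_modified = datetime(2024, 1, 15, tzinfo=timezone.utc)  # 138 days without changes
    tier = storage_tier(last_modified, now)
    # 0.02 is a placeholder price per GiB, not an official figure.
    print(tier, monthly_storage_cost(500.0, tier, active_price_per_gib=0.02))
```

Nothing needs to be scheduled to trigger the discount: the sketch only mirrors what BigQuery does on its own, which is useful when forecasting costs.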
Data Retention: Time-Travel and Fail-Safe
BigQuery offers advanced data retention features such as Time-Travel and Fail-Safe to ensure data recovery and integrity.
Time-Travel:
- Allows querying updated or deleted data within the past seven days by default.
- Helps restore deleted or expired tables.
- Configurable time window from 2 to 7 days to balance cost and data retention needs.
Fail-Safe:
- Provides an additional seven-day recovery period after Time-Travel.
- Allows emergency data recovery, but the data cannot be queried directly (recovery has to go through Cloud Customer Care).
- Not configurable
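The two retention windows stack: once a table is deleted, Time-Travel covers the first days and Fail-Safe the following seven. A minimal Python sketch of that timeline, assuming the window lengths described above; recovery_windows is a hypothetical helper, not a BigQuery API.

```python
from datetime import datetime, timedelta

# From the article: Fail-Safe always adds seven days after the Time-Travel window.
FAIL_SAFE_WINDOW = timedelta(days=7)


def recovery_windows(deleted_at: datetime, time_travel_days: int = 7) -> dict:
    """Return when self-service recovery and support-assisted recovery end."""
    if not 2 <= time_travel_days <= 7:
        raise ValueError("the time-travel window must be between 2 and 7 days")
    time_travel_end = deleted_at + timedelta(days=time_travel_days)
    return {
        # Until this point, you can query or restore the data yourself (Time-Travel).
        "time_travel_until": time_travel_end,
        # Until this point, only Cloud Customer Care can recover it (Fail-Safe).
        "fail_safe_until": time_travel_end + FAIL_SAFE_WINDOW,
    }


if __name__ == "__main__":
    windows = recovery_windows(datetime(2024, 3, 1), time_travel_days=2)
    print(windows["time_travel_until"], windows["fail_safe_until"])
```

Shortening Time-Travel therefore also brings the end of Fail-Safe forward, since the seven Fail-Safe days start when Time-Travel ends.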
Command to configure the time-travel window to 2 days:
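One way to apply this setting is the max_time_travel_hours option of BigQuery's ALTER SCHEMA statement, which expresses the window in hours. The Python helper below only renders that statement for a given dataset so it can be reviewed and run in the BigQuery console; time_travel_ddl and the dataset name are hypothetical, and the statement is worth checking against the current BigQuery DDL reference.

```python
def time_travel_ddl(dataset: str, days: int) -> str:
    """Render an ALTER SCHEMA statement that sets a dataset's time-travel window."""
    # The window is expressed in hours: 2 to 7 days means 48 to 168 hours.
    if not 2 <= days <= 7:
        raise ValueError("the time-travel window must be between 2 and 7 days")
    hours = days * 24
    return f"ALTER SCHEMA `{dataset}` SET OPTIONS (max_time_travel_hours = {hours});"


if __name__ == "__main__":
    # "my_project.my_dataset" is a placeholder dataset name.
    print(time_travel_ddl("my_project.my_dataset", 2))
```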
Compression: Logical Bytes vs. Physical Bytes
BigQuery uses efficient compression techniques, which reduce the amount of space needed to store data by encoding it more compactly. Understanding the difference between logical bytes (uncompressed data size) and physical bytes (compressed data size) is essential for cost management.
Because of compression, the physical size of a table is smaller than its logical size.
In this example, BigQuery achieves a compression ratio of 17, meaning the logical size is 17 times the physical size. Ideally, we would pay for the amount of data stored physically rather than logically. And we can! However, there are some nuances to consider.
BigQuery Storage Billing System
PBSB vs. LBSB
The introduction of Physical Bytes Storage Billing (PBSB) in July 2023 offered a new approach to managing BigQuery storage costs.
Logical Bytes Storage Billing (LBSB):
- Charges based on uncompressed data size.
- Time-Travel and Fail-Safe storage are included at no extra charge.
Physical Bytes Storage Billing (PBSB):
- Charges based on compressed data size.
- Costs twice as much per GiB as LBSB.
- Time-Travel and Fail-Safe storage are billed.
The billing model is set per dataset.
Pricing comparison summary table:
Computing Compression Ratio per dataset
This is why evaluating the compression ratio of each dataset is crucial to determine the potential savings.
There are two things to consider:
- for logical bytes, only consider tables that have not been deleted (the INFORMATION_SCHEMA.TABLE_STORAGE_BY_PROJECT view also lists deleted tables);
- only consider tables of type BASE TABLE, leaving out the other table types (external tables, table clones, table snapshots, views and materialized views).
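The two rules above can be sketched in Python as a per-dataset aggregation. This is a minimal sketch under stated assumptions: TableStorage is a hypothetical row type only loosely modelled on the INFORMATION_SCHEMA storage views, its physical_bytes field is assumed to already include time-travel and fail-safe bytes, and deleted tables are kept on the physical side because, under our reading of the first rule, they still occupy physical storage.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TableStorage:
    """Hypothetical storage row, loosely modelled on INFORMATION_SCHEMA storage views."""
    dataset: str
    table_type: str       # e.g. "BASE TABLE", "VIEW", "EXTERNAL"
    deleted: bool
    logical_bytes: int
    physical_bytes: int   # assumed to include time-travel and fail-safe bytes


def compression_ratios(rows: list[TableStorage]) -> dict[str, float]:
    """Logical-to-physical byte ratio per dataset, applying the two rules."""
    logical: dict[str, int] = defaultdict(int)
    physical: dict[str, int] = defaultdict(int)
    for row in rows:
        if row.table_type != "BASE TABLE":
            continue  # rule 2: only BASE TABLE rows are counted
        physical[row.dataset] += row.physical_bytes
        if not row.deleted:
            logical[row.dataset] += row.logical_bytes  # rule 1: skip deleted tables
    return {ds: logical[ds] / physical[ds] for ds in physical if physical[ds] > 0}
```

Datasets with no physical bytes are skipped to avoid dividing by zero.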
In our case, compression ratios reach up to 35. Even though physical storage costs twice as much per GiB, such compression ratios clearly give us a significant financial advantage.
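Why does a high compression ratio beat a doubled price? Under simplified assumptions (a single logical price p per GiB, a physical price of 2p, the same active/long-term mix under both models, and time-travel and fail-safe bytes counted inside the physical size B_P), the break-even point is a compression ratio r = B_L / B_P of 2:

```latex
\begin{aligned}
C_{\mathrm{LBSB}} &= p\,B_L, \qquad C_{\mathrm{PBSB}} = 2p\,B_P \\
C_{\mathrm{PBSB}} < C_{\mathrm{LBSB}} &\iff 2p\,B_P < p\,B_L \iff r = \frac{B_L}{B_P} > 2 \\
r = 35 &\implies \frac{C_{\mathrm{PBSB}}}{C_{\mathrm{LBSB}}} = \frac{2}{35} \approx 0.057
\end{aligned}
```

So at a ratio of 35, physical billing would cost roughly 6% of logical billing, while a dataset compressing less than 2:1 is cheaper to keep on LBSB.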
Evaluating Potential Savings
Now that we know all this, we can determine concretely how much we can save per month and per dataset by changing the billing model. The following query performs the calculation. If the forecast_total_cost_difference column contains a positive value, don't hesitate to switch the corresponding dataset to PBSB.
If we decide to switch all these datasets to PBSB, we'll save approximately $900 a month!
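The arithmetic behind forecast_total_cost_difference can be sketched in Python, following the article's sign convention (logical cost minus physical cost, so a positive value favors PBSB). This is not the query itself but a minimal sketch: DatasetStorage and Prices are hypothetical types, time-travel and fail-safe bytes are assumed to be folded into active physical storage, and the prices in the usage example are placeholders with physical at twice the logical rate, not official figures.

```python
from dataclasses import dataclass


@dataclass
class DatasetStorage:
    """Monthly-average storage of one dataset, in GiB."""
    active_logical: float
    long_term_logical: float
    active_physical: float      # assumption: time-travel and fail-safe bytes folded in here
    long_term_physical: float


@dataclass
class Prices:
    """Per-GiB monthly prices; placeholders to replace with your region's rates."""
    active_logical: float
    long_term_logical: float
    active_physical: float
    long_term_physical: float


def forecast_total_cost_difference(s: DatasetStorage, p: Prices) -> float:
    """Logical-billing cost minus physical-billing cost; positive means PBSB is cheaper."""
    logical_cost = s.active_logical * p.active_logical + s.long_term_logical * p.long_term_logical
    physical_cost = s.active_physical * p.active_physical + s.long_term_physical * p.long_term_physical
    return logical_cost - physical_cost


if __name__ == "__main__":
    prices = Prices(active_logical=0.02, long_term_logical=0.01,
                    active_physical=0.04, long_term_physical=0.02)
    sales = DatasetStorage(active_logical=1000, long_term_logical=2000,
                           active_physical=50, long_term_physical=100)
    print(round(forecast_total_cost_difference(sales, prices), 2))
```

Summing this value over the datasets you plan to switch gives the kind of monthly saving estimate quoted above.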
Switching from One Billing System to Another
Making the Switch
To update the storage billing model of one of your datasets, use the following command:
There are three things to consider when you use this command:
- Make sure there are no active flat-rate commitments in the same region before running the command.
- You’ll have to wait 24 hours for the change to take effect.
- Once the change is made, you’ll have to wait at least 14 days before being able to make the reverse change.
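BigQuery also exposes the billing model as the storage_billing_model option of ALTER SCHEMA. The sketch below renders that statement and encodes the 14-day rule so that an automated job does not attempt a premature reverse switch. billing_model_ddl, can_switch_back and the dataset names are hypothetical; the flat-rate check and the 24-hour propagation delay are not modelled.

```python
from datetime import datetime, timedelta

# From the article: a reverse change is only possible 14 days after the previous one.
MIN_DELAY_BETWEEN_CHANGES = timedelta(days=14)


def billing_model_ddl(dataset: str, model: str) -> str:
    """Render the DDL that sets a dataset's storage billing model."""
    if model not in ("LOGICAL", "PHYSICAL"):
        raise ValueError("model must be 'LOGICAL' or 'PHYSICAL'")
    return f"ALTER SCHEMA `{dataset}` SET OPTIONS (storage_billing_model = '{model}');"


def can_switch_back(last_change: datetime, now: datetime) -> bool:
    """True once 14 days have passed since the previous billing-model change."""
    return now - last_change >= MIN_DELAY_BETWEEN_CHANGES


if __name__ == "__main__":
    # "my_project.sales" is a placeholder dataset name.
    print(billing_model_ddl("my_project.sales", "PHYSICAL"))
```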
Automating the Identification of Cost-Saving Opportunities
Manually running the SQL query across your datasets can be time-consuming, especially if you have a large number of datasets. To streamline this process, you can automate the execution of the final query across all relevant datasets in your BigQuery environment.
One possible way to do this:
- create a scheduled query in BigQuery using the query given above;
- store the results in a table that logs the cost difference for each dataset over time;
- use Google Cloud Monitoring to create an alert that triggers when the forecast_total_cost_difference column shows a positive value for any dataset (or a negative one: it can sometimes be profitable to switch the billing model back to logical);
- make the switch.
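The alerting rule boils down to a small decision function. A minimal Python sketch, assuming each logged row provides a dataset's current billing model and its latest forecast_total_cost_difference; recommend_switch, datasets_to_switch and the min_saving threshold are hypothetical, and this logic would complement the Cloud Monitoring alert rather than replace it.

```python
from typing import Optional


def recommend_switch(current_model: str, cost_difference: float,
                     min_saving: float = 0.0) -> Optional[str]:
    """Return the billing model to switch to, or None to keep the current one.

    cost_difference is the logged forecast_total_cost_difference
    (logical cost minus physical cost per month).
    """
    if current_model == "LOGICAL" and cost_difference > min_saving:
        return "PHYSICAL"  # physical billing is forecast to be cheaper
    if current_model == "PHYSICAL" and cost_difference < -min_saving:
        return "LOGICAL"   # logical billing has become cheaper again
    return None


def datasets_to_switch(latest: dict[str, tuple[str, float]]) -> dict[str, str]:
    """Map each dataset worth switching to its target billing model."""
    result = {}
    for dataset, (model, diff) in latest.items():
        target = recommend_switch(model, diff)
        if target is not None:
            result[dataset] = target
    return result
```

Setting min_saving above zero avoids flip-flopping on tiny differences, which matters given the 14-day wait after each change.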
Conclusion
Optimizing cloud costs is an ongoing process that requires careful monitoring and timely decision-making. By understanding the different Storage Billing systems, you can make informed choices that align with your organization's financial goals. The provided SQL queries and automation strategies empower you to take control of your BigQuery storage costs, ensuring that you are paying the most efficient price for your data, without compromising on performance or data integrity.
If you would like more tips on how to optimize your use of GCP, don't hesitate to contact our team of experts.
You can also follow this guide to quickly become Google-certified yourself.