Think back to a time you had to install a large amount of new equipment in your data center. What did you do?
Most likely, your team calculated how much power and space the new equipment would need. From there, your team found a location to accommodate it, which may have required you to re-engineer the space somewhat (or a lot). You probably checked the power redundancy for the chosen location and perhaps spot-checked the cooling capacity available.
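That first capacity check is simple arithmetic. As a minimal sketch (all equipment figures below are hypothetical, not from any real deployment), a back-of-envelope estimate of total power draw and rack space might look like this:

```python
# Back-of-envelope capacity check for a hypothetical deployment.
# All counts, wattages, and rack-unit figures are illustrative.

servers = [
    {"name": "web", "count": 40, "watts": 450, "rack_units": 1},
    {"name": "db",  "count": 8,  "watts": 800, "rack_units": 2},
]

# Total power draw in kW and total rack units required
total_kw = sum(s["count"] * s["watts"] for s in servers) / 1000
total_u = sum(s["count"] * s["rack_units"] for s in servers)

# Ceiling division, assuming standard 42U racks
racks_needed = -(-total_u // 42)

print(f"{total_kw:.1f} kW, {total_u} U, {racks_needed} racks")
```

Numbers like these tell you where the equipment *fits*, but not how the room's airflow and cooling will actually respond once it is powered on.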
Even when you take all the necessary steps, things can still go wrong, and, as a data center professional, some amount of troubleshooting is a given with new equipment. But how quickly can you identify an issue, and how quickly can you fix it? Better yet, can you predict issues before they happen and avoid them altogether?
Recently, Roblox, an online game platform and game creation powerhouse, experienced a three-day outage across its platform. The outage stemmed from several compounding factors, including a core system in their infrastructure becoming overwhelmed, a bug in backend service communications, and rapid growth in the number of servers in their data centers. A three-day outage, especially during a period in which Roblox would typically take in at least $15 million in revenue, is significant.
This perfect storm can happen to any organization looking to scale. In this post, I want to explore how you can troubleshoot data center issues with a physics-based, data-driven approach. To be clear, we can't help with a bug in backend communications code, but we can help ensure that the data center environment is tuned for optimum performance and is truly resilient to cooling and power system failures.
Of course, the first thing you do when things go wrong is identify the issue and take the necessary steps to remedy it. But pinpointing the issue can be difficult, as it was for Roblox. So where do you start?
Two words: airflow analysis. Airflow analysis opens up options to analyze cooling performance and explore IT deployment options more thoroughly. Once you know how air is moving through your data center, you can bring cooling delivery paths back to a safely manageable state. From there, you can lower maximum equipment temperatures through alterations to internal rack configurations or more large-scale changes to the cooling infrastructure.
How do you perform airflow analysis? Easy: with our 6Sigma Digital Twin, which combines Computational Fluid Dynamics (CFD) and digital twin technology.
Predicting and visualizing airflow is pivotal to maintaining data center uptime and remedying potential issues. Standard CFD simulations are a snapshot of a single point in time. But data centers are not static, especially during a failure. To truly model a cooling or infrastructure failure, time must be considered, so it is paramount to account for all the time-varying phenomena that occur during a failure.
The 6Sigma Digital Twin, with the right data and metrics about your data center embedded, does this for you. To see this in action, check out our video on running a transient cooling failure analysis.
Having visibility into airflow makes it easier to assess failure scenarios and, ultimately, to build a more resilient data center. We'd love to discuss how we can help you safely scale yours. Get in contact with a member of our team here.
Blog written by: Mark Fenton, Product Manager
10 November, 2021