Optimizing SharpGrid’s infrastructure for global expansion

Written by Jana Brnakova | May 30, 2024

SharpGrid, a Prague-based startup, is on a mission to become the global leader in market research for the HoReCa (hotels, restaurants, cafés) sector. They collect extensive data on businesses, including types, locations, sales volumes, and pricing.

SharpGrid expanded to 41 countries in Europe within a year and aims to continue expanding to other continents, requiring a scalable and cost-effective infrastructure.

Revolgy conducted a comprehensive infrastructure review and billing analysis for SharpGrid. Over two months, we set out to identify inefficiencies and recommend optimizations to support SharpGrid’s ambitious expansion plans.

Finding more optimization opportunities than anticipated

SharpGrid’s infrastructure included a complex data platform with components such as GKE Cluster, Cloud SQL, BigQuery, Dataproc, and machine learning models. We needed to conduct an in-depth analysis to ensure optimal performance and cost-effectiveness.

Our analysis revealed more extensive data processing and machine learning optimization opportunities than anticipated. This allowed us to provide SharpGrid with highly focused recommendations, exceeding their expectations, and set up a robust infrastructure ready for expansion.

“The project provided us with an in-depth analysis of the security of our deployment, which helped us identify and improve key areas, significantly strengthening our overall security. At the same time, the recommended cloud cost optimizations allowed us to rethink our current solution and better control expenses in the long term.”

— Michael Chrzanowski, CTO, SharpGrid

We approached the project with a detailed analysis of SharpGrid’s infrastructure.

  • Infrastructure and billing analysis: We thoroughly reviewed each resource’s configuration, focusing on networking, BigQuery, compute, Kubernetes (GKE), Cloud SQL, Dataproc, and machine learning.
  • Data platform optimization: We delved into the specifics of Dataproc and machine learning models, identifying underutilized clusters and recommending optimizations such as:
    • Enhanced labeling and cluster utilization monitoring
    • Running low-volume tasks during off-peak hours
    • Using efficient Python packages for data ingestion
  • Security enhancements: We proposed a segmented network model using multiple VPCs connected through VPC peering to ensure secure access control and environment separation. Implementing a foundation model for security ensured standardized infrastructure management, enhancing overall security and resilience.
  • Cost savings: We identified significant savings opportunities through Google’s Committed Use Discounts (CUDs) for Compute Engine and Cloud SQL, which lower instance prices in exchange for a 1-year or 3-year commitment.
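
The trade-off behind a committed use discount can be sketched with a little arithmetic. This is an illustration only: the discount rates and the monthly on-demand figure below are hypothetical placeholders, not Google's published rates or SharpGrid's actual spend. Because a commitment is billed whether or not the resource runs, it only pays off when the resource would otherwise run for more than a (1 - discount) share of the time.

```python
def committed_monthly_cost(on_demand_monthly: float, discount: float) -> float:
    """Monthly cost under a commitment, billed whether or not the resource runs."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return on_demand_monthly * (1 - discount)


def break_even_utilization(discount: float) -> float:
    """Share of the time a resource must run for the commitment to beat on-demand."""
    return 1 - discount


# Hypothetical inputs: replace with figures from the real bill and price list.
ON_DEMAND_MONTHLY = 1000.0  # assumed on-demand cost at 100% utilization
ASSUMED_DISCOUNTS = {"1-year": 0.25, "3-year": 0.50}  # placeholder rates

for term, discount in ASSUMED_DISCOUNTS.items():
    cost = committed_monthly_cost(ON_DEMAND_MONTHLY, discount)
    threshold = break_even_utilization(discount)
    print(f"{term}: {cost:.2f} per month, break-even at {threshold:.0%} utilization")
```

Under these assumed rates, a workload that runs only half of the month would not benefit from the 1-year commitment, which is why utilization data should be reviewed before committing.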

“SharpGrid has big plans in market research for HoReCa, and so we focused on making their Google Cloud setup more efficient, secure, and cost-effective to help them expand smoothly.”

— Michal Režnický, Head of Professional Services, Revolgy

Our recommendations for optimization

We discovered that the majority of the monthly costs come from Compute Engine and Cloud SQL, mainly storage and IP address reservation. Here, we look at some of the optimization opportunities we recommend for SharpGrid.

Removing public IP addresses, which are not needed to run SharpGrid’s infrastructure, offers high savings potential. To some degree, these addresses also pose a security risk.

As an immediate action, we recommended that the client merge two separate GKE clusters into one to run the same Airflow instance. This would reduce the overhead of managing multiple clusters.

Cloud SQL storage accounts for 9% of the monthly cloud bill. SharpGrid pays for the SSD storage allocated to a Cloud SQL instance even when the instance is shut down, so the charge reflects the full provisioned capacity rather than actual usage. We suggest using BigQuery as an alternative.
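
The gap between paying for full allocated capacity and paying for what is actually used can be estimated with a minimal sketch like the one below. All figures (storage sizes and the price per GB) are hypothetical placeholders, not SharpGrid's numbers or Google's price list.

```python
def provisioned_storage_cost(provisioned_gb: float, price_per_gb_month: float) -> float:
    """Monthly charge when billing follows allocated capacity, running or not."""
    return provisioned_gb * price_per_gb_month


def idle_share(provisioned_gb: float, used_gb: float) -> float:
    """Fraction of the provisioned storage that holds no data."""
    if provisioned_gb <= 0:
        raise ValueError("provisioned_gb must be positive")
    return max(0.0, 1 - used_gb / provisioned_gb)


# Hypothetical inputs for illustration only.
PROVISIONED_GB = 500.0
USED_GB = 125.0
PRICE_PER_GB_MONTH = 0.25

monthly = provisioned_storage_cost(PROVISIONED_GB, PRICE_PER_GB_MONTH)
unused = monthly * idle_share(PROVISIONED_GB, USED_GB)
print(f"Monthly storage charge: {monthly:.2f}")
print(f"Portion paying for unused capacity: {unused:.2f}")
```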

We also recommend adding labels to Dataproc clusters to make it easier to identify the pipelines that incur the highest costs. This would also help prioritize which pipelines to optimize. Compute Engine currently accounts for 40% of the monthly cloud cost.
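
The labeling recommendation can be sketched as follows. Once clusters carry a label, cost records can be grouped by that label to rank pipelines by spend. The label key `pipeline`, the pipeline names, and the row structure are assumptions made for this example; the rows are a simplified stand-in for real billing records, not SharpGrid's data.

```python
from collections import defaultdict


def cost_by_label(rows, label_key):
    """Sum cost per label value and return (value, total) pairs, highest first."""
    totals = defaultdict(float)
    for row in rows:
        # Records without the label are grouped so that gaps in labeling stay visible.
        value = row.get("labels", {}).get(label_key, "unlabeled")
        totals[value] += row["cost"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical cost records for illustration only.
SAMPLE_ROWS = [
    {"service": "Dataproc", "labels": {"pipeline": "ingest"}, "cost": 40.0},
    {"service": "Dataproc", "labels": {"pipeline": "scoring"}, "cost": 75.0},
    {"service": "Dataproc", "labels": {"pipeline": "ingest"}, "cost": 20.0},
    {"service": "Dataproc", "labels": {}, "cost": 5.0},
]

for pipeline, total in cost_by_label(SAMPLE_ROWS, "pipeline"):
    print(f"{pipeline}: {total:.2f}")
```

Keeping an explicit "unlabeled" bucket is a deliberate choice here: it shows how much spend still cannot be attributed to any pipeline.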

During our review, we also discovered that memory and CPU resources are underutilized, so SharpGrid is paying for capacity that sits idle.

About SharpGrid

SharpGrid is a startup that collects and analyzes data on businesses within the HoReCa sector. They began in the Czech Republic, spinning out of the analytics company BizMachine, and have expanded to 41 European countries over the past three years.

SharpGrid provides critical market insights to major global companies like Heineken, Danone, and Unilever, helping them identify new business opportunities and stay ahead of the competition. Now, with plans to expand to other continents, SharpGrid aims to become the global leader in HoReCa market research.

By addressing inefficiencies, enhancing security, and implementing cost-saving measures, Revolgy has provided SharpGrid with a robust and scalable infrastructure. This strong foundation will support their growth ambitions, enabling them to maintain their competitive edge and achieve their goal of becoming the global leader in market research for the HoReCa sector.

Do you want to learn more about Revolgy and our services? Contact us for a free consultation with one of our experts on cloud solutions.