Vol. 4 No. 1 (2024): African Journal of Artificial Intelligence and Sustainable Development

Optimizing Control Plane Performance for Ultra-Scale EKS Clusters

Babulal Shaik
Cloud Solutions Architect at Amazon Web Services, USA
Srikanth Bandi
Software Engineer at JP Morgan Chase, USA

Published 26-02-2024

Keywords

  • EKS
  • control plane

How to Cite

[1] Babulal Shaik and Srikanth Bandi, “Optimizing Control Plane Performance for Ultra-Scale EKS Clusters,” African J. of Artificial Int. and Sust. Dev., vol. 4, no. 1, pp. 419–438, Feb. 2024, Accessed: Dec. 29, 2024. [Online]. Available: https://africansciencegroup.com/index.php/AJAISD/article/view/223

Abstract

Maintaining high performance and reliability is crucial to smooth operations in large-scale cloud infrastructure. Amazon Elastic Kubernetes Service (EKS), a managed Kubernetes platform, has gained popularity for running containerized applications at scale. However, as organizations grow and take on ultra-scale workloads, the performance of the EKS control plane becomes a critical concern. The control plane, which manages the overall health and coordination of the Kubernetes cluster, can face challenges as scale increases. Several strategies can be implemented to optimize control plane performance in ultra-scale EKS clusters. First, architecture plays a vital role: choosing the correct configuration for the control plane and worker nodes and ensuring network efficiency are key. Second, resource allocation is essential to avoid bottlenecks. This involves careful management of compute, memory, and storage resources so that the control plane can handle high demand without slowing down. Monitoring also becomes increasingly important in ultra-scale environments, allowing teams to detect performance issues and make the necessary adjustments in real time. With appropriate monitoring tools, organizations can track control plane metrics such as API server latency and performance and scheduling delays. Best practices such as optimizing Kubernetes components like etcd, tuning API server settings, and using horizontal pod autoscaling are crucial for optimal performance. Furthermore, efficiency must be balanced against scalability, because performance degradation at any point in the control plane can cause significant operational disruptions. As the cloud-native landscape continues to evolve, understanding the nuances of optimizing EKS control plane performance will be essential for businesses that rely on containerized applications and Kubernetes orchestration.
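
As an illustration of the API server latency tracking mentioned in the abstract, the following is a minimal sketch of estimating a latency percentile from cumulative histogram buckets, the form in which the Kubernetes API server reports request durations (`apiserver_request_duration_seconds_bucket`). The function name `histogram_quantile`, the linear interpolation inside a bucket, and the sample bucket counts are assumptions made for illustration; they are not taken from the article.

```python
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound_seconds, count) buckets.

    Interpolates linearly inside the bucket that contains the target rank.
    This is an illustrative helper, not an API of any monitoring library.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    if not buckets:
        raise ValueError("at least one bucket is required")

    ordered = sorted(buckets)
    total = ordered[-1][1]
    if total == 0:
        return float("nan")

    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in ordered:
        if count >= rank:
            if bound == float("inf"):
                # The open-ended bucket has no upper bound to interpolate to.
                return prev_bound
            if count == prev_count:
                return bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative counts: 900 requests took <= 0.1 s, 980 took <= 0.5 s, etc.
sample_buckets = [(0.1, 900), (0.5, 980), (1.0, 995), (float("inf"), 1000)]
p99 = histogram_quantile(0.99, sample_buckets)
print(f"estimated p99 API server latency: {p99:.3f} s")
```

A rising p99 estimate of this kind is one signal a team might alert on before latency becomes visible to workloads; the right threshold depends on the cluster and is not specified in the article.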

