Ankit K.
About
• Role Overview: Worked with the team to keep the platform stable and highly available, proposed new ideas, and fixed recurring issues.
• Enhanced Platform Stability: Identified and resolved the root causes of alerts, and proposed and implemented fixes for recurring issues, including new metrics and cleanup scripts, improving overall system reliability.
• Mongoproxy is one of the key projects that I worked on:
• Managed infrastructure spanning two Kubernetes clusters (30 and 12 nodes, respectively) and 22 bare metal servers, and coordinated with the DBA team on the backend MongoDB sharded cluster of 600+ servers.
• Led the migration of Mongoproxy from bare metal to Kubernetes in production, managing clusters serving 200k+ clients per day. Led a POC for a private cluster to improve security and achieve high availability and scalability for Mongoproxy.
• Developed Comprehensive Observability Solutions: Built dashboards and alerting systems to ensure effective monitoring.
• Incident Management and Support: Acted as a first responder for Mongoproxy issues, provided critical incident support, and facilitated effective resolution of developer escalations.
• Automated Key Operational Tasks: Built Jenkins pipelines that restart Mongoproxy on bare metal and Kubernetes in the required sequence, maintaining high availability, reducing manual work, and increasing efficiency.
• Led Dynatrace Monitoring Migration: Managed the transition to Dynatrace for platform monitoring. Automated the setup with Ansible playbooks and Terraform scripts, enhancing system observability and ensuring smooth onboarding of the DBA team.
• Fostered Knowledge Sharing and Documentation: Conducted KT sessions, created detailed documentation and runbooks, and advocated for continuous learning to reduce single points of failure.
• Operational Excellence in On-Call Duties: Handled night on-call responsibilities, consistently resolving developer escalations and on-call alerts with a high success rate.