pulla s.
About
Software Engineer | Stanvac Systems PVT LTD, Hyderabad, India | Feb’21 – Present

Project Title: Multi-Region Cloud Infrastructure Automation and Reliability Enhancement
Project Description: Secure and Highly Available Cloud Infrastructure Automation
Duration: Feb 2021 - Present

Project Overview: The project focused on designing, building, and managing a secure, highly available, and robust infrastructure across multiple AWS regions. The goals included leveraging Infrastructure as Code (IaC) with Terraform, implementing container orchestration with Kubernetes (EKS), and enhancing system monitoring and automation. The project also included setting up a comprehensive CI/CD pipeline using GitLab CI, monitoring and optimizing system performance with Datadog, and maintaining high standards of security and reliability.

Responsibilities and Contributions:

1. Cloud Infrastructure Management:
• Multi-Region Deployment: Designed and deployed a highly available, cross-region infrastructure on AWS, using services such as EC2, EKS, ECS, RDS, IAM, VPC, and S3, along with Airflow.
• Security Implementation: Configured secure network setups, including VPCs, subnets, security groups, and IAM roles and policies, to keep resources protected and compliant.
• Scalability: Implemented autoscaling policies for EC2 and EKS clusters to handle varying workloads efficiently.

2. Infrastructure as Code (IaC):
• Terraform Expertise: Used Terraform to define, provision, and manage cloud resources, creating reusable, modular Terraform configurations for consistent environment setups.
• Automation: Automated infrastructure provisioning, reducing manual intervention and increasing deployment speed.

3. Container Orchestration:
• Kubernetes Management: Deployed and managed containerized applications using Kubernetes (EKS), ensuring high availability and fault tolerance.
• Helm Charts: Used Helm charts to manage Kubernetes applications, enabling easy updates and rollbacks.

4. CI/CD Pipeline Implementation:
• GitLab CI: Built and maintained robust CI/CD pipelines using GitLab CI, automating the build, test, and deployment processes.
• BuildKit Integration: Integrated BuildKit for efficient Docker image builds, reducing build times and improving deployment efficiency.
• Automated Testing: Implemented automated testing frameworks within the CI/CD pipeline to ensure code quality and reliability.

5. Site Reliability Engineering (SRE):
• SLA/SLO Definition: Established and monitored Service Level Agreements (SLAs) and Service Level Objectives (SLOs) to maintain system reliability and performance.
• Incident Response: Developed and maintained runbooks and incident response plans, and participated in on-call rotations to provide 24/7 support.

6. Scripting and Automation:
• Python Scripting: Developed automation scripts in Python to streamline operational tasks and reduce manual workload.
• Task Automation: Automated routine maintenance tasks, backups, and disaster recovery processes.

7. Monitoring and Logging:
• Datadog and CloudWatch: Implemented comprehensive monitoring and alerting using Datadog and CloudWatch, enabling proactive identification and resolution of issues.
• Log Management: Set up centralized logging to aggregate logs from various services for better observability.

8. Security and Cost Management:
• Security Best Practices: Conducted regular security audits, vulnerability assessments, and compliance checks to keep the infrastructure secure.
• Cost Optimization: Monitored and optimized infrastructure costs, implementing cost-saving measures without compromising performance.

9. Collaboration and Communication:
• Team Collaboration: Worked closely with development teams to ensure smooth integration and deployment of applications, fostering a collaborative DevOps culture.
• Documentation: Documented processes and best practices, and shared knowledge to improve team efficiency and onboarding.