AIML - Senior Machine Learning Infrastructure Engineer - ML Compute, ML Platform & Technology
App*** · Santa Clara, CA, United States
Santa Clara, CA, United States · Exp: 4+ yrs · 147,400-272,100 USD/yearly · Remote
Remuneration
147,400-272,100 USD/yearly
Location
Santa Clara, CA, United States
Visa sponsorship
Sponsors visa
Job summary
App*** is seeking a Senior Engineer on the ML Compute Team to design and deliver critical features for ML compute workloads, collaborate with teams across App*** on ML tasks, and understand industry trends to develop new technologies. This role involves building and maintaining compute infrastructure for ML workloads, focusing on stability, reliability, efficiency, and cost-effectiveness.
Benefits
- Comprehensive medical and dental coverage
- Retirement benefits
- Discounted products and free services
- Reimbursement for educational expenses
Qualifications
- Bachelor's degree in Computer Science, engineering, or a related field
- 4+ years of hands-on experience building scalable backend systems for training and evaluation of machine learning/deep learning models
- Proficient in programming languages like Python or Go
- Strong expertise in distributed systems, reliability, scalability, containerization, and cloud platforms
- Proficient in cloud computing and data processing infrastructure and tools such as Kubernetes, Ray, Beam, Flink
- Ability to clearly and concisely communicate technical and architectural problems, while working with partners to iteratively find solutions
- Advanced degree in Computer Science, engineering, or a related field (preferred)
- Proficient in working with and debugging accelerators such as GPUs, TPUs, and AWS Trainium (preferred)
- Proficient in ML training and deployment frameworks such as JAX, TensorFlow, PyTorch, TensorRT, and vLLM (preferred)
Responsibilities
- Collaborate with teams across App*** on ML workloads such as training, inferencing, and fine-tuning
- Drive the design and delivery of critical features to facilitate ML compute workloads
- Effectively communicate complex features and systems in detail
- Understand industry and company-wide trends to help assess and develop new technologies
- Scope, architect, and deliver innovative high-quality solutions
- Code using Go and Python
- Conduct code reviews
- Onboard new team members, provide mentorship, and enable a successful ramp-up on the team's codebases
Skills
Go, AWS, Kubernetes, Python
Degrees
- Bachelor's degree in Computer Science, engineering, or a related field
- Advanced degree in Computer Science, engineering, or a related field
Languages
Go, Python
Relocation
Yes