Inflection AI is hiring a Remote Member of Technical Staff, High Performance Computing
About the Role

At Inflection, the scale of our compute is critical to our mission of creating personal intelligence for everyone. We currently have several clusters in production and are continually expanding our compute capacity. You'll have the opportunity to work on the most powerful AI cluster in the world, comprising 22K NVIDIA H100 GPUs.

Inflection announces build of largest ML cluster in the world

As a High Performance Computing practitioner, you will be responsible for the smooth day-to-day operation of these clusters. You will be expected to add monitoring and telemetry to the clusters to preempt issues, and when issues do occur, you will be expected to put your firefighting skills to work and resolve them. You will monitor jobs running on thousands of GPUs and examine workloads and their utilization. You will partner with other members of technical staff at Inflection to understand their needs and how best to meet them.

Experience as an HPC practitioner and with schedulers such as Slurm and Kubernetes will be key to your success in this role. Knowledge of GPUs and their architectures, as well as their common failures, is also important. Familiarity with LLMs and the current state and trends in NLP will also help. Finally, being comfortable stepping into any problem with the HPC infrastructure and resolving it is essential for success in this role.

Minimum Requirements:

* Direct experience managing a multi-100-node+ Slurm cluster
* Direct experience with debugging massively parallel CPU/GPU jobs

Preferred experience:

* Managing ML-specific workloads on large GPU clusters on Slurm or Kubernetes

Employee Pay Disclosures

At Inflection AI, we aim to attract and retain the best employees and compensate them in a way that appropriately and fairly values their individual contributions to the company.
The pay range for this position in California is estimated to fall in the base range of approximately $150,000 - $300,000. This estimate can vary based on the factors described above, so the actual starting annual base salary may be above or below this range.

#Benefits

* 401(k)
* Distributed team
* Async
* Vision insurance
* Dental insurance
* Medical insurance
* Unlimited vacation
* Paid time off
* 4 day workweek
* 401k matching
* Company retreats
* Coworking budget
* Learning budget
* Free gym membership
* Mental wellness budget
* Home office budget
* Pay in crypto
* Pseudonymous
* Profit sharing
* Equity compensation
* No whiteboard interview
* No monitoring system
* No politics at work
* We hire old (and young)

#Location

Palo Alto, California, United States