NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It's a unique legacy of innovation that's fueled by great technology and amazing people. Today, we're tapping into the unlimited potential of AI to define the next era of computing, an era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what's never been done before takes vision, innovation, and the world's best talent. As an NVIDIAN, you will be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Join the team and see how you can make a lasting impact on the world! We have positions available for enthusiastic, hardworking and experienced software developers to work on the hardware integration and bare-metal provisioning functionality of our Linux-based cluster management software. NVIDIA's Bright Cluster Manager powers thousands of Linux clusters around the world, ranging from a few nodes to several thousand nodes. Bright clusters can run on-premises, entirely in the cloud, or in a hybrid environment. What you'll be doing: Develop the head node and compute node installation and provisioning processes. Work on functionality for edge site deployment. Integrate our product with the latest hardware (e.g., GPUs, DPUs, accelerators, and high-speed interconnects such as InfiniBand). Work on features related to composable infrastructure management. Develop new features for our BIOS and firmware upgrade management. Develop functionality that makes Bright clusters usable for a wider range of workloads and lets clusters scale to a huge number of nodes. Add support for new Linux distributions. Improve support for alternative CPU architectures such as ARM. Add features to our Ansible collections for cluster installation and management. 
Assist our support team with customer support requests related to the features above, and help our customers use our product more efficiently. What we need to see: Degree in Computer Science or a related field (or equivalent experience). 7+ years of experience in software development and/or related roles. Our software is based on Linux, so you should be very familiar with the Linux operating system and in particular with Linux networking concepts. In addition, good practical knowledge of the most common software installed as part of a typical Linux installation is required. You are proficient in Python and intimately familiar with object-oriented software design, design patterns, and concurrent programming techniques. Emphasis on high-quality work and clean code. Eager to learn and use new technologies. Ways to stand out from the crowd: Experience with Ansible. Experience with high-performance computing and system administration. Knowledge of Kubernetes, AWS, Azure, GCE, OpenStack, Jenkins and distributed programming. Proficiency in C++. The base salary range is 184,000 USD - 356,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law. NVIDIA is the world leader in accelerated computing. NVIDIA pioneered accelerated computing to tackle challenges no one else can solve. 
Our work in AI and digital twins is transforming the world's largest industries and profoundly impacting society. Learn more about NVIDIA. \n\n#Salary and compensation\n
We estimated the salary below based on similar jobs related to Design, Python, Node, Senior and Linux:\n\n
$60,000 — $110,000/year\n
\n\n#Benefits\n
401(k)\n\nDistributed team\n\nAsync\n\nVision insurance\n\nDental insurance\n\nMedical insurance\n\nUnlimited vacation\n\nPaid time off\n\n4 day workweek\n\n401k matching\n\nCompany retreats\n\nCoworking budget\n\nLearning budget\n\nFree gym membership\n\nMental wellness budget\n\nHome office budget\n\nPay in crypto\n\nPseudonymous\n\nProfit sharing\n\nEquity compensation\n\nNo whiteboard interview\n\nNo monitoring system\n\nNo politics at work\n\nWe hire old (and young)\n\n
\n\n#Location\nUS, CA, Santa Clara
Please mention that you found the job on Remote OK; this helps us get more companies to post here. Thanks!
When applying for jobs, you should NEVER have to pay to apply. You should also NEVER have to pay for equipment that the company promises to reimburse later, and you should never pay for trainings you are required to do. Those are scams! NEVER PAY FOR ANYTHING! Posts that link to "how to work online" pages are also scams; don't use them or pay for them. Always verify that you're actually talking to the company in the job post and not an impostor. A good idea is to check whether the domain name of the site or email matches the company's main domain name. Scams in remote work are rampant, so be careful! Read more to avoid scams. When you click the button to apply above, you will leave Remote OK and go to the company's job application page outside this site. Remote OK accepts no liability or responsibility as a consequence of any reliance upon information on external sites or here.
NVIDIA is looking for outstanding software engineers to work on NVIDIA's Data Center GPU Manager (DCGM) software. In this role you will work closely with the broader NVIDIA team to design and build Linux-based management agents, CLI tools and end-to-end integration solutions that combine GPUs with the rest of the data center software management ecosystem. We are focused on supporting NVIDIA products across HPC, cloud and enterprise on both bare metal and virtualized platforms as the role of GPUs in all of these environments expands rapidly. Your contributions will span many aspects of GPU system integration, including telemetry and metrics, health checks, diagnostics, configuration, accounting and policy. These tools fill roles of both passive background monitoring and active online management with a core emphasis on operational transparency and seamless integration in customer environments. Your code will support everything from single-node developer systems to large clusters with thousands of nodes. To be successful, you will need to have a strong Linux C/C++ background, familiarity with distributed software development, and a proven work ethic. You will be expected to jump in quickly and provide important contributions from day one. This is a dynamic work environment with many exciting opportunities awaiting. NVIDIA GPUs are central to many hot trends in the enterprise, cloud and datacenter. Come join us as we craft the future of accelerated computing and AI! 
What you'll be doing: Develop robust, scalable C++ user-space data center management system software under Linux. Build and maintain user-space libraries, agents, plugins, bindings and CLI tools. Enable GPU management integration with the OSS ecosystem, including Kubernetes and Docker. Support internal and external users through bug fixes, documentation and feature improvements. Maintain high-quality products through robust test coverage and smart design. What we need to see: BS or higher in Computer Science or equivalent experience. 5+ years of meaningful industry experience with a strong C++ development background. Familiarity with modern C++ standards (C++17/C++20). User-space development and debugging expertise in Linux environments. Experience with APIs and interface design. Experience with IPC and multithreading. Outstanding written and verbal interpersonal skills. Strong motivation and commitment to learning new skills. Ability to implement all aspects of the software development lifecycle. Ability to manage time in a fast-paced, heavily multitasked environment. Experience writing unit and system tests to ensure the correctness of fixes and new features. Ways to stand out from the crowd: Development experience with Python, Go, and Rust. Experience with Jenkins and GitHub/GitLab CI/CD pipelines. Experience with containers, common orchestration frameworks and common logging/telemetry backends. Experience with APIs and interface design. Exposure to GPU programming with CUDA. Experience with enterprise software development. Experience with cross-language interfaces (FFI, SWIG, etc.) in Go (cgo), Python, and Rust. Experience with metrics gathering/monitoring best practices. Experience with OpenTelemetry, Prometheus, Grafana, Datadog, etc. Good understanding of large-scale distributed systems and data center operations and limitations. NVIDIA is widely considered to be one of the technology world's most desirable employers. 
We have some of the most forward-thinking and hardworking people on the planet working for us. If you're creative and autonomous, we want to hear from you! The base salary range is 148,000 USD - 276,000 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law. NVIDIA is a learning machine. NVIDIA pioneered accelerated computing to tackle challenges no one else can solve. Our work in AI and the metaverse is transforming the world's largest industries and profoundly impacting society. Learn more about NVIDIA. \n\n#Salary and compensation\n
We estimated the salary below based on similar jobs related to Design, Docker, Cloud, Node, Senior and Engineer:\n\n
$80,000 — $117,500/year\n
\n\n#Location\nUS, WA, Redmond
\nChainSafe is a leading blockchain research and development firm specializing in infrastructure solutions for the decentralized web. Alongside its contributions to significant ecosystems such as Ethereum, Polkadot, Filecoin, and more, ChainSafe creates solutions for developers across the web3 space, drawing on its expertise in gaming, bridging and decentralized storage. As part of the mission to build innovative products for users and better tooling for developers, ChainSafe embodies an open-source and community-oriented ethos.\n\nAt ChainSafe, you'll be part of a global remote team that believes in the community's vital importance and contributes to advancing humanity with open-source and decentralized technology. To learn more about ChainSafe, look at our website or check out our work on GitHub.\n\nAbout the role\n\nAs a DevOps Engineer (SRE) on the Infrastructure Team, you will play a vital role in defining and implementing best-practice strategies and guides to ensure the reliability, scalability, and performance of the infrastructure that supports daily production activities across multiple blockchain ecosystems. This infrastructure spans multiple cloud and bare-metal service providers and runs our containerized stack in Linux environments.\n\nYour expertise will contribute to the sophistication of blockchain applications and redefine the boundaries of what's possible within this emerging technological sphere. 
All work across ChainSafe will be open-source, ensuring expansive opportunities for deep contribution and collaborative efforts across various web3 blockchains and ecosystems.\n\nResponsibilities\n\nWhat you will be doing\n\n* Oversee and enhance the health, performance, and security of environments, servers, and applications across the entire technology stack, including various blockchain services and full nodes.\n\n* Manage various global environments, taking into account resource usage and latency in the regions they serve\n\n* Be on call and able to respond promptly outside of business hours\n\n* Implement automation for builds, deployment, and automatic scaling\n\n* Work directly with the development and support teams to resolve issues\n\n* Design and implement procedures related to ChainSafe's infrastructure operations\n\n* Execute deployments and network upgrades\n\n* Run and improve the incident response program\n\n* Provide training and guidance for other members of the infrastructure team, ensuring round-the-clock node operation and incident response.\n\n* Document and communicate technical details via open-source documentation\n\n* Collaborate with various internal teams and the wider community to build, expand, and scale ChainSafe's infrastructure architecture by tapping into new trends and opportunities highlighted by internal data, blockchain research, and the wider blockchain ecosystem\n\nRequirements\n\n* Practical knowledge of at least one programming language (Go, TypeScript, Solidity, or Rust is a big plus)\n\n* Demonstrable experience with modern Infrastructure as Code (IaC) tools (Terraform, Helm, Ansible, etc.), deployment automation, and CI/CD best practices and tools.\n\n* 3+ years of experience managing resources in AWS, GCP, or Azure.\n\n* 3+ years of experience working with Linux.\n\n* 3+ years of experience with monitoring and alerting tools (Datadog, Grafana, Prometheus, etc.)\n\n* 3+ years of experience implementing distributed tracing, monitoring, and logging systems using the OpenTelemetry Protocol\n\n* 3+ years of experience building and participating in incident response systems (PagerDuty, etc.) and handling the emergency response to production environment failures.\n\n* Excellent communication skills with the ability to document and convey technical details clearly\n\n* Ability to work autonomously as well as with the wider team\n\nAs a plus:\n\n* Experience working in the Web3 domain\n\n* Experience working with bare-metal deployments\n\n* Experience automating network deployment\n\n* Understanding of at least two of the following domains: Web Security, Web3 Security, Cloud Security, Systems Security, and Applied Cryptography.\n\nHiring Steps\n\n* Selected candidates will be invited to a 30-to-45-minute values interview with one or two of our team members\n\n* A technical 60-minute interview with one or two of our engineers.\n\n* Then, candidates will be asked to complete a homework assignment within 3-4 hours.\n\n* Lastly, a 60-minute call with the hiring team to discuss the results and conduct the final interview.\n\nWhy Join ChainSafe\n\nFounded by developers for developers, ChainSafe is a remote-first company with an international team. We continue to provide opportunities for personal and professional growth, value autonomy and responsibility, have a results-driven environment, and offer flexible work hours.\n\nWe care deeply about our values and look for these attributes in every new team member. In addition, we recognize the benefits of cultivating a diverse team and aspire to embed respect for all people into our culture. 
We encourage women, the LGBTQIA+ community, people of colour, and members of any other group underrepresented in the blockchain space (or tech in general) to apply.\n\nHow to Apply\n\nPlease fill out the Greenhouse application form below, attach your resume, and link your GitHub/GitLab profile or any software project you have contributed to (if applicable). \n\n#Salary and compensation\n
No salary data was published by the company, so we estimated the salary based on similar jobs related to Web3, DevOps, Cloud, Node and Engineer:\n\n
$70,000 — $110,000/year\n
\n\n#Location\nVancouver, British Columbia, Canada
\nYour Role & Mission\n\nThe Senior Application Security Engineer will work with product and engineering to create a secure SDLC, design security features, and implement tools, education and processes that reduce the risk of security issues in the tech stack.\n\nResponsibilities\n\n* Select or build tooling to help developers write secure code\n\n* Provide overall security architecture advice to Engineering and IT\n\n* Manage issues sourced from penetration tests and bug bounty programs\n\n* Participate in the security champions program\n\n* Help Product, Engineering and IT incorporate security requirements into new products from inception\n\n* Assist in the creation and maintenance of Security Risk Models for new projects and existing systems\n\nSkills & Competencies\n\n* 5+ years of web application security experience\n\n* Strong experience with vulnerability management or penetration testing is required.\n\n* Extensive experience regularly conducting architectural reviews and threat modeling is required.\n\n* Strong knowledge of common AppSec issues and tooling (e.g., SCA, SAST, DAST)\n\n* Strong Linux knowledge is a plus.\n\n* Experience with cloud services, ideally GCP, is a plus.\n\n* Strong software development skills, ideally in Ruby, with Node as a secondary language\n\n* Strong communication and influencing skills\n\n* Should have worked in a SaaS environment.\n\n* Should have extensive knowledge of open redirects, OAuth, and CSRF.\n\n* Certifications: at least one of OSCP, OSWE, or CEH is a plus.\n\n#Salary and compensation\n
No salary data was published by the company, so we estimated the salary based on similar jobs related to Design, SaaS, Testing, Education, Cloud, Node, Senior, Engineer and Linux:\n\n
$60,000 — $100,000/year\n
\n\n#Location\nBuenos Aires, Buenos Aires, Argentina
\nWe are looking for people willing to work a 10 am to 7 pm PST (or later) schedule. This role can be fully remote.\n\nAbout the role:\n\nThe High Performance Computing Operations team is responsible for the day-to-day provisioning, management and uptime of CoreWeave's ever-expanding fleet of server nodes. Playing a central role in CoreWeave's growth strategy, this team is on the front line for configuration, updates and remote troubleshooting of our highest tier of supercomputing clusters and their networking, delivery platform and tooling dependencies. You will be in a daily battle with the forces of entropy to maximize the number of nodes CoreWeave can deliver to customers.\n\nWe are seeking curious, creative and persistent problem solvers to join our HPC Operations team and help us drive batches of server nodes through our provisioning and validation processes while efficiently and effectively troubleshooting node or cluster problems as they arise. This individual will join a team of committed engineers working to deploy nodes as fast as they can be racked and turned on. 
\n\nKey Responsibilities:\n\n* Install, configure, and maintain large-scale high-performance supercomputing clusters running state-of-the-art GPUs\n\n* Troubleshoot hardware and software issues; escalate and coordinate as needed with data center, network and platform teams to drive resolution\n\n* Monitor and analyze system performance and take appropriate remediation actions to maintain cloud health\n\n* Approach your work with flexibility and optimism, anticipating shifting business and technical priorities\n\n* Create and maintain documentation of team processes, knowledge and best practices for system management\n\n* Think critically about your day-to-day work and collaborate to improve team processes and efficiency\n\nSuccessful candidates typically share the following skills and experience:\n\n* 2 or more years of experience troubleshooting or administering data center or on-prem infrastructure (servers, storage, network or a mix)\n\n* Strong understanding of Linux system administration and networking concepts\n\n* Ability to troubleshoot hardware and software issues and perform system maintenance tasks consistently and reliably\n\nIdeal candidates may also have experience in one or more of these:\n\n* Software development or scripting languages (Bash, Python, PowerShell, etc.)\n\n* Grafana, Prometheus, PromQL queries or similar observability platforms\n\n* Data center environments, including server racks, HVAC systems and fiber trays\n\n* Kubernetes administration\n\nOur compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $80,000/year in our lowest geographic market up to $110,000/year in our highest geographic market. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience.\n\n#Salary and compensation\n
Estimated salary based on similar Cloud, Node, Engineer and Linux jobs:
$52,500 - $95,000/year

Benefits

401(k)
Distributed team
Async
Vision insurance
Dental insurance
Medical insurance
Unlimited vacation
Paid time off
4 day workweek
401k matching
Company retreats
Coworking budget
Learning budget
Free gym membership
Mental wellness budget
Home office budget
Pay in crypto
Pseudonymous
Profit sharing
Equity compensation
No whiteboard interview
No monitoring system
No politics at work
We hire old (and young)

Location
Las Vegas, Nevada, United States
Please mention that you found the job on Remote OK; this helps us get more companies to post here. Thanks!
When applying for jobs, you should NEVER have to pay to apply. You should also NEVER have to pay to buy equipment which they then pay you back for later. Also never pay for trainings you have to do. Those are scams! NEVER PAY FOR ANYTHING! Posts that link to pages with "how to work online" are also scams. Don't use them or pay for them. Also always verify you're actually talking to the company in the job post and not an imposter. A good idea is to check the domain name for the site/email and see if it's the actual company's main domain name. Scams in remote work are rampant, be careful! Read more to avoid scams. When clicking on the button to apply above, you will leave Remote OK and go to the job application page for that company outside this site. Remote OK accepts no liability or responsibility as a consequence of any reliance upon information on there (external sites) or here.
Chan Zuckerberg Biohub - San Francisco is hiring a
Remote AI ML HPC Principal Engineer
The Opportunity

The Chan Zuckerberg Biohub Network has an immediate opening for an AI/ML High Performance Computing (HPC) Principal Engineer. The CZ Biohub Network is composed of several new institutes that the Chan Zuckerberg Initiative created to do great science that cannot be done in conventional environments. The CZ Biohub Network brings together researchers from across disciplines to pursue audacious, important scientific challenges. The Network consists of four institutes across the country: San Francisco, Silicon Valley, Chicago and New York City. Each institute collaborates closely with the major universities in its local area. Along with the world-class engineering team at the Chan Zuckerberg Initiative, the CZ Biohub supports several hundred of the brightest, boldest engineers, data scientists, and biomedical researchers in the country. Their mission is to understand the mysteries of the cell and how cells interact within systems.

The Biohub is expanding its global scientific leadership, particularly in AI/ML, with the acquisition of the largest GPU cluster dedicated to AI for biology. The AI/ML HPC Principal Engineer will help realize the full potential of this capability. The role also provides advanced computing capabilities and consulting support to science and technical programs. This position will work closely with many different science teams simultaneously to translate experimental descriptions into software and hardware requirements across all phases of the scientific lifecycle. These phases include data ingest, analysis, management and storage, computation, authentication, tool development and many other computing needs expressed by scientific projects.

This position reports to the Director for Scientific Computing and will be hired at a level commensurate with the skills, knowledge, and abilities of the successful candidate.

What You'll Do

* Work with a wide community of scientific disciplinary experts to identify emerging and essential information technology needs and translate those needs into information technology requirements
* Build an on-prem HPC infrastructure supplemented with cloud computing to support the expanding IT needs of the Biohub
* Support the efficiency and effectiveness of capabilities for data ingest, data analysis, data management, data storage, computation, identity management, and many other IT needs expressed by scientific projects
* Plan, organize, track and execute projects
* Foster cross-domain community and knowledge-sharing between science teams with similar IT challenges
* Research, evaluate and implement new technologies across a wide range of scientific compute, storage, networking, and data analytics capabilities
* Promote cloud compute services (primarily AWS and GCP), containerization tools, etc. to scientific clients and research groups, and assist researchers in using them
* Work on problems of diverse scope where analysis of data requires evaluation of identifiable factors
* Assist in cost and schedule estimation for the IT needs of scientists, as part of supporting architecture development and scientific program execution
* Support Machine Learning capability growth at the CZ Biohub
* Support scientists in the deployment and maintenance of developed tools
* Plan and execute all of the above responsibilities independently with minimal intervention

What You'll Bring

Essential:

* A Bachelor's degree in Biology or Life Sciences is preferred. Degrees in Computer Science, Mathematics, Systems Engineering or a related field, or equivalent training/experience, are also acceptable.
* A minimum of 8 years of experience designing and building working web-based projects using modern languages, tools, and frameworks
* Experience building on-prem HPC infrastructure and with capacity planning
* Experience and expertise working on complex issues where analysis of situations or data requires an in-depth evaluation of variable factors
* Experience supporting scientific facilities, and prior knowledge of scientific user needs, program management, data management planning or lab-bench IT needs
* Experience with HPC and cloud computing environments
* Ability to interact with a variety of technical and scientific personnel with varied academic backgrounds
* Strong written and verbal communication skills to present and disseminate scientific software developments at group meetings
* Demonstrated ability to reason clearly about load, latency, bandwidth, performance, reliability, and cost, and to make sound engineering decisions balancing them
* Demonstrated ability to quickly and creatively implement novel solutions and ideas

Technical experience includes:

* Proven ability to analyze, troubleshoot, and resolve complex problems that arise in HPC production compute, interconnect, storage hardware, software systems, and storage subsystems
* Configuring and administering parallel and network-attached storage (Lustre, GPFS on ESS, NFS, Ceph) and storage subsystems (e.g. IBM, NetApp, DataDirect Networks, LSI, VAST)
* Installing, configuring, and maintaining job management tools (such as SLURM, Moab, TORQUE, PBS) and implementing fairshare, node sharing, backfill, etc. for compute and GPUs
* Red Hat Enterprise Linux, CentOS, or derivatives, and Linux services and technologies such as dnsmasq, systemd, LDAP, PAM, sssd, OpenSSH, cgroups
* Scripting languages (including Bash, Python, or Perl)
* OpenACC, nvhpc, and an understanding of CUDA driver compatibility issues
* Virtualization (ESXi or KVM/libvirt), containerization (Docker or Singularity), configuration management and automation (tools like xCAT, Puppet, kickstart) and orchestration (Kubernetes, docker-compose, CloudFormation, Terraform)
* High performance networking technologies (Ethernet and InfiniBand) and hardware (Mellanox and Juniper)
* Configuring, installing, tuning and maintaining scientific application software (Modules, Spack)
* Familiarity with source control tools (Git or SVN)
* Experience supporting the use of popular ML frameworks such as PyTorch and TensorFlow
* Familiarity with cybersecurity tools, methodologies, and best practices for protecting systems used for science
* Experience with movement, storage, backup and archiving of large-scale data

Nice to have:

* An advanced degree is strongly desired

The Chan Zuckerberg Biohub requires all employees, contractors, and interns, regardless of work location or type of role, to provide proof of full COVID-19 vaccination, including a booster vaccine dose, if eligible, by their start date. Those who are unable to get vaccinated or obtain a booster dose because of a disability, or who choose not to be vaccinated due to a sincerely held religious belief, practice, or observance, must have an approved exception prior to their start date.

Compensation

* $212,000 - $291,500

New hires are typically hired into the lower portion of the range, enabling employee growth in the range over time. To determine starting pay, we consider multiple job-related factors including a candidate's skills, education and experience, market demand, business needs, and internal parity. We may also adjust this range in the future based on market data. Your recruiter can share more about the specific pay range during the hiring process.

Salary and compensation
Estimated salary based on similar Consulting, Education, Cloud, Node, Engineer and Linux jobs:
$57,500 - $85,000/year

Benefits

401(k)
Distributed team
Async
Vision insurance
Dental insurance
Medical insurance
Unlimited vacation
Paid time off
4 day workweek
401k matching
Company retreats
Coworking budget
Learning budget
Free gym membership
Mental wellness budget
Home office budget
Pay in crypto
Pseudonymous
Profit sharing
Equity compensation
No whiteboard interview
No monitoring system
No politics at work
We hire old (and young)

Location
San Francisco, California, United States
This job post is closed and the position is probably filled. Please do not apply.
About Gruntwork
Gruntwork aims to improve humanity's most important invention: Software. Our focus today is on creating a DevOps UX that software engineers actually enjoy, which we do by creating building blocks that make launching in the cloud 10x better/faster/easier. We work with AWS, K8s, Terraform, Terragrunt, Terratest, Go, TypeScript, and React, and introduce new tech as needed. We're a small team (~20 people), but our clients include Toyota, Adobe, TicketMaster, Verizon, and hundreds of startups.
We are profitable, self-funded (no investors, no debt), and pay salaries, equity, and bonuses according to transparent formulas. We are 100% remote, with 2/3 of our team in the USA and 1/3 in Europe. We plan company-wide in-person meetups every few months and are known worldwide for both DevOps thought leadership and our popular open source tools, Terragrunt and Terratest.
Our measure of a successful Grunt is (1) think like an owner, (2) make impact, (3) communicate effectively, (4) be a good person. If this sounds like you, we're hiring!
About The Role
Our infrastructure as code library and platform team delivers an end-to-end, best-practices infrastructure on AWS in just days, all as a product. In this role, you'll collaborate with other senior-level engineers to define the next generation of AWS and DevOps best practices, codify them for use by thousands of engineers, and design a next-level experience for implementing, operating and understanding them.
What You'll Work On
Build a better DevOps experience. We have a unique product that generates complete, multi-account AWS architectures for Terraform and Terragrunt in just a few hours using a collection of internal Golang tools. Help us take this to the next level by using that product to deliver prod architectures directly to customers, and then leveraging their feedback for improvements. Better yet, help us get to the point where the entire experience is completely automated or self-service.
Codify AWS and Terraform best practices. Customers look to Gruntwork to share the best way to launch on AWS. Discover AWS and Terraform best practices, and then codify them as repeatable patterns that Gruntwork customers can pull off the shelf.
Integrate with the Gruntwork platform. Integrate your product work with our company-wide platform, which consists of a REST API (Next.js/TypeScript), a web-based single-page app (Next.js/React/TypeScript, Tailwind), and a first-class CLI tool (Go).
Build out the Infrastructure as Code Library. Create and maintain reusable infrastructure modules for a variety of infrastructure (e.g., EKS, ECS, RDS, VPC, Lambda, EC2, S3, ElastiCache, etc.), using a variety of tools (e.g., Terraform, Go, Python, Bash, Docker, Packer, etc.) on AWS.
Contribute to open source. Contribute to our open source projects as needed, including Terragrunt, Terratest, cloud-nuke, bash-commons, and more.
Train and mentor. Play to your strengths and areas of expertise by not only writing code and working on the product but also by sharing knowledge and mentoring both other team members and our customers in those areas.
Support customers. Gruntwork is a small, distributed, self-funded, profitable startup, so we'll ask you to provide a limited amount of support. This lets you learn directly from customers how we can improve and continue achieving our vision of making it easier to understand, build, and deploy software.
Your Ideal Background
You should meet some of these requirements, but you don't need to meet all of them. As a company, we look for people who can leverage their existing skills to make significant impact in the near term. As an individual, you are likely looking for a growth opportunity, a core part of which is building new skills.
You know how to write code across the stack and have experience in one or more of the following: Go, C++, Python, TypeScript, Bash, React, Next.js
You have production-level experience with AWS.
You have expertise in one or more of the following: Kubernetes (any managed offering, preferably EKS), ECS, EC2, Lambda / Serverless, API Gateway, RDS, S3, AWS Config, AWS CloudTrail, Amazon GuardDuty, IAM, VPC, VPN.
You have worked with Terraform or other infrastructure-as-code tools like CloudFormation, CDK, or Pulumi in prod.
You have experience achieving compliance and going through audits (e.g., SOC 2, HIPAA, vendor, etc.).
You have a strong background in software engineering.
You have strong communication skills in English and are comfortable engaging with external customers.
Your Ideal Values
You have a passion for imparting best practices to other developers.
But you would rather invest the time to automate a problem than do the same work again.
You have a passion for learning (new technologies and languages specifically).
But you are motivated most by making impact.
You are inspired by our values (https://gruntwork.io/about/#our-values).
Compatible Time Zones
You'll be working with a team in the US time zones, so you can be located in almost any country as long as your time zone is no further west than Los Angeles (GMT-8/GMT-7) and no further east than New York (GMT-5/GMT-4). We've found that when everyone on the team is located in similar time zones, it's easier to collaborate and there's much less pressure to stay up late or get up early, so this is a hard constraint, even if you're willing to work hours different from your current time zone.
Benefits
Our benefits reflect our values. We believe compensation should be fair, transparent, and generous. We hire Grunts in many countries, so some details may vary.
Location Independent, Above-Market Salary. To reduce bias and increase transparency, we compute all salaries using formulas. The formula factors in your title and uses a multiplier to produce a result that's above market for that title. Our salaries are location independent.
Profit-Sharing Bonus. We set aside a pot of money at the end of each year based on profits and distribute bonuses according to a formula that uses as inputs your level within the company and the length of your tenure at the company.
Hardware Budget. We'll buy you a brand new 16" Apple MacBook Pro (or other computer of your choosing of equivalent value) upon joining. It will be owned by you, not the company.
Personal Budget. We'll give you a personal budget of $1,000 USD per month to spend on your workspace (e.g., a co-working space), health (e.g., gym, yoga), time (e.g., babysitter), and/or learning (e.g., books, courses).
Medical/Dental/Vision Insurance. We offer a range of high-quality plans with a large portion paid by the company. For countries other than the US, this includes extra coverage on top of your statutory insurance.
In addition to the global benefits listed above, we have some US-specific benefits as well:
FSA and HSAs. We don't contribute to these accounts, but we do offer them as an option.
401(k). We contribute a portion of your salary to your 401(k).
Disability insurance. If you become disabled, we have a policy that will pay out a portion of your salary.
Life First, Then Work
We believe in planning our work around our lives, not the other way around. To help achieve that we offer:
Remote work that lets you control your hours and physical location.
Normal working hours that usually amount to no more than ~40h per week, and no working on weekends or holidays.
Deliberate project planning that takes into account the time zones of all team members.
A minimum vacation policy where you must take at least 4 weeks per year away from work.
No one carrying a pager and no on-call rotation. We enable this by only offering support contracts whose SLAs cover responses during business days and hours only.
Please mention the word WARMTH when applying to show you read the job post completely. This is a feature to avoid fake spam applicants. Companies can search these words to find applicants that read this and instantly see they're human.
Salary and compensation
$180,000 — $240,000/year
Benefits
401(k)
Distributed team
Async
Vision insurance
Dental insurance
Medical insurance
Unlimited vacation
Paid time off
401k matching
Company retreats
Coworking budget
Learning budget
Free gym membership
Mental wellness budget
Home office budget
Profit sharing
Equity compensation
No whiteboard interview
No monitoring system
No politics at work
We hire old (and young)