Accenture Federal Services is hiring a
Remote Senior Site Reliability Engineer Senior Manager
At Accenture Federal Services, nothing matters more than helping the US federal government make the nation stronger and safer and life better for people. Our 13,000+ people are united in a shared purpose to pursue the limitless potential of technology and ingenuity for clients across defense, national security, public safety, civilian, and military health organizations.

Join Accenture Federal Services, a technology company and part of global Accenture, to do work that matters in a collaborative and caring community, where you feel like you belong and are empowered to grow, learn and thrive through hands-on experience, certifications, industry training and more.

Join us to drive positive, lasting change that moves missions and the government forward!

You Are:

We are seeking a Senior Site Reliability Engineer (SRE) with deep expertise in building and maintaining reliable, scalable systems and a passion for optimizing the performance, reliability, and efficiency of technical infrastructure.
The ideal candidate will have a strong background in site reliability engineering principles, extensive experience with automation, and a proven ability to collaborate across teams to ensure seamless service delivery.

The Work:

• Design, build, and maintain reliable, scalable, and high-performance infrastructure and services to support business needs.
• Implement and advocate for SRE best practices, including automation, CI/CD pipelines, monitoring, and incident management.
• Collaborate with cross-functional teams to develop systems that meet high availability, performance, and reliability standards.
• Drive incident management processes, including root cause analysis, mitigation strategies, and long-term preventive measures.
• Establish, monitor, and refine service level objectives (SLOs), service level agreements (SLAs), and key performance indicators (KPIs) to ensure systems adhere to reliability and performance targets.
• Automate repetitive tasks to improve operational efficiency and reduce manual intervention.
• Build and maintain robust monitoring, logging, and alerting systems to ensure visibility into system performance and reliability.
• Provide technical mentorship and guidance to team members, fostering a culture of knowledge sharing and continuous improvement.
• Act as a technical leader by driving solutions to complex challenges, ensuring alignment with organizational goals.
• Prepare and deliver performance and reliability reports to stakeholders, offering insights and recommendations for improvements.

Here's What You Need:

• Proven experience in site reliability engineering or a similar role, with a focus on application and infrastructure scalability, reliability, and performance.
• Strong knowledge of ITSM principles and incident management processes.
• Expertise in automation tools, scripting, and infrastructure-as-code (IaC) technologies.
• Proficiency with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk).
• Experience with cloud platforms (e.g., AWS, Azure, GCP) and container technologies (e.g., Docker, Kubernetes).
• Strong analytical and problem-solving skills, with the ability to troubleshoot complex systems.
• Excellent communication and collaboration abilities, with a focus on cross-team partnerships.
• A passion for continuous learning, innovation, and driving imp

#Location
Washington, DC