Site Reliability Engineer

Remote, USA Full-time Posted 2025-11-24

The Role

We are seeking an experienced and proactive Site Reliability Engineer to join our technology team. This is a hybrid role that combines the responsibilities of building and maintaining a scalable, resilient cloud infrastructure with the critical function of leading our response to operational and security incidents.

You will be responsible for the entire lifecycle of our production environment, from managing CI/CD pipelines and infrastructure-as-code development to real-time threat monitoring and crisis management. The ideal candidate is a hands-on engineer who thrives in a fast-paced environment, possesses a deep understanding of cloud-native technologies, and has a proven track record in incident response and management.

Key Responsibilities

DevOps & Infrastructure Management:

  • Manage, automate, and maintain our production infrastructure hosted on Amazon Web Services (AWS), including our multi-AZ Amazon EKS cluster, RDS databases, and ElastiCache instances.
  • Develop, manage, and improve our CI/CD pipelines using GitHub Actions to ensure smooth and reliable deployments.
  • Own and advance our Infrastructure as Code (IaC) practices using Terraform to ensure our infrastructure is reproducible, scalable, and secure.
  • Collaborate with development teams to support the deployment and operation of backend microservices (.NET, Go) and frontend applications (React, hosted on Vercel).
  • Monitor and manage system capacity and performance, ensuring high availability and low latency for our users.
  • Implement and enforce security best practices across the infrastructure, including network segmentation, secret management, and access controls.

Incident Response & Security:

  • Serve as the primary lead for responding to, managing, and resolving production incidents, from initial detection to post-mortem analysis.
  • Develop, maintain, and test incident response playbooks, disaster recovery plans, and business continuity procedures.
  • Utilize our monitoring stack (AWS GuardDuty, CloudTrail, Inspector, Security Hub) to proactively detect, triage, and respond to security threats and system anomalies.
  • Conduct thorough root cause analysis (RCA) for all major incidents and drive the implementation of corrective and preventative actions.
  • Support and participate in regular security and resilience testing, including vulnerability scanning and software supply chain security checks using tools like Trivy.
  • Ensure all operational and incident management activities are documented and executed in alignment with our DORA and MiCA compliance obligations.

Security Operations & Compliance:

  • Define, implement, and maintain a PSIRT (Product Security Incident Response Team) process that covers both infrastructure-related and blockchain/on-chain incidents.
  • Design and execute incident response processes, including tooling, documentation, and post-incident reviews.
  • Lead digital forensics efforts: define tools, processes, and playbooks.
  • Roll out and manage EDR (Endpoint Detection and Response) tools for both infrastructure and employee endpoints.
  • Implement and manage MDM (Mobile Device Management) for laptops and phones to ensure secure key storage and prevent compromise.
  • Define and enforce security rules and guardrails aligned with business risk.
  • Harden Kubernetes (EKS) clusters and containers, and implement admission control policies.
  • Maintain and test Disaster Recovery (DR) and backup plans regularly.
  • Manage Cloudflare WAF rules, vulnerability management (SAST/SCA/DAST), and AWS/Kubernetes event-based security tooling.

REQUIREMENTS

Required Skills & Experience

  • 10+ years of experience in the field.
  • Proven experience in a Site Reliability Engineering (SRE), DevOps or similar role.
  • Deep, hands-on expertise with Amazon Web Services (AWS), particularly EKS, RDS, VPC, IAM, and security services like GuardDuty and Security Hub.
  • Strong proficiency with containerization (Docker) and Kubernetes orchestration in a production environment.
  • Expert-level knowledge of Infrastructure as Code, with extensive experience using Terraform.
  • Demonstrable experience building and managing CI/CD pipelines, preferably with GitHub Actions.
  • Solid experience in leading incident response efforts, including incident command, diagnostics, and post-incident review.
  • A strong understanding of networking principles, including VPCs, subnets, load balancing (NLB), and edge security (WAF, DDoS protection) with platforms like Cloudflare.
  • Familiarity with modern monitoring, logging, and observability principles and tools.

Desired Skills & Experience

  • Experience working in a highly regulated environment, such as FinTech, banking, or crypto services.
  • Familiarity with our wider tech stack, including Vercel, Fireblocks, and NGINX.
  • Experience with security scanning tools for containers and dependencies (e.g., Trivy).
  • Knowledge of token-based authentication and encryption standards such as JWE (JSON Web Encryption), and of best practices for secrets management (e.g., credential stores, AWS KMS).
  • Scripting skills in languages such as Python or Bash for automation tasks.

BENEFITS

  • A competitive salary and benefits package.
  • The opportunity to work with a modern, cutting-edge technology stack.
  • A key role in a fast-growing company at the intersection of finance and technology.
  • A collaborative and dynamic work environment with a strong focus on security and resilience.
  • Flexible working arrangements, including the option to work fully remotely.

If you are passionate about building resilient systems and are ready to take on the challenge of securing a next-generation crypto platform, we would love to hear from you. Please submit your resume and a cover letter detailing your relevant experience.