Principal Cloud Ops Engineer
Your First 90 Days
In your first 30 days, as your familiarity with the product grows, your responsibilities and influence will grow as well. You and your team will be responsible for supporting the product team's operational needs in our upper environments. You will also collaborate with members of the Development and QA teams using established patterns, and continue to hone your skills as you begin formulating ways to push the design, architecture, and implementation of our CI pipelines (lower and upper environments) to their next phase.
Within your first 60 days, you will close the remaining gaps to deliver a well-tested, low-latency, and highly available environment for our product's operational needs. Working with the development team, you will begin implementing what is still missing to create and support a truly scalable product offering. You will be highly influential in shaping the rest of our operations team as you help hire additional operations engineers. Your team will be responsible for supporting production environments.
Within your first 90 days, you will help drive changes to the operational and development roadmap as we continue onboarding new and existing customers into our hosted production environments.
What You’ll Do
Design, provision, configure, and maintain an operations platform that can run several cloud-hosted application stacks at the scale required by thousands of customers nationwide and by our internal Product Team.
- Automate the deployment and maintenance of cloud platform technologies in both upper and lower environments
- Implement and oversee log management, data warehouse, and database operations, including management of Logging/Audit services
- Ensure all monitoring systems (infrastructure- and application-level) are in place; report on availability
- Design and implement disaster recovery and security strategies for all infrastructure subsystems (e.g., web servers, databases, queues, storage, network)
- Help improve the overall product by automating development-specific tasks in the lower-environment pipeline, including:
  - Integrating static analysis tools (security, code quality, etc.) into the build pipeline
  - Adding database deployment capability to the release pipeline (automating schema changes across all databases)
  - Incorporating test automation into the build pipeline
  - Separating code from configuration in the build/release pipeline
- Research and implement emerging virtualization techniques, and advise management on improving scalability
- Build strategic and tactical plans for continued improvement of cloud architecture and operations
- Perform capacity management and load and scalability planning
- Help drive process improvements for service management, including outage/incident management, rollbacks, and reporting
- Assist management in developing and optimizing operational cost models
- Assist in establishing 24x7 performance monitoring, reporting, and response protocols
Together with your team and the development group, you will provide on-call support outside normal working hours and days.
Driven, humble, and autonomous
- A quick study and strong communicator who adapts readily to fast-paced environments
- Working knowledge of Agile development practices (e.g., Scrum, TDD)
- You are (or have the mindset of) a developer, but are intrigued by the operational aspects of hosting developed solutions
- You’re devoted to automation
- 4-8 years of hands-on production experience with Amazon Web Services (AWS), Google Cloud, or Microsoft Azure, including:
  - Configuring VPCs with VPN connectivity to the corporate network
  - Setting up, maintaining, and monitoring global production, QA, and staging environments, with a strong understanding of how the needs of each differ
- 3-4 years of experience in a professional production environment
- 3-4 years of experience managing networking infrastructure and application-level monitoring
- Performance optimization experience, including troubleshooting and resolving network and server latency issues, evaluating and selecting hardware, and analyzing performance vs. cost vs. time trade-offs
- At least 1 year of experience with automation or scripting languages (e.g., Go, Python, Shell, PowerShell)
- 2-3 years of experience with Ansible, Jenkins, or comparable tools
- Detail-oriented, with excellent documentation skills and the ability to manage multiple priorities successfully
- Troubleshooting skills ranging from diagnosing hardware/software issues to resolving large-scale failures within a complex infrastructure
Other Things We Hope You Have
- Bachelor's degree in Computer Science
- Experience with relational databases such as Oracle and Aurora, Splunk (or other log aggregation tools), Grafana, Terraform, and Prometheus
- Extensive production experience with Microsoft Azure
- Experience working with Docker and Kubernetes, and hands-on experience with performance, load, and security penetration testing
- Hands-on experience with building and maintaining a continuous integration and delivery pipeline
You will be an integral member of what will ultimately be a three-person team of Cloud Ops Engineers. You will report to our Director of Cloud Development but will collaborate extensively with the Director of Development and the rest of our Development team.
We have an open and collaborative environment where everyone works together to deliver what is needed, from product features to operations needs (e.g., health checks).
We value open and direct communication, taking calculated risks that will push us forward, and investing in our people.
- Production and Continuous Integration footprints in Azure and AWS
- Front-end applications use .NET, Vue.js, React, and Java
- APIs are built with .NET and Java
- The backend comprises MS SQL Server, Oracle, and AWS Aurora
We have an existing CI pipeline that we want to take to the next level to support our growing customer and employee base.
- We are a Team. Employees, customers, and partners working together.
- We are Customer-Focused. Customers are the heart of everything we do.
- We are Driven. Seeking exceptional outcomes.
- We Own our Success. Every employee has a stake in our company.
- We do the right thing and have fun in the process.