704 DevOps Engineer jobs in the Philippines

DevOps Engineers

₱900000 - ₱1200000 Y Staff4Me

Posted today

Job Description

Staff4Me is on the lookout for talented DevOps Engineers to join our expanding team. In this role, you will help bridge the gap between development and operations while streamlining our processes and improving system performance. Our ideal candidate thrives in a fast-paced environment, has a deep understanding of DevOps practices, and is passionate about automation and efficiency.

  • System Design: Collaborate with cross-functional teams to design scalable and reliable systems that meet business requirements.
  • Automation: Develop and implement automation tools to enhance deployment processes and system management.
  • Continuous Integration/Delivery: Set up and manage CI/CD pipelines to facilitate rapid development and deployment.
  • Monitoring & Logging: Implement monitoring and logging solutions to ensure high availability and performance.
  • Infrastructure Management: Utilize infrastructure as code tools (like Terraform or CloudFormation) to provision and manage infrastructure.
  • Security Best Practices: Integrate security measures across the development lifecycle to ensure compliant and secure systems.
  • Collaboration: Work closely with development teams to promote best practices and improve workflows.

Requirements:

  • Proven experience as a DevOps Engineer or similar role.
  • Strong knowledge of containerization tools such as Docker and orchestration with Kubernetes.
  • Experience with cloud platforms like AWS, Azure, or Google Cloud.
  • Familiarity with CI/CD tools (e.g., Jenkins, GitLab CI/CD).
  • Proficient in at least one scripting language (Python, Bash, etc.).
  • Solid understanding of networking principles and monitoring solutions.
  • Excellent problem-solving skills and ability to work under pressure.

Technical Skills:

  • Experience with automation and configuration management tools like Ansible or Chef.
  • Knowledge of monitoring tools (Prometheus, Grafana, ELK stack).
  • Understanding of microservices architecture.
  • Familiarity with version control systems, particularly Git.
  • Experience with performance tuning and troubleshooting applications.

Soft Skills:

  • Strong communication and collaboration skills.
  • Ability to manage multiple priorities and meet deadlines.
  • A proactive attitude and eagerness to learn new technologies.
  • Attention to detail and a commitment to quality.

Benefits:

  • Competitive salary and performance-based bonuses
  • Health, dental, and vision insurance
  • Flexible working hours and remote work options
  • Opportunities for professional development and training
  • Collaborative and inclusive work environment

Cloud Site Reliability Engineer

₱80000 - ₱150000 Y Tyler Technologies

Posted today

Job Description

Responsibilities

  • Implement tooling to monitor AWS EKS-based systems focusing on performance, reliability, and scalability.
  • Ensure that architecture and deployment models are sufficient to support SLA commitments and are well prepared for future problems of scale.
  • Leverage cloud technology and platform capabilities to provide operationally sustainable solutions that are robust and cost effective.
  • Apply software engineering best practices to comprehensively address and resolve problems.
  • Collaborate with product support teams to drive efficiency and enhance customer experience through self-service tools and automation.
  • Ensure timely response to incidents and support requests, collaborating effectively on solutions.
  • Conduct root cause analysis and implement preventative measures to minimize toil and impact on customers.
  • Lead and participate in incident retrospectives to enhance future response efforts.
  • Participate in on-call rotations, providing critical support as needed.

Qualifications

  • 3+ years of software engineering or technical operations experience at reputable technology firms, particularly with large-scale cloud applications.
  • Expertise in Site Reliability Engineering concepts and practices, including the use of observability platforms and monitoring tools, such as Datadog.
  • Experience deploying and supporting containerized applications on cloud platforms, preferably EKS on AWS.
  • Proficiency in infrastructure as code technologies, such as Terraform.
  • Strong software engineering skills in languages like Python, JavaScript, or Go.
  • Familiarity with DevOps and CI/CD methodologies.
  • Bachelor's degree in Computer Science or related field.

In addition to the required technical skills and qualifications, our ideal candidate will:

  • Be a collaborative and versatile team player, effective across various functions.
  • Demonstrate independence and accountability with a proactive approach to ownership.
  • Have a strong desire to contribute and thrive in a cohesive technology team environment.
  • Have practical knowledge of Tyler Tech's primary tech stack: AWS, Kubernetes, Argo CD, RDS for Postgres and Microsoft SQL, Terraform, C#/.NET Core, GitHub Actions, Docker, Datadog, PagerDuty.

Site Reliability Engineer

Taguig, National Capital Region ₱1200000 - ₱2400000 Y Procter & Gamble

Posted today

Job Description

Job Location
Taguig City

Job Description
Information Technology (IT) at Procter & Gamble is where business, innovation and technology integrate to build a competitive advantage for P&G. Our mission is clear -- you deliver IT to help P&G win with consumers.

Do you love implementing continuous improvement in IT solutions to drive efficiency and agility in meeting constantly evolving business needs? Then this job might be for you.

As a Site Reliability Engineer, you will be instrumental in ensuring the high availability and reliability of our digital IT products in the P&G supply chain. Your primary focus will be on enhancing system performance through faster detection, response, and resolution of issues, while also implementing strategies to prevent recurrence and reduce operational toil. You will use robust Observability and Monitoring tools, automate incident response systems, and optimize IT architecture to create a resilient and reliable infrastructure.

Responsibilities:

  • Implement and lead comprehensive monitoring solutions and tools to provide real-time insights into system performance, enabling proactive incident detection and ensuring accurate, actionable alerts for prompt responses.
  • Continuously refine monitoring strategies and develop automation scripts to address recurring issues, enhancing system visibility, resource optimization, and overall efficiency.
  • Establish and maintain Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to improve service quality and reliability.
  • Collect and share data and insights from observability tools to drive continuous improvement initiatives.
  • Work closely with Software Engineers, Product Teams, and Infrastructure Teams to develop and implement initiatives that enhance IT reliability.
  • Engage with customers to understand their needs and difficulties regarding Observability and Monitoring tools, providing exceptional support in all interactions, including communications, updates, and feedback.
  • Stay updated on industry trends and effective strategies in Site Reliability Engineering while continuously enhancing technical skills in system architecture, automation, cloud technologies, and operational processes.
  • Share knowledge and mentor team members to foster a culture of learning and professional development within the team.
  • Lead root cause analysis efforts and implement corrective action plans in a timely manner to achieve permanent resolutions for incidents.
  • Oversee documentation and knowledge management efforts.

Job Qualifications
Candidates must demonstrate strong leadership in the application of technical expertise to drive business results.

We are looking for candidates who possess the following core qualities:

  • A Bachelor's degree in a related field such as Engineering, Information Technology, or Computer Science.
  • Up to 5 years of relevant experience.
  • Experience or familiarity with monitoring and observability tools (e.g., Prometheus, preferably Grafana)
  • Knowledge of and familiarity with system administration, including Linux/Unix environments and cloud platforms (Azure is preferred, but AWS or GCP are acceptable)
  • Experience with configuration management tools and infrastructure-as-code frameworks (e.g., Terraform)
  • Proficiency in at least one programming language (e.g., Python, C#) and a background in scripting for automation tasks
  • Understanding of networking protocols, network infrastructures, load balancing, and DNS management
  • Familiarity with containerization and Orchestration Technologies (e.g., Docker, Kubernetes)
  • Familiarity with databases and proficiency in writing SQL queries
  • Understanding of best practices in security and experience with implementing secure systems
  • Knowledge of incident response methodologies, root cause analysis, and implementing preventive measures (ITIL and/or SRE)
  • Familiarity with ticketing systems and task management (preferably ServiceNow)
  • Problem-solving skills with ability to analyze complex issues and devise effective solutions
  • Learning agility as there will be new topics to learn and new spaces to understand
  • Communication and collaboration skills to work effectively with multi-functional teams, partners, and customers
  • Teamwork and interpersonal skills, with an ability to build relationships and work effectively in a collaborative environment
  • Operational excellence / execution skills as the work requires discipline

Preferred Skills:

  • Understanding of or experience with supply chain applications, processes, documents, or general data flow, to gauge the impact of unplanned IT downtime and of IT changes on business operations.

About Us
We produce globally recognized brands, and we grow the best business leaders in the industry. With a portfolio of trusted brands as diverse as ours, it is paramount our leaders are able to lead with courage the vast array of brands, categories and functions. We serve consumers around the world with one of the strongest portfolios of trusted, quality, leadership brands, including Always, Ariel, Gillette, Head & Shoulders, Herbal Essences, Oral-B, Pampers, Pantene, Tampax and more. Our community includes operations in approximately 70 countries worldwide.

Visit to know more.

We are an equal opportunity employer and value diversity at our company. We do not discriminate against individuals on the basis of race, color, gender, age, national origin, religion, sexual orientation, gender identity or expression, marital status, citizenship, disability, HIV/AIDS status, or any other legally protected factor.

Job Schedule
Full time

Job Number
R

Job Segmentation
Experienced Professionals (Job Segmentation)

Site Reliability Engineer

Makati City, National Capital Region ₱104000 - ₱130878 Y Drake International Philippines

Posted today

Job Description

Drake International Philippines is actively hiring an IT Observability Engineer / Site Reliability Engineer who is eager to advance their growing career.
ABOUT THE ROLE:

Job Title: IT Observability Engineer / Site Reliability Engineer

Employment Type: 6-month contract (Renewable)

Work Set-up and location: Onsite, Makati

Work Schedule: Mondays to Fridays

Here's what we're looking for in an IT Observability Engineer / Site Reliability Engineer:

  • Must have 4+ years of experience in IT operations or a similar role, with a solid foundation in monitoring and observability principles,
  • Must be proficient in Prometheus, Grafana, Splunk, Jaeger, or similar industry-leading tools,
  • Must have solid experience with cloud-based observability platforms like AWS CloudWatch or Azure Monitor,
  • Must have knowledge of security monitoring tools and experience with incident response methodologies and best practices, and
  • Must have experience in scripting and automation, with proficiency in languages like Python, Bash, or Go for data manipulation and automation tasks.

Do you think you're a perfect fit for the job? Wait no more and let Drake help you with your career as an IT Observability Engineer / Site Reliability Engineer. Apply now.

Site Reliability Engineer

Makati City, National Capital Region ₱720000 - ₱972000 Y Cambridge University Press & Assessment | Manila

Posted today

Job Description

NOTE: When you click the apply button, you will be re-directed to Cambridge University Press & Assessment's website where you will be required to create a profile and upload a copy of your CV to complete your application.

Work setup: Hybrid (open to 2x a week in the office)

Work schedule: 10AM to 6PM Manila time

Employment type: Permanent

Location: Makati City, Metro Manila

Pay range: Php 60,000 to Php 81,000

We value transparency and encourage applicants comfortable with this range to apply.

Discover a world of endless possibilities with Cambridge University Press & Assessment, a distinguished global academic publisher and assessment organization proudly affiliated with the prestigious University of Cambridge.

We are recruiting Site Reliability Engineers who will be part of our SRE function within the Platform Operations Team. This is a new team of engineers who will work alongside English Technologies' existing Platform Support and Engineering teams.

Why Cambridge?

Cambridge University Press & Assessment is a world-renowned not-for-profit academic publisher and assessment organisation, proudly part of the prestigious University of Cambridge. With a legacy rooted in over 800 years of educational excellence, we are dedicated to unlocking the potential of learners and educators across the globe.

Joining Cambridge's second largest global office in the Philippines —operating for over 22 years with 1,300+ colleagues— means becoming a part of an extraordinary institution renowned worldwide. We are recognised as a Great Place to Work for three consecutive years, reflecting our inclusive culture, strong sense of purpose, and commitment to the professional growth and well-being of our people. At Cambridge, we don't just publish books or deliver tests—we empower progress, inspire curiosity, and champion the pursuit of knowledge.

What can you get from Cambridge?

At Cambridge, you'll become a part of a vibrant and forward-thinking community that transcends tradition, fostering a culture of continuous growth and personal development. Here, we provide the right environment for you to thrive, supporting your professional journey and empowering you to reach your highest potential. That is why our pay philosophy is tied to your skills and competencies, ensuring that your compensation aligns with the unique value you bring to the role you are applying for.

The organization offers a wide range of benefits and opportunities including:

  • Regular Employment on Day 1
  • HMO Coverage and Life Insurance on Day 1
  • Paid Annual Leaves (Vacation, Well-being, Flexible, Holiday, and Volunteering leaves)
  • Vesting/Retirement package
  • Opportunities for career growth and development
  • Access to well-being programs
  • Flexible schedule, hybrid work arrangement and work-life balance
  • Opportunity to collaborate with colleagues from diverse branches that will expand your horizons and enrich your understanding of different cultures

What will you do as a Site Reliability Engineer?

  • The Site Reliability Engineer will join a new SRE function within the Platform Operations Team working alongside existing Platform Support and Engineering teams.
  • The role will be responsible for support and design aspects of the English Engineers ecosystem (Platforms, Applications, Services and Websites).
  • Responsible for creating and maintaining software and processes that ensure the reliability and availability of the English digital platforms/websites and their software delivery pipelines.

Please review the attached job description for further details on the role.

What makes you the ideal candidate for this role?

  • Education & Experience: Degree or equivalent experience with at least 3 years in AWS Cloud Engineering, Architecture, or Infrastructure, combined with 3+ years in a Systems Admin or DevOps role.
  • DevOps & Delivery Model: Experience with DevOps delivery for infrastructure, applications, and configuration, including Infrastructure as Code (Terraform, CDK), CI/CD (GitHub Actions, Bitbucket Pipelines), and containerization/orchestration (Docker, Kubernetes).
  • Monitoring & Logging: Expertise with central logging systems (ELK/EFK stack), monitoring tools (New Relic, Datadog, Grafana, Alert Manager, PagerDuty, site24x7), and troubleshooting production issues in cloud environments.
  • Cloud Infrastructure: Deep knowledge of AWS services such as Fargate, Route53, CloudWatch, API Gateway, Lambda, CodePipeline, CloudFormation, DynamoDB, and networking.
  • Application & Database: Breadth of experience across Elasticsearch, MySQL, PostgreSQL, Java, , Git/GitHub, and Confluent/Kafka.
  • Technical Skills: Strong troubleshooting, debugging, documentation, and communication abilities.
  • Ways of Working: Experience working in Agile product development environments and collaborating with global teams across cultures.

Are you driven by a desire to be part of a globally renowned institution that celebrates innovation, embraces inclusion, and empowers learners? Then, we invite you to Pursue your Potential with us.

Applications received through the system will be reviewed on a rolling basis, and the vacancy may close once sufficient applications are received. Therefore, if you are interested, tailor your CV (submitting one with a cover letter is advantageous) and submit it as early as possible.

Site Reliability Engineer

Pasig City, National Capital Region ₱1200000 - ₱2400000 Y Seven Seven Global Services, Inc.

Posted today

Job Description

Work Location: Ortigas, Pasig City

Shift Schedule: Day Shift

Work Setup: Hybrid (3-4x a week onsite)

Job Description:

  • Handle service monitoring, incident response, and drive technical support efficiency
  • Responsible for managing and maintaining network monitoring tools, systems, and processes that ensure the availability, scalability, and performance of our production environments.
  • Responsible for incident handling, service monitoring, and technical support efficiency.
  • Work closely with developers, DevOps, infrastructure teams, and other stakeholders to achieve proactive incident prevention, issue resolution, and incident documentation.

Key Responsibilities:

  • Ensure that all tickets are updated and handled based on set KPIs and SLAs.
  • Manage monitoring, alerting, and logging tools to ensure system health and service uptime.
  • Ensure early detection, triage and escalation of service degradation based on the defined service level agreement.
  • Trigger L2 ticket handling and on-call rotations for critical incidents.
  • Execute triage, diagnosis, and resolution of incidents required for L3 escalations, involving both internal and third-party support teams.
  • Support major incident response, contribute to root cause analysis (RCA), and help document postmortems.
  • Track, analyze, and act on incident trends and recurring technical issues.
  • Use data from ticketing systems (Jira, ServiceNow, etc.) to improve team responsiveness and resolution quality.
  • Update and maintain SOPs, runbooks, and knowledge base articles including the documentation of known issues, fixes, and playbooks to improve mean time to resolution.
  • Collaborate with development and QA teams to improve deployment readiness and reliability.
  • Participate in technical competency mapping to ensure coverage and reduce unnecessary escalations.

Skills and Competencies:

  • Hands-on experience with ITSM platforms (e.g., ServiceNow, Jira Service Management).
  • Familiarity with ITIL principles and ITSM process areas (incident, problem, request, change, asset, and service catalog management).
  • Basic knowledge of IT infrastructure components (networks, servers, applications) and how they support IT services.
  • Experience in monitoring system performance and escalating outages or performance degradation.
  • Ability to troubleshoot and document IT issues effectively for escalation and closure.
  • Strong attention to detail in documentation, ticket updates, and asset records.
  • Familiarity with regulatory and compliance frameworks (e.g., BSP, PDIC, ISO 27001, COBIT) is a plus.
  • Clear written and verbal communication skills for ticket handling and team collaboration.
  • Proactive, detail-oriented, and able to manage multiple tasks in a structured IT operations environment.

Qualifications and Experience:

  • Bachelor's degree in Electronics Engineering, Information Technology, Computer Science, Management Information Systems, or equivalent.
  • 2–5 years of experience in Site Reliability Engineering, DevOps, or Infrastructure roles.
  • Hands-on experience with monitoring tools (e.g., Prometheus, Grafana, ELK, or Datadog).
  • Familiarity with incident response and troubleshooting in production systems.
  • Experience with at least one cloud platform (AWS, GCP, or Azure).
  • Knowledgeable in scripting (e.g., Python, Bash) and Linux systems.
  • Exposure to ITIL-based processes, especially Incident and Problem Management.
  • Experience working in fintech, banking, or SaaS with high availability SLAs.
  • Familiarity with DevOps practices, CI/CD pipelines, and cloud-based monitoring tools.
  • Experience with automation platforms
  • Knowledge of BSP regulatory frameworks, policies, and guidelines.

Site Reliability Engineer

Taguig, National Capital Region ₱1200000 - ₱3600000 Y Philtech

Posted today

Job Description

Responsibilities:

  1. Software Maintenance and Support:
    • Monitor software performance and suggest improvements.
    • Ensure software systems are secure and compliant with industry standards.
    • Conduct code reviews and provide feedback to developers.
    • Develop and maintain automated testing scripts.
  2. Incident Management:
    • Respond promptly to alerts and incidents, troubleshoot issues, and restore services during production emergencies.
    • Collaborate with stakeholders to resolve complex problems.
  3. Automation:
    • Identify opportunities for automation and implement solutions.
    • Streamline repetitive tasks to improve efficiency.
  4. Collaboration and Documentation:
    • Work closely with the development team to address issues and implement enhancements.
    • Participate in code reviews to improve coding skills.
    • Write detailed technical and user documentation.
  5. Testing:
    • Conduct unit and regression tests to validate application functionality.
    • Perform user-acceptance tests before production deployment.

Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or related field.
  • Proficiency in multiple programming languages such as Java and/or Python.
  • Knowledge of the following technologies:
    • Agile methodology and tools
    • Different operating systems (Windows, Unix/Linux)
    • Databases and SQL such as MongoDB and/or Snowflake
    • Monitoring platforms such as Grafana, Microsoft Log Analytics, and others
    • Microservices and containers
    • Databricks
    • Google Cloud Platform or Microsoft Azure
    • Open-source frameworks (e.g., Struts, Spring, Hibernate) are a plus

Site Reliability Engineer

₱30000 - ₱150000 Y Braintrust

Posted today

Job Description

Compensation range varies by level of experience:
Jr SRE: $12k-$8k/yr, Intermediate: 20k-30k/yr, Senior: 35k-50k/yr

Some travel may be required.

Card payment domain knowledge/experience is key:
Our client, a global Business Process Outsourcing (BPO) business, is looking for Site Reliability Engineers (SRE) to support their client, a global payment technology company that provides platforms to consumers, businesses and organizations to make electronic payments. The successful candidate will be responsible for ensuring site reliability & performance, monitoring & alerting, and supporting emergency response situations. This would require working closely with software engineers, DevOps and product teams to maintain robust infrastructure and automation that supports mission-critical applications.

The ideal candidate creates a bridge between development and operations by applying a software engineering mindset to service management. We are seeking an individual who is highly motivated, intellectually curious, and seeks out opportunities for improvement.

The Role:
This role involves working with a team of talented SREs/DevOps Engineers to support highly scalable services. Responsibilities include:

  • Responsible for pipeline build and maintenance in accordance with the client's tooling and conventions.
  • Participate in the software development lifecycle, working closely with the development team to ensure that designed solutions meet non-functional requirements such as availability, performance, security and maintainability standards.
  • Maintain services through monitoring of metrics, system health, and analysis of reports.
  • Provide support for production and in-house systems. Participate in on-call Production support rota.
  • Incident management, on-call support and root cause analysis, conducting post-incident reviews and 5-Whys analysis.
  • Remediate system vulnerability, security and resiliency measures.
  • Improve processes and systems within the Program.
  • Lead incident management efforts by proactively monitoring and analyzing ISO 8583 financial transaction messages across the 4-party payment model (Cardholder, Merchant, Acquirer, Issuer).

Skills & requirements:
Minimum 2+ years of experience

  • Card payment domain knowledge (mandatory)
  • Experience with CI/CD and build pipelines using Jenkins.
  • Experience in public and private Cloud offerings (PCF, Azure, AWS etc.).
  • Knowledge of NoSQL & SQL databases such as Mongo / Oracle / Postgres.
  • Experience and knowledge of managing distributed systems and working with microservices.
  • Familiarity with Unix tooling, with strong scripting skills
  • Exposure to working with Monitoring and Alerting tools such as Splunk, Dynatrace
  • Proficiency in one of the following: Python, Java, Go or equivalent.
  • Familiarity with defining SLOs and SLAs
  • Prior experience of working in an SRE/DevOps team and excellent understanding of SRE/DevOps principles.
  • High degree of initiative and self-motivation, with a willingness to take on challenging opportunities.
  • Excellent communication and relationship building/collaboration skills.

Site Reliability Engineer

₱800000 - ₱2500000 Y HGS Offshore Staffing Solutions (HGS OSS)

Posted today

Job Description

POSITION OVERVIEW

We are seeking an experienced Senior AWS Site Reliability Engineer to join our cross-functional cloud platform team. Working alongside a diverse group of DevOps and Site Reliability Engineers, you will combine deep technical expertise in AWS cloud infrastructure with strong leadership capabilities in incident response and system reliability. In this role, you will be instrumental in leading incident response and in maintaining, optimising, and scaling our cloud infrastructure while ensuring exceptional system reliability and performance.

KEY RESPONSIBILITIES


  • Lead incident response from initial detection through real-time mitigation, root cause analysis, post-mortem documentation (using Incident IO), and implementation of lessons learned, with a focus on continuous improvement.
  • Develop and execute comprehensive incident response strategies to minimise downtime and business impact.
  • Participate in a 24/7 on-call rotation to ensure continuous system availability.
  • Implement and maintain comprehensive observability solutions using CloudWatch, Datadog, or similar monitoring platforms.
  • Maintain, improve, and optimise AWS infrastructure using Terraform while ensuring scalability, reliability, and cost efficiency.
  • Continuously assess and enhance AWS infrastructure to optimise performance and cost effectiveness.
  • Monitor and optimise serverless technologies, including AWS Lambda and API Gateway, for peak performance and cost efficiency.
  • Monitor and maintain ECS Fargate deployments for containerised applications, ensuring optimal resource utilisation.
  • Collect and analyse metrics to identify resource consumption, abnormal behavior, and potential performance bottlenecks.
  • Configure and manage alerting, dashboards, and automated monitoring across distributed systems.
  • Foster improved collaboration between development and operations teams by implementing SRE practices.

REQUIRED QUALIFICATIONS


  • Previous experience in a DevOps or SRE role.
  • Exceptional written and verbal communication skills.
  • Proven experience in incident response and 24/7 on-call responsibilities.
  • Expert-level knowledge of Infrastructure as Code, primarily Terraform (demonstrated experience with other IaC tools will be highly regarded).
  • Expert-level knowledge of AWS compute infrastructure.
  • Proficiency in automation tools and scripting languages.
  • Strong understanding of monitoring, metrics collection, and performance analysis.
  • Expert knowledge of observability and monitoring platforms such as Datadog, New Relic, Prometheus, or similar tools.
  • Experience with log aggregation, APM (Application Performance Monitoring), and distributed tracing.
  • Excellent collaboration abilities and capacity to work effectively in cross-functional teams.
  • Strong analytical and problem-solving skills.
  • Demonstrated ability to work autonomously and take ownership.

PREFERRED QUALIFICATIONS


  • Experience with (highly desirable)
  • Background in payments and PCI compliance environments (highly desirable)
  • AWS certifications
  • Experience with container orchestration and microservices architecture
  • Knowledge of security best practices in cloud environments
