3 Site Reliability Engineer jobs in the Philippines

Site Reliability Engineer

National Capital Region, National Capital Region Nityo Infotech Services Philippines Inc.

Posted 24 days ago

Job Description

▪ Proven experience (3+ years) as a DevOps Engineer or a related role, with strong technical skills.
▪ In-depth knowledge of CI/CD pipelines, automation tools, and infrastructure management.
▪ Proficiency in scripting languages (e.g., Python, Bash) and automation frameworks.
▪ Familiarity with cloud computing platforms (e.g., AWS, Azure, Google Cloud).
▪ Strong problem-solving and analytical abilities.
▪ Detail-oriented and proactive mindset.
▪ Effective communication and collaboration skills.

Senior Site Reliability Engineer

Mandaluyong City, National Capital Region Penbrothers

Posted 14 days ago

Job Description

About Penbrothers

Penbrothers is an HR & remote talent management partner and one of the fastest-growing companies in the Philippines. We provide talented Filipinos with global opportunities in high-growth startups and dynamic companies, from the comfort of their own homes.

About the Client

The client, a pioneer in medical recruitment, is seeking an experienced Tech Lead to drive their mission to enhance doctors' well-being. This is an opportunity to contribute your unique skills and expertise to create technology that truly matters, impacting lives on a daily basis.

About the Role

We are looking for a Senior SRE/DevOps Specialist to play a vital role in ensuring the reliability of our Salesforce and web/mobile application environments. You will work closely with our engineers to continually improve and enhance our platform in line with world-class best practices.

Service reliability and observability

  • Analysing resource utilization and forecasting capacity needs to ensure the system can handle expected traffic and workloads without performance issues.

  • Writing code and scripts to automate repetitive operational tasks, configuration management, and deployment processes to reduce human error and increase efficiency.

  • Managing changes to production systems and services, ensuring that new releases and configuration changes are rolled out with minimal disruption and risk.

  • Identifying and addressing performance bottlenecks, optimizing software and infrastructure to improve response times and reduce resource consumption.

  • Maintaining thorough documentation of systems, configurations, and incident response procedures to facilitate knowledge sharing and onboarding of new team members.

  • Defining and maintaining service level objectives that specify the acceptable level of service quality, such as uptime and latency, for a particular system or service.

  • Defining the key performance metrics and indicators that will be used to measure the system's performance and reliability, such as error rates and response times.

  • Designing and implementing monitoring systems to track the SLIs and using alerting mechanisms to notify the team when the system deviates from its defined SLOs.

Incident management & Disaster recovery planning

  • Responding to and mitigating incidents that impact service availability or performance, following an incident management process, and conducting post-incident reviews to learn and improve.

  • Planning, implementing, and executing disaster recovery and backup strategies to ensure data and service availability in case of failures or disasters.

Security

  • Ensure systems and infrastructure are securely configured and hardened by default

  • Manage secrets, credentials, and access controls across environments

  • Monitor for security-related events and support incident response efforts

  • Maintain secure CI/CD pipelines and enforce safe deployment practices

Continuous Improvement

  • Continuously evaluating and improving system reliability, efficiency, cost optimization and automation to meet our evolving business needs and customer expectations.

  • Rationalizing, evaluating and integrating 3rd party developer tooling and services.

  • Troubleshooting platform issues with development teams

  • Providing tooling support and access management for development teams

  • Staying ahead of the tech curve and bringing new tools and frameworks to the table

Site Reliability Engineer (Supply Chain IT Operations)

Procter & Gamble

Posted 22 days ago

Job Description

Job Location
Taguig City
Information Technology (IT) at Procter & Gamble is where business, innovation and technology integrate to build a competitive advantage for P&G. Our mission is clear -- you deliver IT to help P&G win with consumers.
Do you love implementing continuous improvement in IT solutions to drive efficiency and agility in meeting constantly evolving business needs? Then this job might be for you!
As a Site Reliability Engineer, you will be instrumental in ensuring the high availability and reliability of our digital IT products in the P&G supply chain. Your primary focus will be on enhancing system performance through faster detection, response, and resolution of issues, while also implementing strategies to prevent recurrence and reduce operational toil. You will use robust Observability and Monitoring tools, automate incident response systems, and optimize IT architecture to create a resilient and reliable infrastructure.
Responsibilities:
+ Implement and lead comprehensive monitoring solutions and tools to provide real-time insights into system performance, enabling proactive incident detection and ensuring accurate, actionable alerts for prompt responses.
+ Continuously refine monitoring strategies and develop automation scripts to address recurring issues, enhancing system visibility, resource optimization, and overall efficiency.
+ Establish and maintain Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to improve service quality and reliability.
+ Collect and share data and insights from observability tools to drive continuous improvement initiatives.
+ Work closely with Software Engineers, Product Teams, and Infrastructure Teams to develop and implement initiatives that enhance IT reliability.
+ Engage with customers to understand their needs and difficulties regarding Observability and Monitoring tools, providing exceptional support in all interactions, including communications, updates, and feedback.
+ Stay updated on industry trends and effective strategies in Site Reliability Engineering while continuously enhancing technical skills in system architecture, automation, cloud technologies, and operational processes.
+ Share knowledge and mentor team members to foster a culture of learning and professional development within the team.
+ Lead root cause analysis efforts and implement corrective action plans in a timely manner to achieve permanent resolutions for incidents.
+ Oversee documentation and knowledge management efforts.
Job Qualifications
Candidates must demonstrate strong leadership in the application of technical expertise to drive business results.
We are looking for candidates who possess the following core qualities:
+ A Bachelor's degree in a related discipline such as Engineering, Information Technology, or Computer Science.
+ Up to 5 years of relevant experience.
+ Experience or familiarity with monitoring and observability tools (e.g., Prometheus, preferably Grafana)
+ Knowledge of and familiarity with system administration, including Linux/Unix environments and cloud platforms (Azure is preferred, but AWS or GCP are acceptable)
+ Experience with configuration management tools and infrastructure-as-code frameworks (e.g., Terraform)
+ Proficiency in at least one programming language (e.g., Python, C#) and a background in scripting for automation tasks
+ Understanding of networking protocols, network infrastructures, load balancing, and DNS management
+ Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes)
+ Familiarity with databases and proficiency in writing SQL queries
+ Understanding of best practices in security and experience with implementing secure systems
+ Knowledge of incident response methodologies, root cause analysis, and implementing preventive measures (ITIL and/or SRE)
+ Familiarity with ticketing systems and task management (preferably ServiceNow)
+ Problem-solving skills with ability to analyze complex issues and devise effective solutions
+ Learning agility as there will be new topics to learn and new spaces to understand
+ Communication and collaboration skills to work effectively with multi-functional teams, partners, and customers
+ Teamwork and interpersonal skills, with an ability to build relationships and work effectively in a collaborative environment
+ Operational excellence / execution skills as the work requires discipline
Preferred Skills:
+ Understanding of or experience with Supply Chain applications, processes, documents, or general data flow, in order to understand the impact of unplanned IT downtime and of IT changes on business operations
About us
We produce globally recognized brands, and we grow the best business leaders in the industry. With a portfolio of trusted brands as diverse as ours, it is paramount our leaders are able to lead with courage the vast array of brands, categories and functions. We serve consumers around the world with one of the strongest portfolios of trusted, quality, leadership brands, including Always®, Ariel®, Gillette®, Head & Shoulders®, Herbal Essences®, Oral-B®, Pampers®, Pantene®, Tampax® and more. Our community includes operations in approximately 70 countries worldwide.
Visit to know more.
We are an equal opportunity employer and value diversity at our company. We do not discriminate against individuals on the basis of race, color, gender, age, national origin, religion, sexual orientation, gender identity or expression, marital status, citizenship, disability, HIV/AIDS status, or any other legally protected factor.
Job Schedule
Full time
Job Number
R000136489
Job Segmentation
Experienced Professionals (Job Segmentation)