Network Engineer 3
Singapore
Job Description
What You'll Do
As a Principal Network Development Engineer within Oracle Cloud Infrastructure (OCI) Network Reliability Engineering (NRE), you will help ensure the reliability, scalability, and operational excellence of one of the world's largest cloud networks. You will work on complex network infrastructure challenges, develop automation to improve operational efficiency, and collaborate across engineering and operations teams to deliver highly available cloud services.
Key Responsibilities
- Maintain and improve the availability, performance, and reliability of OCI network services.
- Diagnose, troubleshoot, and resolve complex network incidents across large-scale cloud environments.
- Design and develop automation and tooling to eliminate repetitive operational tasks and prevent recurring issues.
- Review and contribute to network architecture, service design, and lifecycle management initiatives.
- Participate in network deployment, expansion, and upgrade projects.
- Support engineering, operations, and partner teams during incident response and service restoration activities.
- Lead operational excellence initiatives, including runbook development, process improvement, and operational readiness reviews.
- Mentor engineers, support onboarding efforts, and contribute to technical interviews and hiring activities.
- Represent the team in cross-functional reviews, governance meetings, and vendor engagements.
Technical Qualifications
Networking
- Expert-level knowledge of routing and networking protocols including BGP, OSPF, IS-IS, TCP/IP, IPv4, IPv6, DNS, DHCP, and MPLS.
- Strong hands-on experience with at least three of the following technologies:
  - Juniper
  - Cisco
  - Arista
  - InfiniBand
  - Firewalls
  - Data center switching platforms
  - Circuit management technologies
- Proven ability to analyze network telemetry, identify root causes, and resolve complex service-impacting issues.
- Experience operating large-scale ISP, hyperscale cloud, or enterprise network environments.
- Familiarity with merchant silicon networking platforms from vendors such as Broadcom and Mellanox.
- Industry certifications (CCNP, CCIE, JNCIP, JNCIE, or equivalent) are advantageous.
GPU, RDMA & High-Performance Networking
- Experience supporting GPU-based infrastructure and AI/ML networking environments.
- Knowledge of RDMA technologies and lossless network architectures.
- Hands-on experience with InfiniBand and High-Performance Computing (HPC) environments is highly desirable.
Automation & Software Development
- Develop and maintain automation solutions to improve operational efficiency and service reliability.
- Build scripts and tooling to automate routine operational tasks and troubleshooting workflows.
- Experience with one or more of the following:
  - Python
  - Ansible
  - Puppet
  - SQL
  - Infrastructure automation frameworks
Project & Technical Leadership
- Lead technical initiatives that improve operational processes, tooling, documentation, and service reliability.
- Drive the development and continuous improvement of runbooks, procedures, and operational standards.
- Contribute to strategic planning and execution of short-, medium-, and long-term engineering objectives.
- Partner with senior engineering and operational leaders to deliver critical infrastructure programs.
Preferred Experience & Attributes
- Bachelor's degree in Computer Science, Engineering, or a related discipline, or equivalent practical experience.
- Extensive experience supporting large-scale network infrastructure in 24x7 production environments.
- Strong understanding of incident management, operational excellence, and service restoration practices.
- Excellent analytical, organizational, and problem-solving skills.
- Strong written and verbal communication skills with the ability to influence technical and non-technical stakeholders.
- Self-motivated, proactive, and comfortable operating in fast-paced, mission-critical environments.
- Willingness to participate in an on-call or rotational support model as required.
Leadership Expectations
- Act as a technical leader within the organization, driving operational improvements and engineering best practices.
- Collaborate closely with shift leads, engineering teams, and management to ensure successful service delivery.
- Identify opportunities to improve team effectiveness through process optimization, automation, and innovation.
- Drive compliance reviews, runbook audits, and operational governance activities.
- Support recruitment, interviewing, mentoring, and development of junior engineers.
- Foster strong partnerships across OCI service teams to ensure consistent and scalable operational processes.
Responsibilities
Network Operations
- Execute network changes safely and efficiently using established procedures, tools, and operational best practices.
- Participate in a rotational support model, including on-call responsibilities and incident response activities.
- Monitor network health and proactively identify, troubleshoot, and resolve service-impacting issues.
- Lead and contribute to major incident investigations, driving timely resolution and minimizing customer impact.
- Perform root cause analysis (RCA) and implement corrective actions to prevent recurrence.
- Manage fault detection, escalation, and resolution across OCI network infrastructure, working closely with internal teams and external vendors.
- Collaborate with engineering, operations, and support teams to maintain network stability and service availability.
- Mentor, train, and support the development of junior engineers, helping to build technical capability across the team.
- Exercise sound technical judgment and independent decision-making in a fast-paced, mission-critical environment.
About Us