Director - Tech Operations

American Express

Phoenix, AZ, United States | Full time | Technology Operations


Joining Amex Tech means discovering and shaping your contribution to something big. Here, you can work alongside talented tech teams and build a unique career with the powerful backing of American Express. With a range of opportunities to work with the latest technologies, and a commitment to back the broader engineering community through open source, our mission is to power your success, because Amex Tech is powered by our technology, our culture, and our colleagues.

The Technology organization enables and accelerates the company’s growth strategies, delivering global capabilities and services in support of Amex’s customers and colleagues, while maintaining 24/7 servicing and availability to ensure an uninterrupted, high-quality customer experience. Technology provides the foundation for everything we do in the company while driving differentiation through building and leveraging innovative technology and data insights.

As the Application Support Service Delivery Director, you will be responsible for creating best-in-class runtime operations, maturing our resiliency practices, and ensuring high availability and rapid resolution for the products and services that power critical customer channels at American Express. The reliability of the products and services spanning these customer channels is key to delivering the quality of experience that American Express customers expect.

In this role, you will provide technical direction to ensure teams possess a deep knowledge of application flows, business logic, and system interdependencies. You will drive continuous improvement in the overall support process through automation and resiliency tools, technical troubleshooting, automated remediation, and reliable disaster recovery. You will be required to work closely with cross-functional teams across the enterprise, including Site Reliability Engineering, Mission Control, Application Development, Operations, and Product teams.

Responsibilities

• Oversee and directly participate in the response to major incidents on a 24x7x365 basis.

• Drive the adoption of tooling, instrumentation, automation, and application resiliency solutions across the Contact Center Technology ecosystem

• Mature and implement enterprise-wide resiliency practices to ensure observability, reliability, and high availability across all Call Center customer journeys

• Prioritize technical excellence and continuously increase the engineering output and capabilities of the organization

• Lead application support across a complex set of CCP-facing channels and journeys, ensuring a deep understanding of application and system interdependencies

• Drive continuous operating efficiency by collaborating closely with SRE and Application Development teams across the organization

• Continuously evaluate and improve the application support process, implementing best practices and driving change across the organization

• Partner with product and engineering teams to weigh in on system architecture with a focus on availability, scalability, resiliency, and customer experience

• Contribute to and implement standards for how we build, deploy, monitor, and maintain our critical systems and infrastructure

• Develop and implement best practices for site reliability, including incident management, change management, root cause analysis, and monitoring/analyzing system performance

• Collaborate with product development partners to establish non-functional requirements for new products and services

• Participate in detailed design reviews and set standards for the organization

• Develop and maintain Service-Level Objectives and Service-Level Indicators

• Partner with peers and technology leaders across the company to establish a culture of continuous improvement

Qualifications

• Bachelor’s degree or relevant professional experience in computer science or related science, technology, engineering, or mathematics fields

• Site Reliability Engineering / Application support / Engineering background with a strong focus on the Call Center Technology applications and business

• Experience in driving high availability, resiliency, stability, and performance of voice and routing platforms supporting customer servicing operations

• Experience leading capacity planning, disaster recovery, failover readiness, and business continuity for voice operations

• Experience in leading Major Incident Management (MIM) processes for high-severity production incidents impacting call routing, IVR, agent desktop, voice recording, and telephony integrations.

• Experience partnering with infrastructure and voice platform teams to improve network, SIP, and voice connectivity resiliency

• Experience in driving modernization and transformation initiatives across contact center technology platforms

• High comfort driving technology emergency response and recovery

• Knowledge of network fundamentals and deep knowledge of private and public cloud

• Hands-on experience with system troubleshooting and issue triaging

• Demonstrated technical leadership and decision-making skills

• Strong communication and relationship management skills at all levels

• Experience managing large teams and fostering a culture of inclusion

• Experience adopting SRE best practices, including error budgets, service health indicators, operational automation, and reliability engineering principles

• Deep knowledge of Genesys on-premises and cloud-hosted voice platforms, such as Genesys Cloud, Five9, or Amazon Connect, is preferred

• Depending on factors such as business unit requirements, the nature of the position, cost and applicable laws, American Express may provide visa sponsorship for certain positions.

Skills

Application Support, Incident Management, Automation, Resiliency Practices, Technical Troubleshooting, Collaboration, Continuous Improvement, System Architecture, Monitoring and Analyzing System Performance, Communication