Senior Applications Support Specialist

Ensono

Remote - United Kingdom

Key Responsibilities

Incident & Problem Management

• Lead major incident (MI) bridges and restore service with minimum business impact.

• Handle all L3 escalations and perform deep diagnostics across Java, JVM, middleware, OS, and infrastructure.

• Own technical RCAs and drive long‑term, systemic remediation.

• Identify recurring failure patterns and risks.

Reliability Engineering

• Apply SRE principles: SLIs/SLOs, error budgets, resilience patterns.

• Tune JVM parameters, analyze thread/heap dumps, and improve performance.

• Influence application architecture for fault tolerance, scalability, and recoverability.

• Validate DR readiness, failover behavior, and resilience testing outcomes.

Change, Release & Risk

• Provide technical approval and risk assessment for high-risk changes.

• Enforce operational readiness for new apps and major releases.

• Ensure changes meet audit, compliance, and regulatory expectations.

Automation, Monitoring & Observability

• Build advanced automation using Shell/Python/PowerShell.

• Develop frameworks for health validation, automated recovery, and compliance checks.

• Define observability standards; optimize alerts and improve MTTR.

Leadership & Mentorship

• Mentor L1/L2 teams; review and approve runbooks, SOPs, and KB articles.

• Act as a trusted technical advisor to stakeholders and leadership.

Skills & Qualifications

Technical (Mandatory)

• Strong knowledge of application architecture, distributed systems, and middleware.

• Java expertise: JVM internals, GC, memory management, thread/heap dump analysis, performance tuning.

• .NET expertise: CLR internals, garbage collection, memory management, thread/dump analysis, application performance tuning.

• Strong Unix/Linux skills, networking fundamentals, and advanced scripting (Shell/Python/PowerShell/VBS).

• Advanced SQL and understanding of databases; Autosys (or equivalent scheduler).

• Hands-on experience with observability tools: Splunk, AppDynamics/Dynatrace, ELK, Grafana, Prometheus.

Reliability & Operations

• Major incident leadership, deep RCA, change/release readiness, DR & resilience engineering.

• Experience in regulated production environments.

Soft Skills

• Strong technical leadership and decision‑making.

• Clear communication during high‑pressure incidents.

• Ownership mindset and business awareness.

Experience & Education

• 7–12+ years in Application Reliability, Production Support, SRE, or platform operations.

• Bachelor’s degree in Computer Science/Engineering or equivalent.

• ITIL, cloud, or industry certifications (preferred).

• Banking/financial domain experience (preferred).

Working Conditions

• On‑call and after‑hours support as required.

• Fast‑paced environment with multiple priorities.

• Hybrid working model.

Skills

Java, JVM Internals, Shell Scripting, Python, Observability Tools, Technical Leadership, Clear Communication, Ownership Mindset, Incident Management, Change Management