Job Summary:
We are seeking a proactive and technically skilled Production Support Lead to oversee the stability and performance of our live production systems. You will be responsible for leading a team of support engineers, managing incident response, and driving continuous improvement of support processes and system reliability. This role requires a balance of hands-on technical troubleshooting and people leadership.
Key Responsibilities:
- Lead the production support function across critical applications and services.
- Manage and mentor a team of support engineers, setting priorities and ensuring high performance.
- Oversee the incident management process: triage, root cause analysis, resolution, and communication.
- Work closely with development, QA, DevOps, and infrastructure teams to address issues and deploy fixes.
- Implement monitoring and alerting solutions to proactively identify and resolve issues.
- Define and enforce SLAs, uptime targets, and escalation procedures.
- Ensure comprehensive documentation of incidents, fixes, and knowledge-base articles.
- Drive the implementation of automation tools and practices to improve efficiency.
- Track key metrics (uptime, MTTR, incident volume) and present reports to leadership.
- Own the change management and release validation process to minimize production risks.
Required Skills & Qualifications:
- 5+ years of experience in production support or IT operations.
- 2+ years of experience in a lead or managerial capacity.
- Experience with Java, Node.js, SOA, Spring Cloud, and Spring Boot.
- Experience supporting large-scale, mission-critical systems in a 24x7 environment.
- Strong troubleshooting skills across application, infrastructure, and network layers.
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- Familiarity with Linux/Unix environments, databases (e.g., PostgreSQL, MySQL), and cloud infrastructure (AWS, Azure, or GCP).
- Experience with incident tracking tools (e.g., Jira, ServiceNow), monitoring tools (e.g., New Relic, Datadog, Prometheus), and logging tools (e.g., ELK stack, Splunk).
- Strong communication and leadership skills, especially under pressure.
- Excellent organizational and prioritization abilities.
Preferred Qualifications:
- ITIL certification or familiarity with ITIL best practices.
- Experience with CI/CD pipelines and DevOps tooling.
- Knowledge of scripting (e.g., Bash, Python) for automation.
- Background in regulated environments (e.g., FinTech, Healthcare).