Incident Manager with Datadog Job at Vish Consulting IT, Remote

  • Vish Consulting IT
  • Remote

Job Description

Position: Monitoring and Operations Engineer / ITSM Incident Commander

Location: 100% Remote

Duration: 12+ Months Contract

Interview: Video

Role Overview:

Our IT Service Management (ITSM) team is composed of highly skilled Incident, Problem, Change, and Event Management professionals. We are seeking a dynamic and motivated individual to join our team. As a Monitoring and Operations Engineer, you will be responsible for designing, configuring, implementing, monitoring, and maintaining our observability solutions, and for troubleshooting IT systems and applications to ensure optimal performance and reliability. You will work closely with cross-functional teams to identify potential issues and provide innovative insights to optimize system performance, stability, and availability. You will also be responsible for automating alerting and remediation processes to reduce mean time to resolution (MTTR) and improve system uptime.

You will be responsible for managing the process that defines the end-to-end lifecycle of incidents, events, problems, and changes within the organization, to ensure effective resolution and prevention of future occurrences. The ideal candidate will have a strong background in IT service management and incident/event management, with the ability to lead and coordinate cross-functional teams to drive continuous improvement within the organization.

Key Areas of Responsibility:

  • This is a strategic and hands-on position where you will work closely with cross-functional teams to identify potential issues and provide innovative insights to optimize system performance, stability, and availability.
  • Guide cross-functional teams in managing and supporting their PagerDuty alerts, teams, schedules, escalation policies, and automations.
  • Automate alerting and remediation processes to reduce mean time to resolution (MTTR) and improve system uptime.
  • Monitor server, network infrastructure, and application performance metrics, and identify patterns and trends to improve system performance and reliability.
  • Troubleshoot issues and outages, working closely with development and operations teams to identify root causes and develop solutions.
  • Collaborate with cross-functional teams to support incident management, change management, and problem management processes.
  • Proactively detect and prevent future problems/incidents and initiate the Problem Management process to allow quicker diagnosis and resolution.
  • Develop trend analysis and prepare service improvement plans to address identified gaps.
  • Build strong relationships with key stakeholders, including senior management, department heads, and external partners, to ensure their support and engagement in incident management initiatives.
  • Foster a culture of continuous improvement, staying abreast of industry trends, emerging technologies, and best practices to enhance incident management capabilities.
  • Create dashboards and reports to provide insights into operational performance and health.
  • Build automation to optimize processes and workflows within our on-call systems and monitoring platforms.
  • Complete any assigned project work or tasks with a view to improving existing processes and capabilities, and seek out automation opportunities.
  • Support the on-call rotation and provide off-hours support as required.

Qualifications:

Minimum Qualifications:

  • Bachelor's Degree in IT, Business Management or a related discipline preferred.
  • 5+ years of direct experience working in the observability, operations, or DevOps domains.
  • Proficient with PagerDuty and with observability, monitoring, and logging tools such as Datadog, Dynatrace, and Power BI.
  • 3+ years of technical experience in systems engineering, SRE, DevOps, or software engineering.

Other Required Qualifications:

  • Excellent written and verbal communication skills with the ability to communicate effectively with all stakeholders including senior leadership.
  • Strong ability to understand, accurately translate and produce technical information for a general and business audience.
  • Strong experience with change, incident, and problem management principles, methodologies, and tools.
  • Experience using configuration and change tools such as ServiceNow Change and CMDB, or related tools.
  • Experience with project delivery methodologies (Agile, Scrum).
  • Hands-on experience with monitoring and performance monitoring tools such as Datadog, Dynatrace, and Splunk.

Preferred Qualifications:

  • ITIL v3 Foundation certification
  • Certification in Project Management
  • Experience implementing continuous process improvements within a configuration, change, release, or asset management program
  • Cloud certifications (Azure, AWS, GCP)
  • Direct experience scripting in two of the following languages: Python, PowerShell, Bash.
  • Proficient at technical and business writing

Job Tags

Full time, Contract work, Remote job
