
SITE RELIABILITY ENGINEER
- Coral Gables, FL
- $99,000-$130,000 per year
- Permanent
- Full-time
Responsibilities
- Proactively identify and resolve incidents before they impact operations.
- Monitor all systems and infrastructure for the highest level of availability.
- Perform routine maintenance tasks, including monitoring, patching, and backups.
- Respond to incidents and outages in a timely and effective manner.
- Collaborate with other teams to diagnose and resolve complex issues.
- Document incident details and implement corrective actions to prevent recurrence.
- Document processes, configurations, and troubleshooting procedures.
- Diagnose and resolve application performance problems or system outages.
- Serve as Incident Manager during outages.
- Resolve complex hardware and software issues, and work with vendors when necessary.
- Optimize system performance and resource utilization on-prem and in the cloud.
- Develop and maintain automation scripts to streamline repetitive tasks.
- Utilize scripting languages (e.g., PowerShell, Python) to automate system administration.
- Implement configuration management tools to ensure consistency and repeatability.
- Create and maintain comprehensive documentation of IT processes and procedures.
- Other duties as assigned by leadership.
Qualifications
- Strong understanding of IT infrastructure components, including servers, networks, and storage.
- Knowledge of scripting languages (e.g., PowerShell, Python).
- Knowledge of networking concepts and protocols (e.g., TCP/IP, DNS, DHCP).
- Experience with IT service management frameworks.
- Experience with cloud platforms such as AWS and Azure.
- Experience with virtualization technologies such as Azure VDI and AWS WorkSpaces.
- Experience with monitoring and alerting tools (e.g., New Relic, Datadog).
- Excellent problem-solving and analytical skills.
- Strong communication and interpersonal skills.
- Extensive expertise in the Windows operating system.