Linux Automation Engineer
Job Summary
Experienced Linux Automation Engineer responsible for designing, developing, and managing enterprise-scale Linux infrastructure automation solutions, HPC cluster management platforms, and monitoring systems. The role focuses on automating server provisioning, operating system deployment, configuration management, patching, infrastructure lifecycle management, and cluster orchestration to improve operational efficiency and scalability. The position also involves leading technical initiatives, collaborating with cross-functional teams, and driving best practices in software development, DevOps, and infrastructure automation.
The position requires strong expertise in Linux administration, Python development, infrastructure automation tools, and observability platforms. The engineer is expected to build and enhance scalable backend systems, REST APIs, centralized monitoring solutions, and management dashboards while ensuring high availability, security, and maintainability of infrastructure environments. Responsibilities also include implementing CI/CD pipelines, integrating enterprise platforms, and supporting HPC environments through cluster provisioning, resource monitoring, scheduler integration, and performance analytics.
Key Responsibilities:
1. Technical Leadership:
· Lead and mentor a team of Full Stack Developers, Backend Developers, and Automation Engineers.
· Define technical architecture, coding standards, and development best practices.
· Conduct code reviews, design reviews, and technical feasibility assessments.
· Drive product roadmap discussions and technical decision-making.
· Collaborate with Product, Infrastructure, Validation, and Support teams.
2. Platform & Product Development:
Architect and develop infrastructure management platforms, including:
· HPC Cluster Management Tools
· Centralized Monitoring & Alerting Platforms
· Infrastructure Lifecycle Management Systems
· Asset & Inventory Management Solutions
· Capacity Planning & Analytics Platforms
· Infrastructure Automation & Orchestration Tools
3. Linux & Infrastructure Automation:
Design automation frameworks for:
· Server Provisioning
· OS Deployment & Configuration
· Firmware & Driver Compliance
· Cluster Deployment
· Patch Management
· Infrastructure Health Checks
· Automated Validation & Benchmarking
· Develop integrations with Linux-based infrastructure and enterprise platforms.
4. Monitoring & Observability:
Design centralized monitoring solutions for:
· Servers
· Storage
· Networking
· GPU Clusters
· HPC Infrastructure
· Data Center Environments
· Integrate monitoring tools and protocols such as Prometheus, Grafana, OpenTelemetry, ELK, Redfish, SNMP, and IPMI.
· Build predictive monitoring, alerting, and analytics capabilities.
5. HPC & Cluster Management:
Develop and enhance cluster management capabilities including:
· Node Discovery & Registration
· Cluster Provisioning
· Resource Monitoring
· Job Monitoring
· Scheduler Integration (Slurm / OpenHPC)
· Health & Performance Analytics
· Cluster Lifecycle Management
6. Software Engineering:
· Design scalable backend architectures and REST APIs.
· Guide development of web-based dashboards and management portals.
· Implement CI/CD pipelines and DevOps best practices.
· Ensure high availability, scalability, security, and maintainability of developed solutions.
Required Technical Skills
Linux Expertise
· RHEL, Rocky Linux, Ubuntu, AlmaLinux
· Linux System Internals
· Performance Tuning & Troubleshooting
· Shell Scripting
Programming
· Python (Expert)
· Go (Preferred)
· JavaScript/TypeScript
· REST API Development
Automation & Infrastructure
· Ansible
· Terraform
· Git
· Jenkins/GitLab CI/CD
Monitoring & Observability
· Prometheus
· Grafana
· ELK Stack
· OpenTelemetry
· Zabbix/Nagios
· Redfish/IPMI
Databases
· PostgreSQL
· MySQL
· MongoDB
Front-End Awareness
· ReactJS / Angular (working knowledge to guide development teams)
· Dashboard & Visualization Frameworks
HPC Technologies (Preferred)
· Slurm
· OpenHPC
· GPU Clusters

