Load Sharing Facility - IBM

Remote · USA · Full-time

Position: Load Sharing Facility - IBM
Location: San Jose, CA (Remote)
Contract: W2

Key Job Responsibilities

  • Cluster Management: Install, configure, and maintain IBM Spectrum LSF clusters to optimize resource utilization.
  • Workload Optimization: Manage job queues, policy-driven scheduling, and workload balancing across server hosts.
  • Troubleshooting: Monitor system performance, including the LIM, mbatchd (MBD), and sbatchd (SBD) daemons, and resolve issues related to job submission, execution, and host availability.
  • Automation & Scripting: Develop tools (Python, shell scripts) to streamline cluster management and improve efficiency.
  • License Management: Optimize software license configuration to ensure efficient EDA tool utilization.
  • Collaboration: Work with engineering, DevOps, and data science teams to align HPC infrastructure with business needs.

Required Skills and Qualifications

  • Experience: Generally 4–12+ years in IT architecture, system engineering, or HPC environments.
  • Technical Knowledge: Deep understanding of IBM Spectrum LSF, job scheduling, and workload management.
  • OS Proficiency: Strong Linux/Unix systems administration skills.
  • Automation Tools: Experience with scripting (Python, shell) and automation tools like Ansible or Terraform.
  • Education: Bachelor’s or Master’s degree in Computer Science or Engineering.
