A job scheduler handles requests from multiple concurrent end users of a computing system who are analyzing big data, each request submitted in the form of a so-called computing job. These schedulers are also sometimes called resource management systems, workload managers, or simply scheduling software. There is a wide variety of job schedulers on the market, depending on which computing system is actually used. In high-performance computing (HPC), large Linux clusters and supercomputers use job schedulers, and systems driven by high-throughput computing (HTC) likewise use job and task schedulers. The basic underlying concepts, however, are often the same.
The installed scheduling policies may vary. A commonly used policy is first come, first served (FCFS) with backfilling, which lets smaller jobs start earlier when they fit into resources that would otherwise sit idle. The key goal of a job scheduler is thus to keep the computing system fully loaded rather than idle, in order to maximize its utilization. The scheduler is typically installed by the administrators of a computing system, while end users simply use whichever scheduler is installed on a given system. Both free and commercial solutions are available.
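To make the FCFS-with-backfilling idea concrete, here is a minimal Python sketch of such a policy under simplified assumptions: a fixed number of identical nodes, jobs with known requested runtimes, and a single queue in submission order. The `Job` class and `schedule` function are illustrative names for this sketch, not part of any real scheduler's API. A later job is only backfilled when it is guaranteed to finish before the blocked head job's reserved start, so the head job is never delayed.

```python
import heapq
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    nodes: int    # nodes requested
    runtime: int  # requested wall-clock time (arbitrary units)

def schedule(jobs, total_nodes):
    """FCFS with backfilling; returns {job name: start time}."""
    assert all(j.nodes <= total_nodes for j in jobs)  # every job must fit the cluster
    queue = list(jobs)   # submission order
    running = []         # min-heap of (end_time, nodes)
    free = total_nodes
    now = 0
    starts = {}

    def launch(job):
        nonlocal free
        starts[job.name] = now
        free -= job.nodes
        heapq.heappush(running, (now + job.runtime, job.nodes))

    while queue:
        # plain FCFS: start jobs in submission order while they fit
        while queue and queue[0].nodes <= free:
            launch(queue.pop(0))
        if not queue:
            break
        head = queue[0]
        # the head job is blocked: find the earliest time it could
        # start (its "shadow time") by walking future node releases
        avail, shadow = free, None
        for end, nodes in sorted(running):
            avail += nodes
            if avail >= head.nodes:
                shadow = end
                break
        # backfilling: a smaller job further back in the queue may
        # jump ahead if it fits now and is guaranteed to finish
        # before the head job's reserved start time
        for job in list(queue[1:]):
            if job.nodes <= free and now + job.runtime <= shadow:
                queue.remove(job)
                launch(job)
        # advance time to the next job completion and free its nodes
        end, nodes = heapq.heappop(running)
        now = end
        free += nodes
    return starts
```

For example, on a 4-node system with jobs A (2 nodes, 10 units), B (4 nodes, 5 units), and C (1 node, 3 units), C is backfilled alongside A at time 0, while B still starts at time 10, exactly when it would have started without backfilling.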
Large Linux clusters and supercomputers often use the Simple Linux Utility for Resource Management (SLURM), free and open-source software for Linux and Unix environments. IBM supercomputers such as the BlueGene series typically offer the commercial LoadLeveler (LL) scheduling software. Also widely used is the open-source Terascale Open-source Resource and QUEue Manager (TORQUE) in combination with the open-source Maui scheduler. These are all primarily used in HPC environments. A more HTC-driven example is Yet Another Resource Negotiator (YARN), the job scheduler of Apache Hadoop.
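As an illustration of the end-user side, a SLURM job is typically described by a batch script containing `#SBATCH` directives and submitted with the `sbatch` command. The small Python helper below (the function name `slurm_script` and its parameters are illustrative, not part of SLURM itself) renders such a script; the directives shown (`--job-name`, `--nodes`, `--time`, `--output`) are standard SLURM options.

```python
def slurm_script(name, nodes, walltime, command):
    """Render a minimal SLURM batch script, to be submitted via `sbatch`."""
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --job-name={name}",
        f"#SBATCH --nodes={nodes}",         # nodes requested from the scheduler
        f"#SBATCH --time={walltime}",       # requested wall-clock limit, e.g. "01:30:00"
        f"#SBATCH --output={name}.%j.out",  # %j expands to the job id
        command,                            # the actual computation to run
    ]) + "\n"
```

The requested node count and wall-clock limit are exactly the information the scheduling policies described above need in order to decide when a job can start or be backfilled.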
More details about a Job Scheduler
The following video shows some details of how such a system works: