cookbook 'slurm', '~> 1.5.1'
slurm
(34) Versions
1.5.1
-
Follow0
Installs/Configures slurm workload manager
cookbook 'slurm', '~> 1.5.1', :supermarket
knife supermarket install slurm
knife supermarket download slurm
slurm
Wrapper cookbook that can prepare a full slurm cluster, controller, compute and accounting nodes
Requirements
Requires the following cookbooks:
Platforms
The following platforms are supported:
- Ubuntu 18.04
- Debian 9
Other Debian family distributions are assumed to work, as long as the slurm version from the package tree
is at least 17.02 due to hostname behaviour of slurmdbd.
Chef
- Chef 14.0+
TODO
- Support for RHEL family
- Make cgroup.conf file dynamic
- Add recipe to setup a dynamic resource allocation cluster
- Install slurm from static stable sources, i.e 17.11-latest, 18.08-latest
- Refactor and remove code that can be used as a resource instead of a recipe
- Remove static types of nodes and partitions and support static generation, maybe by passing the Hash directly
- Complete spec files
Usage
Check the .kitchen.yml file for the run_list, this can be applied with:
$ kitchen converge [debian|ubuntu|all]
The use case for this run_list is to setup a monolith which contains all of the slurm components.
Recipes
slurm::_disable_ipv6
- Disable ipv6 on a Linux system.
slurm::_systemd_daemon_reload
- Makes available forcing a
daemon-reload
on systemd, in order to refresh service unit files.
slurm::accounting
- Installs and configures slurmdbd, slurms' accounting service.
slurm::cluster
- TODO sets up a dynamic resource allocation cluster.
slurm::compute
- Installs and configures slurmd, slurms' compute service.
slurm::database
- Installs and configures a MariaDB service.
slurm::default
- Sets up slurm user and group
- Installs packages common to all slurms' services.
slurm::munge
- Sets up munge user and group
- Installs and configures munge authentication service.
slurm::plugin_shifter
- Sets up shifter plugin for slurm.
slurm::server
- Installs and configures slurmctld, slurms' controller service.
This is where the common configuration file shared between slurmctld
and slurmd
services is generated.
Take a close look at attributes below.
Attributes
The attributes are presented here in order of importance for assembling a whole infrastructure.
Common
# ========================= Data bag configuration =========================
default['slurm']['secret']['secrets_data_bag'] # The name of the encrypted data bag that stores openstack secrets
default['slurm']['secret']['service_passwords_data_bag'] # The name of the encrypted data bag that stores service user passwords, with
# each key in the data bag corresponding to a named Slurm service, like
# "slurmdbd", "slurmctl", "slurmd" (this may not be needed for slurm).
default['slurm']['secret']['db_passwords_data_bag'] # The name of the encrypted data bag that stores database passwords, with
# each key in the data bag corresponding to a named Slurm database, like
# "slurmdbd", "slurmctl", "slurmd"
default['slurm']['secret']['user_passwords_data_bag'] # The name of the encrypted data bag that stores general user passwords, with
# each key in the data bag corresponding to a user (this may not be needed for slurm).
# ========================= Slurm specific configuration =========================
default['slurm']['common']['conf_dir'] # slurm configuration directory, usually '/etc/slurm-llnl'
default['slurm']['custom_template_banner'] # String that is prepended to each slurm configuration file
default['slurm']['user'] # username to configure slurm as, usually 'slurm'
default['slurm']['group'] # group to configure slurm as, usually 'slurm'
default['slurm']['uid'] # Slurm user ID, common to all nodes, our default is 999, just before user land id's
default['slurm']['gid'] # Slurm group ID, common to all nodes, our default is 999, just before user land id's
default['proxy']['http'] # proxy address for use with apt, mariadb, and system environment
Munge
default['slurm']['munge']['key'] # munge key location
default['slurm']['munge']['env_file'] # munge environment file, to be used by systemd
default['slurm']['munge']['auth_socket'] # munge communication socket location
default['slurm']['munge']['user'] # username to configure munge as, usually 'munge'
default['slurm']['munge']['group'] # group name to configure munge as, usually 'munge'
default['slurm']['munge']['uid'] # MUNGE user ID, common to all nodes, our default is 998, just before Slurm's
default['slurm']['munge']['gid'] # MUNGE user ID, common to all nodes, our default is 998, just before Slurm's
Monolith
default['slurm']['control_machine'] # fqdn of the machine where slurmctld is running
default['slurm']['nfs_apps_server'] # fqdn of the machine where the apps directory is made available through nfs
default['slurm']['nfs_homes_server'] # fqdn of the machine where the home directory is made available through nfs
default['slurm']['apps_dir'] # path to the apps directory
default['slurm']['homes_dir'] # path to the home directory
default['slurm']['monolith_testing'] # tells the cookbook if the setup should be that of a monolith or not, usually for testing, either true or false
Database
default['mysql']['bind_address'] # CIDR to where the mariadb server should listen to connections, defaults to '0.0.0.0'
default['mysql']['port'] # port to where the mariadb server should listen to connections, defaults to '3306'
default['mysql']['version'] # MariaDB version lock, defaults to '10.1'
default['mysql']['character-set-server'] # database character set, defaults to 'utf8'
default['mysql']['collation-server'] # database collation, defaults to 'utf8_general_ci'
default['mysql']['user']['slurm'] # user which slurm accounting service uses to connect to the database
Accounting
default['slurm']['accounting']['conf_file'] # path to the slurmdbd configuration file, defaults to '/etc/slurm-llnl/slurmdbd.conf'
default['slurm']['accounting']['env_file'] # path to the slurmdbd environment file location, defaults to '/etc/default/slurmdbd'
default['slurm']['accounting']['bin_file'] # path to the slurmdbd binary, defaults to '/usr/sbin/slurmdbd'
default['slurm']['accounting']['pid_file'] # path to the slurmdbd pid file, defaults to '/var/run/slurm-llnl/slurmdbd.pid'
default['slurm']['accounting']['systemd_file'] # path to the slurmdbd systemd service unit file, defaults to '/lib/systemd/system/slurmdbd.service'
default['slurm']['accounting']['debug'] # debug level, valid values from 0-7, defaults to '3'
default['slurm']['accounting']['conf'] # Hash representing the slurmdbd configuration options
The default for ['slurm']['accounting']['conf']
is:
{
AuthType: 'auth/munge',
AuthInfo: node['slurm']['munge']['auth_socket'],
DbdHost: node['hostname'],
DebugLevel: node['slurm']['accounting']['debug'],
LogFile: '/var/log/slurm-llnl/slurmdbd.log', # default is syslog
MessageTimeout: '10',
PidFile: node['slurm']['accounting']['pid_file'],
SlurmUser: node['mysql']['user']['slurm'],
StorageHost: node['hostname'],
StorageLoc: 'slurm_acct_db',
StoragePort: node['mysql']['port'],
StorageType: 'accounting_storage/mysql',
StorageUser: node['mysql']['user']['slurm'],
}
take into account that when overriding ['slurm']['accounting']['conf']
you will override all of its options.
Server
default['slurm']['cluster']['name'] # Name for the cluster, defaults to 'slurm-test'
default['slurm']['server']['conf_file'] # path to the slurmctld and slurmd configuration file, defaults to '/etc/slurm-llnl/slurm.conf'
default['slurm']['server']['env_file'] # path to the slurmctld environment file, defaults to '/etc/default/slurmctld'
default['slurm']['server']['bin_file'] # path to the slurmctld binary file, defaults to '/usr/sbin/slurmctld'
default['slurm']['server']['pid_file'] # path to the slurmctld pid file, defaults to '/var/run/slurm-llnl/slurmctld.pid'
default['slurm']['server']['systemd_file'] # path to the slurmctld systemd service unit file, defaults to '/lib/systemd/system/slurmctld.service'
default['slurm']['server']['service_req'] # name of the storage service(s) that the slurm service should depend on to start
# this should be either empty or the name of the storage service client(s) that slurm might depend on (ceph, beegfs, lustre)
default['slurm']['server']['cgroup_dir'] # path to the cgroup plugin directory, defaults to '/etc/slurm-llnl/cgroup'
default['slurm']['server']['cgroup_conf_file'] # path to the cgroup configuration file, defaults to '/etc/slurm-llnl/cgroup.conf'
default['slurm']['server']['plugstack_dir'] # path to the slurm plugin directory, defaults to '/etc/slurm-llnl/plugstack.conf.d'
default['slurm']['server']['plugstack_conf_file'] # path to the slurm plugin configuration file, defaults to '/etc/slurm-llnl/plugstack.conf'
default['slurm']['shifter'] # Boolean, if true shifter will be installed
default['shifter']['imagegw'] # Boolean, if true the shifter image gateway will be installed and configured (assumes default['slurm']['shifter'] == true
default['shifter']['imagegw_fqdn'] # String, Image Gateway FQDN, accessible hostname or ip address, defaults node['slurm']['control_machine']
default['shifter']['siteenv_append'] # String, Environment Variable Append control, defaults to 'PATH=/opt/udiImage/bin'
Compute nodes
In the computes.rb attribute file you can see an example for the various slurm cluster settings.
For now we assume three types of partitions (and nodes):
- small
- medium
- large
representing the capacity (memory) for each group. The nodes in each group are assumed to be homogeneous.
Each group properties can be passed via the following attributes
default['slurm']['conf']['nodes'][type]['count']
default['slurm']['conf']['nodes'][type]['properties']['cpus'] # amount of CPUs available in the node group, Integer
default['slurm']['conf']['nodes'][type]['properties']['mem'] # amount of RAM available in the node group, Megabytes
default['slurm']['conf']['nodes'][type]['properties']['sockets'] # number of sockets in node group, on private cloud systems it is usually the number of cpus
default['slurm']['conf']['nodes'][type]['properties']['cores_per_socket'] # number of cores per socket, on private cloud systems it is usually one
default['slurm']['conf']['nodes'][type]['properties']['threads_per_core'] # number of threas per core, on private cloud systems it is usually one
default['slurm']['conf']['nodes'][type]['properties']['weight'] # preference for being allocated work to, the lower the weight the highest the preference
At this time, this cookbook is designed to work either as a monolith (PoC) or to be deployed in a private cloud environment.
Data Bags
From the previous section we can see which data bags are required to exist. Each of the items must have a key with the same name as the data bag, where the secret value should be stored.
Within those databags we have to create the following items:
DataBag | Item | Keys |
---|---|---|
slurm_db_passwords | mysqlroot | --- |
slurm_db_passwords | node['mysql']['user']['slurm'] | --- |
slurm_secrets | munge | --- |
Any of the slurm_db_passwords
items should be text passwords, generated with your favorite tool.
The munge key should be a base64 key, based on binary data generated from running either of the following:
-
$ create-munge-key -r
on a system with munge installed (note that it will try to overwrite any existing key in /etc/munge/munge.key) $ dd if=/dev/random bs=1 count=1024 > munge.key
$ dd if=/dev/urandom bs=1 count=1024 > munge.key
For more information on generating a munge key see the munge documentation.
Authors
- Manuel Torrinha manuel.torrinha@tecnico.ulisboa.pt
Dependent cookbooks
mariadb ~> 2.0 |
shifter ~> 1.0 |
Contingent cookbooks
There are no cookbooks that are contingent upon this one.
Collaborator Number Metric
1.5.1 passed this metric
Contributing File Metric
1.5.1 failed this metric
Failure: To pass this metric, your cookbook metadata must include a source url, the source url must be in the form of https://github.com/user/repo, and your repo must contain a CONTRIBUTING.md file
Foodcritic Metric
1.5.1 passed this metric
No Binaries Metric
1.5.1 passed this metric
Testing File Metric
1.5.1 failed this metric
Failure: To pass this metric, your cookbook metadata must include a source url, the source url must be in the form of https://github.com/user/repo, and your repo must contain a TESTING.md file
Version Tag Metric
1.5.1 failed this metric
Failure: To pass this metric, your cookbook metadata must include a source url, the source url must be in the form of https://github.com/user/repo, and your repo must include a tag that matches this cookbook version number
1.5.1 passed this metric
1.5.1 failed this metric
Failure: To pass this metric, your cookbook metadata must include a source url, the source url must be in the form of https://github.com/user/repo, and your repo must contain a CONTRIBUTING.md file
Foodcritic Metric
1.5.1 passed this metric
No Binaries Metric
1.5.1 passed this metric
Testing File Metric
1.5.1 failed this metric
Failure: To pass this metric, your cookbook metadata must include a source url, the source url must be in the form of https://github.com/user/repo, and your repo must contain a TESTING.md file
Version Tag Metric
1.5.1 failed this metric
Failure: To pass this metric, your cookbook metadata must include a source url, the source url must be in the form of https://github.com/user/repo, and your repo must include a tag that matches this cookbook version number
1.5.1 passed this metric
1.5.1 passed this metric
Testing File Metric
1.5.1 failed this metric
Failure: To pass this metric, your cookbook metadata must include a source url, the source url must be in the form of https://github.com/user/repo, and your repo must contain a TESTING.md file
Version Tag Metric
1.5.1 failed this metric
Failure: To pass this metric, your cookbook metadata must include a source url, the source url must be in the form of https://github.com/user/repo, and your repo must include a tag that matches this cookbook version number
1.5.1 failed this metric
1.5.1 failed this metric
Failure: To pass this metric, your cookbook metadata must include a source url, the source url must be in the form of https://github.com/user/repo, and your repo must include a tag that matches this cookbook version number