slurm, version 1.5.1

Installs/Configures slurm workload manager

Policyfile:
  cookbook 'slurm', '~> 1.5.1', :supermarket
Berkshelf:
  cookbook 'slurm', '~> 1.5.1'
Knife:
  knife supermarket install slurm
  knife supermarket download slurm

slurm

Wrapper cookbook that can prepare a full slurm cluster, controller, compute and accounting nodes

Requirements

Requires the following cookbooks:

  • mariadb
  • shifter

Platforms

The following platforms are supported:

  • Ubuntu 18.04
  • Debian 9

Other Debian family distributions are assumed to work, as long as the slurm version in the distribution's package tree
is at least 17.02, which is needed because of the hostname behaviour of slurmdbd.

Chef

  • Chef 14.0+

TODO

  • Support for RHEL family
  • Make cgroup.conf file dynamic
  • Add recipe to setup a dynamic resource allocation cluster
  • Install slurm from static stable sources, e.g. 17.11-latest, 18.08-latest
  • Refactor and remove code that can be used as a resource instead of a recipe
  • Remove static types of nodes and partitions and support dynamic generation, maybe by passing the Hash directly
  • Complete spec files

Usage

Check the .kitchen.yml file for the run_list, which can be applied with:

$ kitchen converge [debian|ubuntu|all]

The use case for this run_list is to set up a monolith that contains all of the slurm components.

Recipes

slurm::_disable_ipv6

  • Disable ipv6 on a Linux system.

slurm::_systemd_daemon_reload

  • Provides a way to force a daemon-reload on systemd, in order to refresh service unit files.

slurm::accounting

  • Installs and configures slurmdbd, slurm's accounting service.

slurm::cluster

  • TODO: sets up a dynamic resource allocation cluster.

slurm::compute

  • Installs and configures slurmd, slurm's compute service.

slurm::database

  • Installs and configures a MariaDB service.

slurm::default

  • Sets up slurm user and group
  • Installs packages common to all of slurm's services.

slurm::munge

  • Sets up munge user and group
  • Installs and configures munge authentication service.

slurm::plugin_shifter

  • Sets up shifter plugin for slurm.

slurm::server

  • Installs and configures slurmctld, slurm's controller service.

This is where the common configuration file shared between the slurmctld and slurmd services is generated.
Take a close look at the attributes below.

Attributes

The attributes are presented here in order of importance for assembling a whole infrastructure.

Common

# ========================= Data bag configuration =========================
default['slurm']['secret']['secrets_data_bag']                 # The name of the encrypted data bag that stores slurm secrets, such as the munge key

default['slurm']['secret']['service_passwords_data_bag']       # The name of the encrypted data bag that stores service user passwords, with
                                                               # each key in the data bag corresponding to a named Slurm service, like
                                                               # "slurmdbd", "slurmctl", "slurmd" (this may not be needed for slurm).

default['slurm']['secret']['db_passwords_data_bag']            # The name of the encrypted data bag that stores database passwords, with
                                                               # each key in the data bag corresponding to a named Slurm database, like
                                                               # "slurmdbd", "slurmctl", "slurmd"

default['slurm']['secret']['user_passwords_data_bag']          # The name of the encrypted data bag that stores general user passwords, with
                                                               # each key in the data bag corresponding to a user (this may not be needed for slurm).

# ========================= Slurm specific configuration =========================
default['slurm']['common']['conf_dir']                         # slurm configuration directory, usually '/etc/slurm-llnl'

default['slurm']['custom_template_banner']                     # String that is prepended to each slurm configuration file

default['slurm']['user']                                       # username to configure slurm as, usually 'slurm'

default['slurm']['group']                                      # group to configure slurm as, usually 'slurm'

default['slurm']['uid']                                        # Slurm user ID, common to all nodes; our default is 999, just before the user-land IDs

default['slurm']['gid']                                        # Slurm group ID, common to all nodes; our default is 999, just before the user-land IDs

default['proxy']['http']                                       # proxy address for use with apt, mariadb, and system environment

Munge

default['slurm']['munge']['key']                               # munge key location

default['slurm']['munge']['env_file']                          # munge environment file, to be used by systemd

default['slurm']['munge']['auth_socket']                       # munge communication socket location

default['slurm']['munge']['user']                              # username to configure munge as, usually 'munge'

default['slurm']['munge']['group']                             # group name to configure munge as, usually 'munge'

default['slurm']['munge']['uid']                               # MUNGE user ID, common to all nodes, our default is 998, just before Slurm's

default['slurm']['munge']['gid']                               # MUNGE group ID, common to all nodes, our default is 998, just before Slurm's

Monolith

default['slurm']['control_machine']                            # fqdn of the machine where slurmctld is running

default['slurm']['nfs_apps_server']                            # fqdn of the machine where the apps directory is made available through nfs

default['slurm']['nfs_homes_server']                           # fqdn of the machine where the home directory is made available through nfs

default['slurm']['apps_dir']                                   # path to the apps directory

default['slurm']['homes_dir']                                  # path to the home directory

default['slurm']['monolith_testing']                           # tells the cookbook if the setup should be that of a monolith or not, usually for testing, either true or false

Database

default['mysql']['bind_address']                               # address on which the mariadb server listens for connections, defaults to '0.0.0.0'

default['mysql']['port']                                       # port on which the mariadb server listens for connections, defaults to '3306'

default['mysql']['version']                                    # MariaDB version lock, defaults to '10.1'

default['mysql']['character-set-server']                       # database character set, defaults to 'utf8'

default['mysql']['collation-server']                           # database collation, defaults to 'utf8_general_ci'   

default['mysql']['user']['slurm']                              # user which slurm accounting service uses to connect to the database

Accounting

default['slurm']['accounting']['conf_file']                    # path to the slurmdbd configuration file, defaults to '/etc/slurm-llnl/slurmdbd.conf'

default['slurm']['accounting']['env_file']                     # path to the slurmdbd environment file location, defaults to '/etc/default/slurmdbd'

default['slurm']['accounting']['bin_file']                     # path to the slurmdbd binary, defaults to '/usr/sbin/slurmdbd'

default['slurm']['accounting']['pid_file']                     # path to the slurmdbd pid file, defaults to '/var/run/slurm-llnl/slurmdbd.pid'

default['slurm']['accounting']['systemd_file']                 # path to the slurmdbd systemd service unit file, defaults to '/lib/systemd/system/slurmdbd.service'

default['slurm']['accounting']['debug']                        # debug level, valid values from 0-7, defaults to '3'

default['slurm']['accounting']['conf']                         # Hash representing the slurmdbd configuration options

The default for ['slurm']['accounting']['conf'] is:

{
  AuthType: 'auth/munge',
  AuthInfo: node['slurm']['munge']['auth_socket'],
  DbdHost: node['hostname'], 
  DebugLevel: node['slurm']['accounting']['debug'],
  LogFile: '/var/log/slurm-llnl/slurmdbd.log', # default is syslog
  MessageTimeout: '10',
  PidFile: node['slurm']['accounting']['pid_file'], 
  SlurmUser: node['mysql']['user']['slurm'],
  StorageHost: node['hostname'], 
  StorageLoc: 'slurm_acct_db',
  StoragePort: node['mysql']['port'],
  StorageType: 'accounting_storage/mysql',
  StorageUser: node['mysql']['user']['slurm'],
}

Take into account that overriding ['slurm']['accounting']['conf'] replaces all of its options, not just the ones you set.
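One way to change a few options without losing the rest is to merge your changes into a copy of the defaults. The following is a minimal plain-Ruby sketch of that idea, not code from this cookbook: the `defaults` Hash is a stand-in holding a subset of the options listed above, and the override values are made up for illustration. In a wrapper cookbook the merged Hash would then be assigned back to the attribute.

```ruby
# Stand-in for the cookbook's default Hash (a subset of the options above).
# In a wrapper cookbook this would be read from node['slurm']['accounting']['conf'].
defaults = {
  AuthType: 'auth/munge',
  DebugLevel: '3',
  LogFile: '/var/log/slurm-llnl/slurmdbd.log',
  MessageTimeout: '10',
  StorageLoc: 'slurm_acct_db',
  StorageType: 'accounting_storage/mysql',
}

# Only the options we actually want to change (illustrative values).
overrides = {
  DebugLevel: '5',
  MessageTimeout: '30',
}

# Hash#merge returns a new Hash: keys present in `overrides` win,
# every other default is carried over unchanged.
accounting_conf = defaults.merge(overrides)

# Print the result in the Key=Value style used by slurmdbd.conf.
accounting_conf.each { |key, value| puts "#{key}=#{value}" }
```

Merging keeps the override list short and makes it obvious which options differ from the cookbook's defaults.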

Server

default['slurm']['cluster']['name']                            # Name for the cluster, defaults to 'slurm-test'

default['slurm']['server']['conf_file']                        # path to the slurmctld and slurmd configuration file, defaults to '/etc/slurm-llnl/slurm.conf'

default['slurm']['server']['env_file']                         # path to the slurmctld environment file, defaults to '/etc/default/slurmctld'

default['slurm']['server']['bin_file']                         # path to the slurmctld binary file, defaults to '/usr/sbin/slurmctld'

default['slurm']['server']['pid_file']                         # path to the slurmctld pid file, defaults to '/var/run/slurm-llnl/slurmctld.pid'

default['slurm']['server']['systemd_file']                     # path to the slurmctld systemd service unit file, defaults to '/lib/systemd/system/slurmctld.service'

default['slurm']['server']['service_req']                      # name of the storage service(s) that the slurm service should depend on to start;
                                                               # this should be either empty or the name of the storage service client(s) that slurm might depend on (ceph, beegfs, lustre)

default['slurm']['server']['cgroup_dir']                       # path to the cgroup plugin directory, defaults to '/etc/slurm-llnl/cgroup'

default['slurm']['server']['cgroup_conf_file']                 # path to the cgroup configuration file, defaults to '/etc/slurm-llnl/cgroup.conf'

default['slurm']['server']['plugstack_dir']                    # path to the slurm plugin directory, defaults to '/etc/slurm-llnl/plugstack.conf.d'

default['slurm']['server']['plugstack_conf_file']              # path to the slurm plugin configuration file, defaults to '/etc/slurm-llnl/plugstack.conf'

default['slurm']['shifter']                                    # Boolean, if true shifter will be installed

default['shifter']['imagegw']                                  # Boolean, if true the shifter image gateway will be installed and configured (assumes default['slurm']['shifter'] == true)

default['shifter']['imagegw_fqdn']                             # String, Image Gateway FQDN, accessible hostname or ip address, defaults to node['slurm']['control_machine']

default['shifter']['siteenv_append']                           # String, Environment Variable Append control, defaults to 'PATH=/opt/udiImage/bin'

Compute nodes

In the computes.rb attribute file you can see an example for the various slurm cluster settings.

For now we assume three types of partitions (and nodes):

  • small
  • medium
  • large

representing the capacity (memory) of each group. The nodes in each group are assumed to be homogeneous.

Each group's properties can be set via the following attributes:

default['slurm']['conf']['nodes'][type]['count']                            # number of nodes in the node group
default['slurm']['conf']['nodes'][type]['properties']['cpus']               # amount of CPUs available in the node group, Integer
default['slurm']['conf']['nodes'][type]['properties']['mem']                # amount of RAM available in the node group, Megabytes
default['slurm']['conf']['nodes'][type]['properties']['sockets']            # number of sockets in node group, on private cloud systems it is usually the number of cpus
default['slurm']['conf']['nodes'][type]['properties']['cores_per_socket']   # number of cores per socket, on private cloud systems it is usually one
default['slurm']['conf']['nodes'][type]['properties']['threads_per_core']   # number of threads per core, on private cloud systems it is usually one
default['slurm']['conf']['nodes'][type]['properties']['weight']             # preference for being allocated work, the lower the weight the higher the preference
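
As an illustration of the shape these attributes take, here is a plain-Ruby sketch of the three node groups as a nested Hash. All counts, sizes and weights are made-up values, and the `consistent?` helper is hypothetical: it checks that the CPU count equals sockets × cores per socket × threads per core, which is the usual case but an assumption rather than something the cookbook enforces.

```ruby
# Illustrative values only; the keys mirror the attribute names above.
node_groups = {
  'small' => {
    'count' => 4,
    'properties' => { 'cpus' => 2, 'mem' => 4096, 'sockets' => 2,
                      'cores_per_socket' => 1, 'threads_per_core' => 1, 'weight' => 10 },
  },
  'medium' => {
    'count' => 2,
    'properties' => { 'cpus' => 4, 'mem' => 8192, 'sockets' => 4,
                      'cores_per_socket' => 1, 'threads_per_core' => 1, 'weight' => 20 },
  },
  'large' => {
    'count' => 1,
    'properties' => { 'cpus' => 8, 'mem' => 16_384, 'sockets' => 8,
                      'cores_per_socket' => 1, 'threads_per_core' => 1, 'weight' => 30 },
  },
}

# Hypothetical sanity check: does the declared topology multiply out to the CPU count?
def consistent?(props)
  props['cpus'] == props['sockets'] * props['cores_per_socket'] * props['threads_per_core']
end

node_groups.each do |type, group|
  status = consistent?(group['properties']) ? 'ok' : 'inconsistent'
  puts "#{type}: #{group['count']} node(s), #{group['properties']['mem']} MB, topology #{status}"
end
```

Here the small group has the lowest weight, so it would be preferred when work is allocated.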

At this time, this cookbook is designed to work either as a monolith (PoC) or to be deployed in a private cloud environment.

Data Bags

The previous section shows which data bags must exist. Each item must have a key with the same name as the data bag, and the secret value is stored under that key.
Within those data bags we have to create the following items:

DataBag              Item                              Keys
slurm_db_passwords   mysqlroot                         ---
slurm_db_passwords   node['mysql']['user']['slurm']    ---
slurm_secrets        munge                             ---

Each of the slurm_db_passwords items should be a text password, generated with your favorite tool.
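
To make the naming rule concrete, here is a minimal Ruby sketch of an unencrypted item for the slurm_db_passwords data bag. The `id` field is how Chef identifies a data bag item, and the secret sits under a key named after the data bag, as described above. The 32-character length and the use of SecureRandom are illustrative assumptions, not something the cookbook prescribes.

```ruby
require 'json'
require 'securerandom'

data_bag = 'slurm_db_passwords'
item_id  = 'mysqlroot'

# The secret is stored under a key with the same name as the data bag.
item = {
  'id' => item_id,
  data_bag => SecureRandom.alphanumeric(32), # assumption: 32 random letters and digits
}

# Print the item as JSON, ready to be saved to a file and encrypted.
puts JSON.pretty_generate(item)
```

The resulting JSON still has to be uploaded as an encrypted data bag item with your usual Chef tooling.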

The munge key should be base64-encoded, based on binary data generated by running any of the following:

  • $ create-munge-key -r on a system with munge installed (note that it will try to overwrite any existing key in /etc/munge/munge.key)
  • $ dd if=/dev/random bs=1 count=1024 > munge.key
  • $ dd if=/dev/urandom bs=1 count=1024 > munge.key

For more information on generating a munge key see the munge documentation.
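
If you prefer to do both steps in one place, the following Ruby sketch generates 1024 random bytes (the same size as the dd commands above) and base64-encodes them. It is an alternative to the commands listed, not part of the cookbook, and it assumes that single-line base64 without line breaks is acceptable for the data bag value.

```ruby
require 'securerandom'

# 1024 random bytes, matching bs=1 count=1024 in the dd commands above.
raw_key = SecureRandom.random_bytes(1024)

# 'm0' is Array#pack's base64 directive without line feeds.
encoded_key = [raw_key].pack('m0')

# Decode again to confirm that the encoded value round-trips to the same bytes.
decoded_key = encoded_key.unpack1('m0')

puts "raw key bytes:     #{raw_key.bytesize}"
puts "encoded key chars: #{encoded_key.length}"
puts "round trip ok:     #{decoded_key == raw_key}"
```

The value of `encoded_key` is what would go into the munge item of the slurm_secrets data bag.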


Dependent cookbooks

mariadb ~> 2.0
shifter ~> 1.0
