cookbook 'dse', '= 3.0.21'
Installs/Configures Datastax Enterprise.
cookbook 'dse', '= 3.0.21', :supermarket
knife supermarket install dse
knife supermarket download dse
Datastax Enterprise Chef Cookbook (Apache Cassandra)
This cookbook installs and configures Datastax Enterprise (DSE). More information is available on the DataStax Enterprise site.
It uses officially released Datastax packages. It can tweak the Cassandra config files, but has no way of adding data or creating keyspaces in Cassandra (yet).
Usage
This cookbook is designed to be used in conjunction with a wrapper cookbook. Used alone, it can create a single-node cluster; to create a multi-node cluster, a wrapper is recommended.
Example in a wrapper:
node.default['java']['jdk_version'] = "7"
node.default['cassandra']['seeds'] = "192.168.1.1, 192.168.1.2"
node.default['cassandra']['dse_version'] = "4.0.3-1"
node.default['cassandra']['max_heap_size'] = "12G"
node.default['cassandra']['heap_newsize'] = "1200M"
include_recipe "dse::cassandra"
Scope
This cookbook attempts to manage almost all Apache Cassandra configuration settings. It can also create Hadoop and Solr nodes, although fewer attributes are available to manage their configuration.
Apache Cassandra
This cookbook currently provides
- Datastax 4.x.x (Datastax Enterprise Edition) via packages.
Requirements
- Chef 11 or higher
Supported OS Distributions
Tested on:
- RHEL 6.3, 6.4
- Ubuntu 14.04.1 LTS
- Slight testing done on Ubuntu 12.04 (will require some edits)
Recipes
The provided recipes are dse::cassandra, dse::solr, and dse::hadoop.
* dse::cassandra will provision DSE as a cassandra node.
* dse::solr will provision DSE with solr enabled.
* dse::hadoop will provision DSE with hadoop enabled.
There are also recipes that are used for configuration and should not be called directly.
* dse::default sets up the templates
* dse::datastax sets up the datastax repos
* dse::datastax-agent configures the datastax-agent if needed
* dse::ssl (work in progress) sets up SSL keys on all nodes
Attributes
This cookbook will install DSE Cassandra by default. Other attributes you can set are:
default.rb
overall settings
- node["cassandra"]["cluster_name"] (default: Test Cluster): the name of the cluster to provision
- node["cassandra"]["vnodes"] (default: true): enable or disable vnodes
- node["cassandra"]["intial_token"] (default: nil): the initial token to use. leave blank for vnodes
- node["cassandra"]["num_tokens"] (default: 256): set the number of tokens to use
- node["cassandra"]["solr"] (default: false): enable solr or not
- node["cassandra"]["hadoop"] (default: false): enable hadoop or not
- node["cassandra"]["dse_version"] (default: 4.0.3-1): dse version to install
- node["cassandra"]["user"] (default: cassandra): the cassandra user
- node["cassandra"]["group"] (default: cassandra): the cassandra group
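When vnodes are disabled, each node needs its own initial token. The sketch below shows one common way to compute evenly spaced tokens. It assumes the cluster uses Murmur3Partitioner (token range -2**63 to 2**63 - 1); the `initial_tokens` helper is hypothetical and not part of this cookbook.

```ruby
# Evenly spaced initial tokens for a cluster that does not use vnodes.
# Assumption: Murmur3Partitioner, whose token range is -2**63 .. 2**63 - 1.
# Other partitioners need a different formula.
def initial_tokens(node_count)
  raise ArgumentError, 'node_count must be positive' unless node_count.positive?

  step = 2**64 / node_count
  (0...node_count).map { |i| i * step - 2**63 }
end

# Hypothetical four-node cluster: print one token per node.
initial_tokens(4).each_with_index do |token, i|
  puts "node #{i}: #{token}"
end
```

A wrapper could then set node["cassandra"]["vnodes"] to false and assign one of these values per node through the initial token attribute listed above.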
cassandra.yaml settings
- node["cassandra"]["listen_address"] (default: node['ipaddress']): the ipaddress to use for listen address
- node["cassandra"]["rpc_address"] (default: node['ipaddress']): the ipaddress to use for rpc address
- node["cassandra"]["broadcast_address"] (default: nil): the ipaddress to use for broadcast address
- node["cassandra"]["seeds"] (default: node['ipaddress']): the ipaddress to use for the seed list
- node["cassandra"]["concurrent_reads"] (default: 32): concurrent reads setting
- node["cassandra"]["concurrent_writes"] (default: 32): concurrent writes setting
- node["cassandra"]["compaction_thruput"] (default: 16): limit the throughput of compactions
- node["cassandra"]["multithreaded_compaction"] (default: false): enable or disable multithreaded compaction
- node["cassandra"]["in_memory_compaction_limit"] (default: 64): size limit for in-memory compactions
- node["cassandra"]["trickle_fsync"] (default: false): enable trickle fsync, usually for ssd
- node["cassandra"]["range_request_timeout_in_ms"] (default: 10000): default timeout on range requests
- node["cassandra"]["thrift_framed_transport_size_in_mb"] (default: 15): the max size of a thrift frame
- node["cassandra"]["thrift_max_message_length_in_mb"] (default: nil): the max message length of a thrift call
- node["cassandra"]["concurrent_compactors"] (default: nil): the number of concurrent compactors to allow
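To illustrate how attributes like these typically end up in cassandra.yaml, here is a minimal ERB sketch. The template fragment is hypothetical; the cookbook's real template is not shown in this README and may differ. In particular, mapping compaction_thruput to the cassandra.yaml key compaction_throughput_mb_per_sec is an assumption.

```ruby
require 'erb'

# Hypothetical template fragment, not the cookbook's actual template.
TEMPLATE = <<~YAML_ERB
  listen_address: <%= listen_address %>
  rpc_address: <%= rpc_address %>
  concurrent_reads: <%= concurrent_reads %>
  concurrent_writes: <%= concurrent_writes %>
  compaction_throughput_mb_per_sec: <%= compaction_thruput %>
  trickle_fsync: <%= trickle_fsync %>
YAML_ERB

# Example values; the numeric ones match the defaults listed above.
settings = {
  listen_address: '192.168.1.1',
  rpc_address: '192.168.1.1',
  concurrent_reads: 32,
  concurrent_writes: 32,
  compaction_thruput: 16,
  trickle_fsync: false
}

# Each hash key becomes a local variable inside the template.
rendered = ERB.new(TEMPLATE).result_with_hash(settings)
puts rendered
```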
Role based seed selection
- node["cassandra"]["role_based_seeds"] (default: false): set to true to assign seeds based on members of the dse-seed role
- node['cassandra']['seed_role'] (default: role:dse-seed): set to a different role to select seeds
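As a rough illustration of role based seed selection, the sketch below turns a list of node records into the comma-separated seed string format used by node["cassandra"]["seeds"]. It is not the cookbook's implementation: `seed_list` is a hypothetical helper, and the plain hashes stand in for what a Chef search on the seed role might return.

```ruby
# Build a comma-separated seed string from node search results.
# `seed_nodes` stands in for the result of a Chef search on the seed role;
# here it is an array of plain hashes so the sketch runs without Chef.
def seed_list(seed_nodes)
  seed_nodes.map { |n| n['ipaddress'] }.compact.uniq.sort.join(', ')
end

# Hypothetical search results.
seed_nodes = [
  { 'name' => 'dse-02', 'ipaddress' => '192.168.1.2' },
  { 'name' => 'dse-01', 'ipaddress' => '192.168.1.1' }
]

puts seed_list(seed_nodes)
```

Sorting the addresses keeps the resulting string stable between Chef runs, which avoids rewriting the config file when the search returns the same nodes in a different order.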
gc settings
- node["cassandra"]["CMSInitiatingOccupancyFraction"] (default: 65): cms occupancy fraction to use for gc
- node["cassandra"]["max_heap_size"] (default: 8192M): default max heap size for cassandra
- node["cassandra"]["heap_newsize"] (default: 800M): default new gen size for heap
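If you are unsure what heap values to set, one starting point is the heuristic used by the calculate_heap_sizes function in Cassandra's stock cassandra-env.sh. The sketch below reproduces that heuristic as understood here; check it against the script shipped with your DSE release. The `suggested_heap` helper is hypothetical and not part of this cookbook.

```ruby
# Heap sizing heuristic modeled on calculate_heap_sizes in Cassandra's
# stock cassandra-env.sh (an assumption; verify against your version).
def suggested_heap(system_memory_mb, cpu_cores)
  half    = [system_memory_mb / 2, 1024].min
  quarter = [system_memory_mb / 4, 8192].min
  max_heap_mb = [half, quarter].max

  # Young generation: a quarter of the heap, capped at 100 MB per core.
  new_size_mb = [max_heap_mb / 4, 100 * cpu_cores].min

  { 'max_heap_size' => "#{max_heap_mb}M", 'heap_newsize' => "#{new_size_mb}M" }
end

# Hypothetical host with 32 GB of RAM and 8 cores.
p suggested_heap(32 * 1024, 8)
```

The returned strings use the same format as the max_heap_size and heap_newsize attributes, so a wrapper could assign them directly.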
authentication settings
- node["cassandra"]["authentication"] (default: false): enable or disable authentication
- node["cassandra"]["authorization"] (default: false): enable or disable authorization
- node["cassandra"]["authenticator"] (default: empty): the authenticator to use (e.g. org.apache.cassandra.auth.AllowAllAuthenticator)
- node["cassandra"]["authorizor"] (default: empty): the authorizer to use (e.g. org.apache.cassandra.auth.AllowAllAuthorizer)
audit logs
- node["cassandra"]["log_level"] (default: INFO): the log level for cassandra (or solr/hadoop)
- node["cassandra"]["audit_logging"] (default: false): turn on audit logging
- node["cassandra"]["audit_dir"] (default: /var/log/cassandra): the directory to put audit logs in
- node["cassandra"]["active_categories"] (default: ADMIN,AUTH,DDL,DCL): the categories to audit on
metrics settings
- node['cassandra']['metrics_reporter']['enabled'] (default: false): enable or disable the metrics reporter jar
- node['cassandra']['metrics_reporter']['name'] (default: metrics-graphite): the name of the jar to use, graphite is a popular one
- node['cassandra']['metrics_reporter']['jar_url'] (default: http://search.maven.org/remotecontent?filepath=com/yammer/metrics/metrics-graphite/2.2.0/metrics-graphite-2.2.0.jar): where the jar is
- node['cassandra']['metrics_reporter']['sha256sum'] (default: 6b4042aabf532229f8678b8dcd34e2215d94a683270898c162175b1b13d87de4): checksum of the jar
- node['cassandra']['metrics_reporter']['jar_name'] (default: metrics-graphite-2.2.0.jar): full name of the jar
- node['cassandra']['metrics_reporter']['config'] (default: {}): hash of the conf to use, example below:
node.default['cassandra']['metrics_reporter'] = {
  'enabled' => true,
  'name' => 'metrics-graphite',
  'jar_url' => 'http://search.maven.org/remotecontent?filepath=com/yammer/metrics/metrics-graphite/2.2.0/metrics-graphite-2.2.0.jar',
  'sha256sum' => '6b4042aabf532229f8678b8dcd34e2215d94a683270898c162175b1b13d87de4',
  'jar_name' => 'metrics-graphite-2.2.0.jar',
  'config' => {
    'graphite' => [{
      'timeunit' => 'SECONDS',
      'hosts' => [{
        'host' => 'graphite.host.com',
        'port' => 2003
      }],
      'prefix' => "servers.#{node.name}.cassandra",
      'period' => 60,
      'predicate' => {
        'color' => 'white',
        'useQualifiedName' => true,
        'patterns' => [
          '^org.apache.cassandra.metrics.Cache.+',
        ]
      }
    }]
  }
}
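The 'config' hash in the example above has the shape of a YAML reporter configuration. To preview what such a hash looks like as YAML, a quick sketch follows. Whether the cookbook serializes the hash exactly this way is an assumption, and 'node-01' is a hypothetical stand-in for node.name.

```ruby
require 'yaml'

# Same shape as the 'config' hash in the example above.
# 'node-01' is a hypothetical stand-in for node.name.
reporter_config = {
  'graphite' => [{
    'timeunit' => 'SECONDS',
    'hosts' => [{ 'host' => 'graphite.host.com', 'port' => 2003 }],
    'prefix' => 'servers.node-01.cassandra',
    'period' => 60,
    'predicate' => {
      'color' => 'white',
      'useQualifiedName' => true,
      'patterns' => ['^org.apache.cassandra.metrics.Cache.+']
    }
  }]
}

# Serialize to YAML so the structure can be reviewed before a Chef run.
yaml_text = reporter_config.to_yaml
puts yaml_text
```

Using string keys throughout, as the example does, means the hash survives a YAML round trip unchanged.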
dse.rb
- node["cassandra"]["dse"]["delegated_snitch"] (default: org.apache.cassandra.locator.SimpleSnitch): the snitch to use for dse
- node["cassandra"]["dse"]["snitch"] (default: com.datastax.bdp.snitch.DseDelegateSnitch): the snitch to use in dse.yaml
- node["cassandra"]["dse"]["service_name"] (default: dse): the name of the service
- node["cassandra"]["dse"]["conf_dir"] (default: /etc/dse): the directory of dse config files
- node["cassandra"]["dse"]["repo_user"] (default: empty): the datastax username for the repo
- node["cassandra"]["dse"]["repo_pass"] (default: empty): the datastax password for the repo
- node["cassandra"]["dse"]["rhel_repo_url"] (default: http://#{node['cassandra']['dse']['repo_user']}:#{node['cassandra']['dse']['repo_pass']}@rpm.datastax.com/enterprise): the rhel repo
- node["cassandra"]["dse"]["debian_repo_url"] (default: http://#{node['cassandra']['dse']['repo_user']}:#{node['cassandra']['dse']['repo_pass']}@debian.datastax.com/enterprise): the debian repo
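The default repository URLs embed the username and password directly. If a credential contains characters such as @, : or /, it may need percent-encoding before it is placed in a URL. A minimal sketch, assuming you build the URL yourself in a wrapper; the `repo_url` helper and the credentials are hypothetical.

```ruby
require 'erb'

# Build a repository URL with credentials embedded in the userinfo part.
# Percent-encoding keeps characters such as '@', ':' or '/' in a
# credential from being read as URL delimiters.
def repo_url(user, pass, host_path)
  "http://#{ERB::Util.url_encode(user)}:#{ERB::Util.url_encode(pass)}@#{host_path}"
end

# Hypothetical credentials, for illustration only.
puts repo_url('user@example.com', 'p@ss/w:rd', 'rpm.datastax.com/enterprise')
```

A wrapper could assign the result to node["cassandra"]["dse"]["rhel_repo_url"] or node["cassandra"]["dse"]["debian_repo_url"] instead of relying on the interpolated defaults.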
hadoop.rb
- node["hadoop"]["max_heap_size"] (default: 10G): the heap size for hadoop
- node["hadoop"]["heap_newsize"] (default: 800M): the heap newgen size for hadoop
- node["hadoop"]["map_child_java_opts"] (default: 4G): the size of the map child java heap
- node["hadoop"]["reduce_child_java_opts"] (default: 4G): the size of the reduce child java heap
- node["hadoop"]["map_red_localdir"] (default: /data/mapredlocal): the directory to use for map/reduce
- node["hive"]["scratch_dir"] (default: /data/hive): the directory to use for hive
- node["hadoop"]["map_reduce_parallel_copies"] (default: 20): the number of map reduce copies
- node["hadoop"]["mapred_tasktracker_map_tasks_max"] (default: 23): the max number of map tasks
- node["hadoop"]["mapred_tasktracker_reduce_tasks_max"] (default: 12): the max number of reduce tasks
- node["hadoop"]["io_sort_mb"] (default: 512M): the size of iosort
- node["hadoop"]["io_sort_factor"] (default: 64): the iosort factor
solr.rb
- node["solr"]["max_heap_size"] (default: 14G): the heap size for solr
- node["solr"]["heap_newsize"] (default: 2400M): the newgen heap size
java.rb
These are generic Java settings. Datastax recommends Oracle Java, so these attributes override the OpenJDK default and download the JDK from a specific location.
* node["dse"]["manage_java"] (default: true): whether or not to use the java recipe to manage the java install
* node["java"]["install_flavor"] (default: oracle): the flavor of java to install
* node["java"]["jdk_version"] (default: 7): the version of java to use
* node['java']['jdk']['7']['x86_64']['url'] (default: empty): the url to get the java 7 file from
ssl.rb
This portion is under construction. SSL support is not yet fully working.
* node["cassandra"]["dse"]["cassandra_ssl_dir"] (default: /etc/cassandra): the directory to use for pem files
* node["cassandra"]["dse"]["password_file"] (default: cassandra_pass.txt): the file to store the keystore pass in
* node["cassandra"]["dse"]["internode_encyption"] (default: none): the encryption to use (all, dc, rack)
* node["cassandra"]["dse"]["keystore"] (default: #{node["cassandra"]["dse"]["cassandra_ssl_dir"]}/#{node["hostname"]}.keystore): keystore name
* node["cassandra"]["dse"]["truststore"] (default: #{node["cassandra"]["dse"]["cassandra_ssl_dir"]}/#{node["hostname"]}.truststore): truststore name
datastax-agent.rb
These attributes are used to configure the datastax-agent, which is used with Datastax Opscenter.
- node["datastax-agent"]["enabled"] (default: false): whether to install and configure the datastax agent
- node["datastax-agent"]["version"] (default: 4.1.1-1): the version of the datastax agent to install
- node["datastax-agent"]["conf_dir"] (default: /var/lib/datastax-agent/conf): where the datastax-agent conf file is
- node["datastax-agent"]["opscenter_ip"] (default: 192.168.32.3): the Opscenter IP to connect to
Dependencies
- java
- yum
- apt
Datastax recommends using the Oracle JDK. You can select it by setting an attribute in your environment, role, or wrapper cookbook.
Kitchen Testing
The integration test environment consists of:
- Chef-DK 0.4.0
- VirtualBox 4.3.24
- Vagrant 1.7.2
- vagrant-omnibus
- vagrant-berkshelf
- vagrant-share
- vagrant-login
Edit the .kitchen.yml file in the root of the cookbook and set your Datastax repository username and password in order to run the tests. Run 'rake' in the root of the cookbook to run the full automated test suite.
Copyright & License
- Author: Daniel Parker (daniel.c.parker@target.com)
- Reviewer: Eric Helgeson (erichelgeson@gmail.com)
Released under the Apache 2.0 License.
Dependent cookbooks
- java ~> 1.14
- yum ~> 3.5
- yum-epel ~> 0.6
- apt ~> 2.0
Contingent cookbooks
There are no cookbooks that are contingent upon this one.
CHANGELOG for dse cookbook
This file is used to list changes made in each version of the dse cookbook.
3.0.19
- added support for overriding additional attributes in log4j-server.properties
3.0.16
- adding metrics library
3.0.15
- update to latest yum version
3.0.14
- restart the datastax-agent on new version
3.0.13
- added support to set MaxTenuringThreshold in cassandra-env
3.0.12
- refactored some version checks
- added role based seed assignment
3.0.11
- add templates and upgrade to 4.5.2
3.0.10
- add templates and upgrade to 4.0.4
3.0.9
- remove the ssd tuning from this recipe
3.0.8
- minor updates
3.0.7
- adding ability to tune memtable thresholds
3.0.6
- removed specific tuning to move it to os-tuning cookbook
- adding ability to set concurrent compactors
3.0.5
- added thrift frame size settings for hadoop requests
3.0.4
- adding recommended datastax tuning settings
3.0.3
- adding cassandra specific gc settings
3.0.2
- more hadoop tuning
3.0.1
- adding hadoop tuning settings
3.0.0
- First pass of node-to-node ssl
- Adding another hadoop attribute
- fixing hadoop map reduce dir, as hadoop didn't create it
- adding changes to ssl
- adding a version-specific dse script
- adding support to stop dse before an upgrade
- hive scratch directory support
2.3.5
- subscribed the dse service to java, so it will restart if java version changes
- added more chefspec
- moved the start of the dse service until after all the templates are set up
2.3.4
- allow support for gossipingPropertyFileSnitch
2.3.3
- Tell the OS that SSDs are present
2.3.2
- Allow Solr and Hadoop Heap to be set dynamically
2.3.1
- Allows this recipe to install the datastax-agent
2.3.0
- Allow DSE 4.0 to be installed
2.2.0
- Rename the cookbook to dse
2.1.3
- Added kitchen tests
- Added multiple data directory support
0.1.0
- Initial release of cassandra