hadoop (55) Versions 2.4.1

Installs/Configures Hadoop (HDFS/YARN/MRv2), HBase, Hive, Flume, Oozie, Pig, Spark, Storm, Tez, and ZooKeeper

Berkshelf
cookbook 'hadoop', '~> 2.4.1'

Librarian
cookbook 'hadoop', '~> 2.4.1'

Knife
knife cookbook site install hadoop
knife cookbook site download hadoop

hadoop cookbook

Requirements

This cookbook may work on earlier versions, but these are the minimum tested versions.

  • Chef 11.4.0+
  • CentOS 6.4+
  • Debian 6.0+
  • Ubuntu 12.04+

This cookbook assumes that you have a working Java installation. It has been tested using version 1.21.2 of the java cookbook, using Oracle Java 6. If you plan on using Hive with a database other than the embedded Derby, you will need to provide that database and set it up before starting the Hive Metastore service.

Usage

This cookbook is designed to be used with a wrapper cookbook or a role that supplies the settings for configuring Hadoop. The services should work out of the box on a single host, but the cookbook does little to validate that you have made a working Hadoop configuration. The cookbook is attribute-driven and is suitable for use via either chef-client or chef-solo, since it does not use any server-based functionality. The cookbook defines each Hadoop service, but it does not enable or start any of them by default.

For more information, read the Wrapping this cookbook wiki entry.
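As a rough illustration of the wrapper approach, the sketch below shows the kind of attribute overrides a wrapper cookbook's attributes/default.rb might contain. The hostnames are hypothetical placeholders. The first assignment is only a stand-in for Chef's own `default` object so that the snippet also runs under plain Ruby; it is not part of this cookbook.

```ruby
# Stand-in for Chef's `default` attribute object (an assumption made so this
# sketch runs under plain Ruby). In a real attributes/default.rb, omit this
# line: Chef supplies `default` itself.
default = Hash.new { |hash, key| hash[key] = Hash.new(&hash.default_proc) }

# Choose the distribution (documented values: cdh, hdp, bigtop).
default['hadoop']['distribution'] = 'hdp'

# Point clients at the NameNode and ResourceManager.
# Both hostnames are hypothetical placeholders.
default['hadoop']['core_site']['fs.defaultFS'] = 'hdfs://namenode.example.com'
default['hadoop']['yarn_site']['yarn.resourcemanager.hostname'] = 'resourcemanager.example.com'
```

The wrapper's recipe would then include the recipes it needs from this cookbook and decide for itself which services to enable and start.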

Attributes

Attributes for this cookbook define the configuration files for Hadoop and its various services. Hadoop configuration files are XML files containing name/value property pairs. The attribute name determines both the file in which the property is placed and the property name; the attribute value is the property value. For example, the attribute hadoop['core_site']['fs.defaultFS'] configures a property named fs.defaultFS in core-site.xml in hadoop['conf_dir']. All attribute values are taken as-is, and only minimal configuration checking is done on values. It is up to you to provide a valid configuration for your cluster.
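To make the attribute-to-property mapping concrete, here is a minimal plain-Ruby sketch that renders a hash of properties in the Hadoop XML layout. It is an illustration only, not the cookbook's actual template code; the names render_hadoop_xml and xml_escape are hypothetical.

```ruby
# Characters that must be escaped inside XML text.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Escape a value for use as XML text (hypothetical helper).
def xml_escape(value)
  value.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Render a flat Hash of property names and values as a Hadoop-style
# <configuration> document. Keys are sorted so the output is stable.
def render_hadoop_xml(properties)
  lines = ['<?xml version="1.0"?>', '<configuration>']
  properties.sort.each do |name, value|
    lines << '  <property>'
    lines << "    <name>#{xml_escape(name)}</name>"
    lines << "    <value>#{xml_escape(value)}</value>"
    lines << '  </property>'
  end
  lines << '</configuration>'
  lines.join("\n") + "\n"
end

# The Hash plays the role of hadoop['core_site'].
core_site = { 'fs.defaultFS' => 'hdfs://localhost' }
puts render_hadoop_xml(core_site)
```

Each key of the hash becomes a name element and each value becomes the matching value element, which is the relationship the attribute tree expresses.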

Attribute Tree                File                       Location
flume['flume_conf']           flume.conf                 flume['conf_dir']
hadoop['capacity_scheduler']  capacity-scheduler.xml     hadoop['conf_dir']
hadoop['container_executor']  container-executor.cfg     hadoop['conf_dir']
hadoop['core_site']           core-site.xml              hadoop['conf_dir']
hadoop['fair_scheduler']      fair-scheduler.xml         hadoop['conf_dir']
hadoop['hadoop_env']          hadoop-env.sh              hadoop['conf_dir']
hadoop['hadoop_metrics']      hadoop-metrics.properties  hadoop['conf_dir']
hadoop['hadoop_policy']       hadoop-policy.xml          hadoop['conf_dir']
hadoop['hdfs_site']           hdfs-site.xml              hadoop['conf_dir']
hadoop['log4j']               log4j.properties           hadoop['conf_dir']
hadoop['mapred_env']          mapred-env.sh              hadoop['conf_dir']
hadoop['mapred_site']         mapred-site.xml            hadoop['conf_dir']
hadoop['yarn_env']            yarn-env.sh                hadoop['conf_dir']
hadoop['yarn_site']           yarn-site.xml              hadoop['conf_dir']
hbase['hadoop_metrics']       hadoop-metrics.properties  hbase['conf_dir']
hbase['hbase_env']            hbase-env.sh               hbase['conf_dir']
hbase['hbase_policy']         hbase-policy.xml           hbase['conf_dir']
hbase['hbase_site']           hbase-site.xml             hbase['conf_dir']
hbase['jaas']                 jaas.conf                  hbase['conf_dir']
hbase['log4j']                log4j.properties           hbase['conf_dir']
hive['hive_env']              hive-env.sh                hive['conf_dir']
hive['hive_site']             hive-site.xml              hive['conf_dir']
hive['jaas']                  jaas.conf                  hive['conf_dir']
oozie['oozie_env']            oozie-env.sh               oozie['conf_dir']
oozie['oozie_site']           oozie-site.xml             oozie['conf_dir']
spark['log4j']                log4j.properties           spark['conf_dir']
spark['metrics']              metrics.properties         spark['conf_dir']
spark['spark_env']            spark-env.sh               spark['conf_dir']
storm['storm_env']            storm-env.sh               storm['conf_dir']
storm['storm_env']            storm_env.ini              storm['conf_dir']
storm['storm_conf']           storm.yaml                 storm['conf_dir']
tez['tez_env']                tez-env.sh                 tez['conf_dir']
tez['tez_site']               tez-site.xml               tez['conf_dir']
zookeeper['jaas']             jaas.conf                  zookeeper['conf_dir']
zookeeper['log4j']            log4j.properties           zookeeper['conf_dir']
zookeeper['zoocfg']           zoo.cfg                    zookeeper['conf_dir']

Distribution Attributes

  • hadoop['distribution'] - Specifies which Hadoop distribution to use, currently supported: cdh, hdp, bigtop. Default hdp
  • hadoop['distribution_version'] - Specifies which version of hadoop['distribution'] to use. Default 2.0 if hadoop['distribution'] is hdp, 5 if hadoop['distribution'] is cdh, and 0.8.0 if hadoop['distribution'] is bigtop. It can also be set to develop when hadoop['distribution'] is bigtop to allow installing from development repos without gpg validation.
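The relationship between the two attributes can be sketched as a simple lookup. The version strings below are copied from the descriptions above; the cookbook's own attribute files remain authoritative. The helper name default_distribution_version is hypothetical, and raising on an unknown distribution is an assumption about how one might fail fast, not a copy of the cookbook's code.

```ruby
# Documented default distribution_version for each supported distribution.
DEFAULT_DISTRIBUTION_VERSIONS = {
  'hdp'    => '2.0',
  'cdh'    => '5',
  'bigtop' => '0.8.0'
}.freeze

# Return the documented default version, or raise for an unsupported
# distribution (hypothetical helper).
def default_distribution_version(distribution)
  DEFAULT_DISTRIBUTION_VERSIONS.fetch(distribution) do
    raise ArgumentError, "Unsupported distribution: #{distribution.inspect}"
  end
end

puts default_distribution_version('cdh')
```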

APT-specific settings

  • hadoop['apt_repo_url'] - Provide an alternate apt installation source location. If you change this attribute, you are expected to provide a path to a working repo for the hadoop['distribution'] used. Default nil
  • hadoop['apt_repo_key_url'] - Provide an alternative apt repository key source location. Default nil

RPM-specific settings

  • hadoop['yum_repo_url'] - Provide an alternate yum installation source location. If you change this attribute, you are expected to provide a path to a working repo for the hadoop['distribution'] used. Default nil
  • hadoop['yum_repo_key_url'] - Provide an alternative yum repository key source location. Default nil

Global Configuration Attributes

  • hadoop['conf_dir'] - The directory used inside /etc/hadoop and used via the alternatives system. Default conf.chef
  • hbase['conf_dir'] - The directory used inside /etc/hbase and used via the alternatives system. Default conf.chef
  • hive['conf_dir'] - The directory used inside /etc/hive and used via the alternatives system. Default conf.chef
  • oozie['conf_dir'] - The directory used inside /etc/oozie and used via the alternatives system. Default conf.chef
  • tez['conf_dir'] - The directory used inside /etc/tez and used via the alternatives system. Default conf.chef
  • spark['conf_dir'] - The directory used inside /etc/spark and used via the alternatives system. Default conf.chef
  • storm['conf_dir'] - The directory used inside /etc/storm and used via the alternatives system. Default conf.chef
  • zookeeper['conf_dir'] - The directory used inside /etc/zookeeper and used via the alternatives system. Default conf.chef
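As a small sketch of how these directory names combine with the file names in the attribute table, the hypothetical helper below builds the full path of a configuration file. It assumes the layout described above (a conf_dir inside /etc/<service>) and does not model the alternatives system.

```ruby
# Build the configuration directory for a service, e.g. /etc/hadoop/conf.chef.
# conf_path is a hypothetical helper, not part of the cookbook.
def conf_path(service, conf_dir = 'conf.chef')
  File.join('/etc', service, conf_dir)
end

# With the default conf_dir, core-site.xml would live here:
puts File.join(conf_path('hadoop'), 'core-site.xml')
# prints "/etc/hadoop/conf.chef/core-site.xml"
```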

Default Attributes

  • hadoop['core_site']['fs.defaultFS'] - Sets URI to HDFS NameNode. Default hdfs://localhost
  • hadoop['yarn_site']['yarn.resourcemanager.hostname'] - Sets hostname of YARN ResourceManager. Default localhost
  • hive['hive_site']['javax.jdo.option.ConnectionURL'] - Sets JDBC URL. Default jdbc:derby:;databaseName=/var/lib/hive/metastore/metastore_db;create=true
  • hive['hive_site']['javax.jdo.option.ConnectionDriverName'] - Sets JDBC Driver. Default org.apache.derby.jdbc.EmbeddedDriver

Recipes

  • default - Sets up configuration and hadoop-client packages.
  • hadoop_hdfs_checkconfig - Ensures the HDFS configuration meets required parameters.
  • hadoop_hdfs_datanode - Sets up an HDFS DataNode.
  • hadoop_hdfs_ha_checkconfig - Ensures the HDFS configuration meets requirements for High Availability.
  • hadoop_hdfs_journalnode - Sets up an HDFS JournalNode.
  • hadoop_hdfs_namenode - Sets up an HDFS NameNode.
  • hadoop_hdfs_secondarynamenode - Sets up an HDFS Secondary NameNode.
  • hadoop_hdfs_zkfc - Sets up HDFS Failover Controller, required for automated NameNode failover.
  • hadoop_yarn_nodemanager - Sets up a YARN NodeManager.
  • hadoop_yarn_proxyserver - Sets up a YARN Web Proxy.
  • hadoop_yarn_resourcemanager - Sets up a YARN ResourceManager.
  • hbase - Sets up configuration and hbase packages.
  • hbase_checkconfig - Ensures the HBase configuration meets required parameters.
  • hbase_master - Sets up an HBase Master.
  • hbase_regionserver - Sets up an HBase RegionServer.
  • hbase_rest - Sets up an HBase REST interface.
  • hbase_thrift - Sets up an HBase Thrift interface.
  • hive - Sets up configuration and hive packages.
  • hive_metastore - Sets up Hive Metastore metadata repository.
  • hive_server - Sets up a Hive Thrift service.
  • hive_server2 - Sets up a Hive Thrift service with Kerberos and multi-client concurrency support.
  • oozie - Sets up an Oozie server.
  • oozie_client - Sets up an Oozie client.
  • pig - Installs pig interpreter.
  • repo - Sets up package manager repositories for the specified hadoop['distribution'].
  • spark - Sets up configuration and spark-core packages.
  • spark_master - Sets up a Spark Master.
  • spark_worker - Sets up a Spark Worker.
  • storm - Sets up storm package.
  • storm_nimbus - Sets up a Storm Nimbus server.
  • storm_supervisor - Sets up a Storm Supervisor server.
  • storm_ui - Sets up a Storm UI server.
  • tez - Sets up configuration and tez packages.
  • zookeeper - Sets up zookeeper package.
  • zookeeper_server - Sets up a ZooKeeper server.
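For a role or node run list, the recipe names above are combined with the cookbook name using Chef's recipe[cookbook::recipe] form. The plain-Ruby sketch below builds such entries for a single-host HDFS and YARN setup; the helper name hadoop_run_list and the choice of recipes are illustrative assumptions.

```ruby
# Turn recipe names from this cookbook into Chef run-list entries.
# hadoop_run_list is a hypothetical helper used only for illustration.
def hadoop_run_list(*recipes)
  recipes.map { |recipe| "recipe[hadoop::#{recipe}]" }
end

# One possible single-host selection: HDFS and YARN daemons together.
single_host = hadoop_run_list(
  'hadoop_hdfs_namenode',
  'hadoop_hdfs_datanode',
  'hadoop_yarn_resourcemanager',
  'hadoop_yarn_nodemanager'
)

puts single_host
```

Because the recipes do not enable or start services by default, a wrapper cookbook or role still has to handle that step.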

Author

Author:: Cask Data, Inc. (ops@cask.co)

Testing

This cookbook can be tested in several ways. It includes code tests, which are done using foodcritic, rubocop, and chefspec. It also includes functional testing, provided by Test Kitchen.

rake chefspec     # Run RSpec code examples
rake foodcritic   # Foodcritic linter
rake integration  # Run Test Kitchen integration tests
rake metadata     # Create metadata.json from metadata.rb
rake rubocop      # Ruby style guide linter
rake share        # Share cookbook to community site

This cookbook requires the vagrant-omnibus and vagrant-berkshelf Vagrant plugins to be installed.

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

hadoop CHANGELOG

v2.4.1 (Aug 1, 2016)

  • Support for HDP 2.4.2.0 ( Issue: #270 )

v2.4.0 (Jul 27, 2016)

  • Update ext.js download URL to Cloudera, per @jeremyhahn ( Issue: #265 )
  • Restrict Gem versions on older Ruby ( Issue: #266 )
  • Set ZOOPIDFILE ( Issues: #267 COOK-105 )
  • Split client and service JAAS files ( Issues: #268 COOK-106 )

v2.3.3 (Jun 16, 2016)

  • Setting YARN_LOG_DIR is a redundant test, use text-based ( Issue: #261 )
  • ZooKeeper filesystem objects should use zookeeper group ( Issues: #262 COOK-100 )
  • Use upstream ulimit cookbook for testing ( Issue: #263 )

v2.3.2 (May 6, 2016)

  • Improve test coverage ( Issue: #258 )
  • HDP 2.2+ log directories are not modified on Ubuntu ( Issues: #259 COOK-96 )

v2.3.1 (Apr 19, 2016)

  • Allow overriding distribution_version at default ( Issues: #256 COOK-93 )
  • Set SPARK_DIST_CLASSPATH and redirect STDERR to logs ( Issue: #257 )

v2.3.0 (Apr 12, 2016)

  • Change spark-defaults from XML to .conf property file ( Issue: #241 )
  • Update default versions: HDP 2.3.4.7, CDH 5.6.0 ( Issue: #242 )
  • Support HDP 2.3.4.7 and 2.4.0.0 ( Issue: #250 )
  • Create hadoop_package helper for ODP-based distributions ( Issue: #251 )
  • Revert change to zookeeper_server recipe from #251 ( Issue: #252 )
  • Use oozie for service name, not pkg variable from #251 ( Issue: #253 )
  • Remove unnecessary inclusion of yum::default ( Issue: #254 )

v2.2.1 (Feb 24, 2016)

  • Support for HDP 2.3.4.0 per @kriszentner ( Issue: #243 )
  • Style updates ( Issues: #244 #247 )
  • Support for Bigtop 1.0.0 relocated repositories ( Issue: #245 )
  • Fix to SPARK_HOME in the init scripts ( Issue: #248 )

v2.2.0 (Dec 16, 2015)

  • Add Code Climate badge ( Issue: #232 )
  • Syntax fix in metadata.rb per @mrjefftang ( Issue: #234 )
  • Fix up ImmutableMash/Array for older Chef per @mrjefftang ( Issue: #235 )
  • Support Ubuntu 14 and Debian 7 for HDP 2.3.2.0+ per @kriszentner ( Issue: #236 )
  • Support HDP 2.2.9.0 ( Issue: #237 )
  • Revert #230 - Init scripts should use ampersand ( Issue: #238 )
  • Fix Hive init scripts ( Issue: #239 )

v2.1.0 (Dec 7, 2015)

  • Add support for Apache Storm ( Issue: #223 )
  • Support Bigtop 1.0.0 ( Issue: #224 )
  • Update minimum apt cookbook dependency ( Issue: #227 )
  • Support HDP 2.3.2.0 ( Issues: #228 COOK-76 )
  • Update Gemfile dependencies ( Issue: #229 )
  • Init scripts should use ampersand ( Issue: #230 )
  • Update foodcritic constraint ( Issue: #231 )
  • Reserve Hadoop ports from being used as local ports ( Issues: #233 COOK-79 )

v2.0.9 (Sep 16, 2015)

  • Support later HDP 2.1 and HDP 2.2 updates on Ubuntu ( Issue: #225 )

v2.0.8 (Sep 15, 2015)

  • Fix Hive init scripts, per @QuentinFra ( Issue: #220 )
  • Correct JSVC_HOME for HDP 2.0 ( Issues: #221 COOK-70 )
  • Support HDP 2.2.8.0 ( Issue: #222 )

v2.0.7 (Aug 21, 2015)

  • Fix Hive sql_connector jar on Ubuntu ( Issues: #216 COOK-65 )
  • Style updates ( Issue: #217 )
  • Set Yarn increment-allocation appropriately for Fair Scheduler ( Issues: #218 COOK-67 )

v2.0.6 (Jul 30, 2015)

  • Fix Spark CONF_DIR ( Issue: #215 )

v2.0.5 (Jul 30, 2015)

  • Support HDP 2.2.6.3 ( Issue: #212 )
  • Keep HADOOP_CLASSPATH before Tez's CLASSPATH ( Issue: #213 )
  • Support HDP 2.3.0.0 ( Issue: #214 )

v2.0.4 (Jul 23, 2015)

  • Fix ChefSpec ( Issue: #207 )
  • Support HDP 2.1.15.0, 2.2.4.4, and 2.2.6.0 ( Issue: #208 )
  • HiveServer2 process fix per @jsh2134 ( Issue: #210 )
  • Fix HDP 2.2 yarn.application.classpath ( Issue: #211 )

v2.0.3 (Jun 25, 2015)

  • Config files should be root owned ( Issue: #204 )
  • Fix disable THP Compaction ( Issues: #205 COOK-57 )
  • Fix init for EXE_ARGS ending in ampersand ( Issues: #206 COOK-59 )

v2.0.2 (Jun 12, 2015)

  • Don't make /etc/default files executable ( Issue: #201 )
  • Remove Vagrantfile ( Issue: #202 )
  • Fix Ubuntu init ( Issue: #203 )

v2.0.1 (Jun 9, 2015)

  • Supply /etc/default/hbase for hbase binary ( Issue: #200 )

v2.0.0 (Jun 8, 2015)

  • Transparent Hugepages are not universally available, per @jdecello and @taverentech ( Issue: #156 )
  • Support HDP 2.2.4.2 repo ( Issues: #160 #186 )
  • Fix YARN/Hive/Oozie PATHs for HDP 2.2 ( Issue: #161 )
  • Official CDH5 repo for Trusty ( Issue: #162 )
  • Set user limits by attribute ( Issues: #163 #165 COOK-35 )
  • Fix extjs link ( Issues: #164 COOK-36 )
  • Use HDP mysql-connector-java ( Issues: #166 COOK-34 )
  • Deprecate short versions ( Issue: #167 )
  • Correct status for #156 ( Issue: #168 )
  • Move SQL connectors to their own recipe ( Issue: #169 )
  • Testing updates ( Issues: #170 #171 )
  • Use Chef only_if guards over Ruby conditionals ( Issues: #172 #175 #176 #181 )
  • Disable SELinux ( Issue: #173 )
  • Install libhdfs ( Issue: #177 )
  • Support HDP 2.1.10.0 and 2.2.1.0 ( Issue: #178 )
  • Move compression libs to helper recipe ( Issues: #179 #187 COOK-44 )
  • Ensure zookeeper user has shell access ( Issue: #180 )
  • Use variables directly over local variable ( Issue: #181 )
  • HDP 2.2 MR DistributedCache ( Issue: #182 COOK-40 )
  • HDP 2.2 Tez DistributedCache ( Issue: #183 COOK-49 )
  • Sort XML configuration keys, per @mbautin ( Issue: #184 )
  • HDP 2.2 hadooplzo support ( Issue: #185 )
  • Fix Java 7 type checking, per @TD-4242 ( Issue: #188 )
  • Template-based init scripts ( Issues: #190 #194 #195 #196 COOK-52 COOK-53 )
  • Set debian repository priority ( Issues: #191 #198 )
  • Fix HDFS HA checkconfig, per @TD-4242 ( Issue: #192 )
  • Initialize ZooKeeper version-2 directories ( Issue: #193 )
  • Support hadoop-metrics2.properties ( Issue: #197 )
  • Remove guard on execute block with action :nothing ( Issue: #199 )

v1.13.1 (Apr 15, 2015)

  • Fix YARN AM staging dir ( Issues: #157 COOK-30 )
  • Support HDP 2.0.13.0 and bump HDP-UTILS to 1.1.0.20 ( Issue: #158 )
  • Document issue tracker location ( Issues: #159 COOK-32 )

v1.13.0 (Mar 31, 2015)

  • Enable system tuning ( Issue: #148 )
  • Test against more Ruby versions ( Issue: #153 )
  • Fix guard on mapreduce.jobhistory.done-dir ( Issue: #154 )

v1.12.0 (Mar 20, 2015)

  • Support yarn.app.mapreduce.am.staging-dir ( Issue: #150 )
  • Support mapreduce.jobhistory.done-dir and mapreduce.jobhistory.intermediate-done-dir ( Issue: #151 )
  • Tests for #135 and #150 ( Issue: #152 )

v1.11.2 (Mar 9, 2015)

  • Prefix internal recipes with underscore ( Issue: #147 )
  • Fix Java 7 check ( Issues: #149 COOK-27 )

v1.11.1 (Feb 27, 2015)

  • Packaging fix

v1.11.0 (Feb 27, 2015)

  • Stop packages from auto-starting on install ( Issues: #145 COOK-26 )
  • Fail fast on invalid distribution ( Issues: #146 COOK-25 )

v1.10.1 (Feb 24, 2015)

  • HDP Repo fix ( Issues: #144 COOK-24 )

v1.10.0 (Feb 24, 2015)

  • Enforce Java 7 or higher on CDH 5.3 ( Issues: #140 COOK-18 )
  • Default hive.metastore.uris ( Issues: #141 COOK-19 )
  • HDP 2.2 support ( Issues: #142 COOK-16 )
  • Recursive deletes on log dirs ( Issue: #143 COOK-23 )

v1.9.2 (Jan 8, 2015)

  • Defaults for log4j ( Issue: #139 )

v1.9.1 (Dec 9, 2014)

  • Spark tests for #129 ( Issue: #133 )
  • Improve *_LOG_DIR symlink handling ( Issue: #134 )
  • Fix PATH to jsvc in /etc/default/hadoop ( Issue: #135 )

v1.9.0 (Dec 8, 2014)

  • Tez support from @mandrews ( Issues: #127 #132 )

v1.8.1 (Dec 8, 2014)

  • Ubuntu Trusty support for CDH5 ( Issue: #128 )
  • Spark MLlib requires libgfortran.so.3 ( Issue: #129 )
  • Simplify container-executor.cfg ( Issue: #130 )
  • Minor spark fixes from @pauloricardomg ( Issue: #131 )

v1.8.0 (Nov 24, 2014)

  • Opportunistic creation of hive.exec.local.scratchdir ( Issue: #117 )
  • Only use hadoop::repo for Hive ( Issue: #120 )
  • More Oozie tests ( Issue: #121 )
  • Only test hadoop::default in Vagrant ( Issue: #122 )
  • Avro libraries/tools support ( Issue: #123 COOK-6 )
  • Parquet support ( Issue: #124 COOK-7 )
  • Improve version matching for HDP 2.1 ( Issue: #125 )
  • Initial Spark support ( Issue: #126 )

v1.7.1 (Nov 5, 2014)

  • Initial Oozie tests ( Issue: #118 )
  • Hotfix symlink log dirs ( Issue: #119 )

v1.7.0 (Nov 5, 2014)

  • Use Java 7 by default ( Issue: #108 COOK-5 )
  • Use HDP 2.1 by default ( Issue: #109 )
  • Update tests ( Issues: #110 #111 #114 #115 #116 )
  • Symlink default log dirs to new locations ( Issue: #113 )

v1.6.1 (Oct 16, 2014)

  • Update Bigtop to 0.8.0 release ( Issues: #106 #107 COOK-1 )

v1.6.0 (Oct 16, 2014)

  • Add Bigtop support ( Issue: #105 COOK-1 )

v1.5.0 (Sep 25, 2014)

This release adds Flume support to the cookbook.

  • Update test-kitchen to use more recipes ( Issue: #95 )
  • Test improvements ( Issues: #98 #100 #101 )
  • Flume support ( Issue: #99 )
  • Simplify RHEL handling ( Issue: #102 )

v1.4.1 (Sep 18, 2014)

  • Add zookeeper group after package installs ( Issue: #96 )
  • Code consistency updates ( Issue: #97 )

v1.4.0 (Sep 18, 2014)

  • Support Amazon Linux ( Issues: #84 #90 )
  • Remove addition of zookeeper user/group ( Issue: #87 )
  • Add support for HDP 2.1.5.0 ( Issue: #88 )
  • Update HDP-UTILS to 1.1.0.19 ( Issue: #89 )
  • Use next to break loops ( Issue: #91 )
  • Hive HDFS directories use hive group ( Issue: #92 )
  • Add Hive spec tests ( Issue: #93 )
  • Update java cookbook dependency ( Issue: #94 )

There is no CHANGELOG for versions of this cookbook prior to 1.4.0 release.

Foodcritic Metric

FC053: Metadata uses the unimplemented "recommends" keyword: /tmp/0852c4d69e38219f8c5953ec/hadoop/metadata.rb:16
FC064: Ensure issues_url is set in metadata: /tmp/0852c4d69e38219f8c5953ec/hadoop/metadata.rb:1
FC065: Ensure source_url is set in metadata: /tmp/0852c4d69e38219f8c5953ec/hadoop/metadata.rb:1

Collaborators Metric

2.4.1 passed the Collaborator Metric with 5 collaborators.