
hadoop_cluster_rpm (1 version: 0.9.0)

Unmerged RPM-distro additions for hadoop_cluster

Berkshelf/Librarian: cookbook 'hadoop_cluster_rpm', '~> 0.9.0'
Policyfile: cookbook 'hadoop_cluster_rpm', '~> 0.9.0', :supermarket
Knife: knife cookbook site install hadoop_cluster_rpm
Knife: knife cookbook site download hadoop_cluster_rpm
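
The '~> 0.9.0' constraint in the Berkshelf and Policyfile lines is the pessimistic version operator: it accepts any 0.9.x release at or above 0.9.0 but rejects 0.10.0 and later. The sketch below checks this with RubyGems' Gem::Requirement, which is used here only as a stand-in on the assumption that the operator means the same thing in the cookbook dependency solvers; the helper name allowed? is made up for the example.

```ruby
require 'rubygems'

# '~> 0.9.0' is shorthand for '>= 0.9.0' combined with '< 0.10.0'.
CONSTRAINT = Gem::Requirement.new('~> 0.9.0')

# Hypothetical helper: true when the version string satisfies the constraint.
def allowed?(version)
  CONSTRAINT.satisfied_by?(Gem::Version.new(version))
end

%w[0.8.9 0.9.0 0.9.7 0.10.0 1.0.0].each do |v|
  puts format('%-7s %s', v, allowed?(v) ? 'allowed' : 'rejected')
end
```

Since 0.9.0 is the only published version of this cookbook, the constraint currently resolves to that single release.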
Quality 33%

Hadoop Cookbook

This cookbook is a work in progress. It's essentially the second version I've put together, after learning a bit about the bad assumptions I was making about our clusters and how they had been set up. I still have some things on my radar, like monitoring and quotas on the datanodes.

The changes I've made here are exclusively for RPM-based systems. I plan to offer them to the maintainer of the HadoopCluster cookbook so the two can be rolled together, since that cookbook currently supports only Debian and Ubuntu. I haven't worked from that version because it differs significantly from the environment I've been developing this in.

The templates are not exhaustive config files, but I have included links to the Hadoop documentation for all options. Functionally, the main components are here and should hopefully require only minor changes to get running in any given environment.

I've tried to keep the recipes clean enough that multiple clusters can be run from this one cookbook by setting the attributes each new cluster needs. I do have a to-do item to clean that part up, either by using environments in a smarter way or by putting things in a data bag.

I have my own list of open issues on GitHub for this project. Feel free to comment on them, add new ones, or close them.

Some additional points:

  • this makes a mess of the existing Debian / Ubuntu stuff in the current community cookbook. The HadoopCluster cookbook is similar to this one and supports only Debian and Ubuntu.

  • includes support for RHEL and CentOS. Any other RPM-based platform could be added; I just don't have the version numbers that would work with the current Hadoop releases.

  • sets up Hadoop based on the ops suggestions in "Hadoop: The Definitive Guide" by Tom White (ISBN: 978-1-449-38973-4) and on what we're doing at Admeld

  • includes several specific recipes for explicit resource management

Default recipe

  • requires the java and yum cookbooks

  • sets up the dependencies for the Cloudera RPM repository: the .repo file and the RPM keys

  • installs the base hadoop package, assuming hadoop-0.20
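
As a rough illustration of the three steps above, a default recipe along these lines might look like the sketch below. It is not the cookbook's actual recipe. It assumes a Chef version that provides the built-in yum_repository resource, and the repository id and both URLs are placeholders rather than the real Cloudera locations.

```ruby
# recipes/default.rb (illustrative sketch only)

# The README lists java and yum as required cookbooks.
include_recipe 'java'
include_recipe 'yum'

# Register the Cloudera repository: this writes the .repo file and
# points yum at the signing key. The id and URLs are placeholders.
yum_repository 'cloudera-cdh' do
  description 'Cloudera Distribution for Hadoop'
  baseurl 'https://repo.example.com/cloudera/cdh/'
  gpgkey 'https://repo.example.com/cloudera/RPM-GPG-KEY-cloudera'
  gpgcheck true
  action :create
end

# Base Hadoop package; the README assumes the hadoop-0.20 series.
package 'hadoop-0.20'
```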

Apache_hadoop recipe

  • pulls the tar files from Apache's repo rather than using prebuilt RPMs. Uses /usr/lib/hadoop as the install location to match the Cloudera RPMs, but that could be changed.

  • This doesn't include start scripts or anything really fancy. I just added it as an alternative to the Cloudera packages. If you choose to use it, read through the recipe and change things like the mirror you're using and the file versions.

Namenode recipe

  • installs the namenode and secondarynamenode packages

  • renders a couple of templated config files, with the settings currently held in the attributes
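
Chef templates are ERB files filled in from node attributes, so the idea can be shown with plain Ruby. This is a minimal sketch, not the cookbook's template: the attribute names and values are invented for the example, and the two properties shown (fs.default.name and hadoop.tmp.dir) are only a guess at what a namenode template for Hadoop 0.20 might set. Inside a real Chef template the values would come from node attributes or template variables rather than a local hash.

```ruby
require 'erb'

# Hypothetical attributes standing in for the cookbook's node attributes.
ATTRIBUTES = {
  'namenode_host' => 'namenode.example.com',
  'namenode_port' => 8020,
  'tmp_dir' => '/var/lib/hadoop/tmp'
}.freeze

# A cut-down core-site.xml template; the real templates cover more options.
TEMPLATE = <<~XML
  <?xml version="1.0"?>
  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://<%= attrs['namenode_host'] %>:<%= attrs['namenode_port'] %></value>
    </property>
    <property>
      <name>hadoop.tmp.dir</name>
      <value><%= attrs['tmp_dir'] %></value>
    </property>
  </configuration>
XML

# Render the template with the given attribute hash exposed as `attrs`.
def render_core_site(attrs)
  ERB.new(TEMPLATE).result_with_hash(attrs: attrs)
end

puts render_core_site(ATTRIBUTES)
```

Running a second cluster from the same template then comes down to supplying a different attribute hash.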

Jobtracker recipe

  • installs the jobtracker package

Worker recipe

  • installs the tasktracker and datanode packages, since Hadoop datanodes should always be task nodes as well

Hadoop_user recipe

  • creates a hadoop user to own the files, hold the SSH keys for communicating in the cluster, and run the Java processes

  • Cloudera's packages also use a mapred user and an hdfs user. They are installed with the RPMs, but their responsibilities are set in /etc/default/hadoop-0.20. For this version, I've replaced them with the single hadoop user to streamline the permissions on all of the directories.

Work to do

  • potentially set up the services so that Chef runs can trigger them when config files change. I'm not sure I would use it that way for the namenode and secondarynamenode, but the tasktracker and datanode processes should be OK to handle that way.

  • potentially add a data bag to allow locking down the specific Hadoop version, or otherwise rework how the attributes are set up. jtimberman recommends looking at the aws cookbook, specifically the ebs_volume stuff.

  • deal with the SSH keys for the hadoop user. There is some skeleton code for this in the user recipe now.

  • sort out Debian / Ubuntu support. See the note above.
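
The first item in this list, having services react when config files change, is usually done in Chef with a notification from the template to the service. The sketch below shows one way that could look for the datanode. The service name, config path, and template name are assumptions and should be checked against what the installed packages actually provide.

```ruby
# Illustrative sketch only: names and paths are assumptions.

# Declare the datanode service so other resources can notify it.
service 'hadoop-0.20-datanode' do
  supports restart: true, status: true
  action [:enable, :start]
end

# Re-render the config and restart the datanode only when the file changes.
template '/etc/hadoop-0.20/conf/hdfs-site.xml' do
  source 'hdfs-site.xml.erb'
  owner 'hadoop'
  group 'hadoop'
  mode '0644'
  notifies :restart, 'service[hadoop-0.20-datanode]', :delayed
end
```

Leaving the notifies line off the namenode and secondarynamenode templates would keep those daemons from being restarted automatically, which matches the caution expressed in that item.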

Dependent cookbooks

yum >= 0.0.0
java >= 0.0.0

Contingent cookbooks

There are no cookbooks that are contingent upon this one.

Collaborator Number Metric

0.9.0 failed this metric

Failure: Cookbook has 0 collaborators. A cookbook must have at least 2 collaborators to pass this metric.

Contributing File Metric

0.9.0 failed this metric

Failure: To pass this metric, your cookbook metadata must include a source url, the source url must be in the form of https://github.com/user/repo, and your repo must contain a CONTRIBUTING.md file

Foodcritic Metric

0.9.0 failed this metric

FC064: Ensure issues_url is set in metadata: hadoop_cluster_rpm/metadata.rb:1
FC065: Ensure source_url is set in metadata: hadoop_cluster_rpm/metadata.rb:1
FC066: Ensure chef_version is set in metadata: hadoop_cluster_rpm/metadata.rb:1
FC069: Ensure standardized license defined in metadata: hadoop_cluster_rpm/metadata.rb:1
Run with Foodcritic Version 12.2.1 with tags metadata,correctness ~FC031 ~FC045 and failure tags any
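
All four warnings point at fields missing from metadata.rb. A metadata file that sets them might look like the fragment below. This is a sketch under assumptions: the license identifier, the chef_version constraint, and both URLs are placeholders (the URL form follows the https://github.com/user/repo pattern the other metrics ask for), and only the name, version, dependencies, and platforms come from this page.

```ruby
# metadata.rb (illustrative fragment; placeholder values are marked)
name 'hadoop_cluster_rpm'
version '0.9.0'

license 'Apache-2.0'                                # FC069: placeholder, use the cookbook's real license
source_url 'https://github.com/user/repo'           # FC065: placeholder URL
issues_url 'https://github.com/user/repo/issues'    # FC064: placeholder URL
chef_version '>= 12.1'                              # FC066: placeholder constraint

depends 'yum'
depends 'java'

supports 'redhat'
supports 'centos'
```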

License Metric

0.9.0 passed this metric

No Binaries Metric

0.9.0 failed this metric

Failure: Cookbook should not contain binaries. Found:
hadoop_cluster_rpm/hadoop_cluster_rpm.tar

Publish Metric

0.9.0 passed this metric

Supported Platforms Metric

0.9.0 passed this metric

Testing File Metric

0.9.0 failed this metric

Failure: To pass this metric, your cookbook metadata must include a source url, the source url must be in the form of https://github.com/user/repo, and your repo must contain a TESTING.md file

Version Tag Metric

0.9.0 failed this metric

Failure: To pass this metric, your cookbook metadata must include a source url, the source url must be in the form of https://github.com/user/repo, and your repo must include a tag that matches this cookbook version number