
Friday 1 July 2016

Deploying HDP on Google Cloud

Tools for creating Hadoop and Spark clusters on Google Compute Engine. See http://cloud.google.com/hadoop for more information.

Hortonworks Data Platform + Google Cloud Platform

Hortonworks Data Platform (HDP) on Google Cloud Platform

This extension to Google's bdutil provides support for deploying the Hortonworks Data Platform with a single command.
The extension utilizes Apache Ambari's Blueprint Recommendations to fully configure the cluster without the need for manual configuration.

Resources

Video Tutorial

Before you start

Create a Google Cloud Platform account

Create a Google Cloud Project

  • Open https://console.developers.google.com/
  • Open 'Create Project' and fill in the details.
    • As an example, this document uses 'hdp-00'
  • Within the project, open 'APIs & auth -> APIs'. Then enable the following (a command-line alternative is sketched after this list):
    • Google Compute Engine
    • Google Cloud Storage
    • Google Cloud Storage JSON API
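
The same APIs can also be enabled from a terminal; a minimal sketch, assuming a recent Cloud SDK where the 'gcloud services' command group is available:

    gcloud services enable compute.googleapis.com            ## Google Compute Engine
    gcloud services enable storage-component.googleapis.com  ## Google Cloud Storage
    gcloud services enable storage-api.googleapis.com        ## Google Cloud Storage JSON API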

Configure Google Cloud SDK & Google Cloud Storage

  • Install Google Cloud SDK locally
  • Configure the SDK:
    gcloud auth login                   ## authenticate to Google cloud
    gcloud config set project hdp-00    ## set the default project
    gsutil mb -p hdp-00 gs://hdp-00     ## create a cloud storage bucket
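  • Optionally, verify the configuration (a quick check, assuming the example project and bucket names above):
    gcloud config list                  ## confirm the active account and default project
    gsutil ls gs://hdp-00               ## confirm the bucket exists (empty output is normal for a new bucket)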
    

Download bdutil
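
One straightforward way to fetch bdutil, assuming git is installed locally (the repository is linked at the end of this page):

  git clone https://github.com/GoogleCloudPlatform/bdutil.git
  cd bdutil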

Quick start

  1. Set your project & bucket from above in bdutil_env.sh (or pass them on the command line; see the example after this list)
  2. Deploy or Delete the cluster: see './bdutil --help' for more details
  • Deploy: ./bdutil -e ambari deploy
  • Delete: ./bdutil -e ambari delete
    • when deleting, be sure to use the same switches/configuration as the deploy
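
For example, a deploy and a matching delete that pass the project and bucket explicitly (a sketch, assuming the -p project and -b bucket switches listed by './bdutil --help'):
  ./bdutil -e ambari -p hdp-00 -b hdp-00 deploy
  ./bdutil -e ambari -p hdp-00 -b hdp-00 delete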

Configuration

  • You can deploy without setting any configuration, but you should have a look at platforms/hdp/ambari.conf.
Here are some of the defaults to consider (an example set of overrides is sketched after this listing):
  GCE_ZONE='us-central1-a'           ## the zone/region to deploy in
  NUM_WORKERS=4                      ## the number of worker nodes. Total
                                     ##     is NUM_WORKERS + 1 master
  GCE_MACHINE_TYPE='n1-standard-4'   ## the machine type
  WORKER_ATTACHED_PDS_SIZE_GB=1500   ## 1500GB attached to each worker
  MASTER_ATTACHED_PD_SIZE_GB=1500    ## 1500GB attached to master

  ## The Hortonworks Data Platform services which will be installed.
  ##   This is nearly the entire stack
  AMBARI_SERVICES="ACCUMULO AMBARI_METRICS ATLAS FALCON FLUME GANGLIA HBASE HDFS
      HIVE KAFKA MAHOUT MAPREDUCE2 OOZIE PIG SLIDER SPARK SQOOP STORM TEZ YARN
      ZOOKEEPER"

  AMBARI_PUBLIC=false                ## Services listed on internal
                                     ##   hostname not public IP. Need
                                     ##   a socks proxy or tunnel to access
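
For instance, a smaller cluster in a different zone with a trimmed service list could be configured by editing platforms/hdp/ambari.conf along these lines (values are illustrative only):
  GCE_ZONE='europe-west1-b'
  NUM_WORKERS=2
  GCE_MACHINE_TYPE='n1-standard-4'
  WORKER_ATTACHED_PDS_SIZE_GB=500
  MASTER_ATTACHED_PD_SIZE_GB=500
  AMBARI_SERVICES="HDFS YARN MAPREDUCE2 ZOOKEEPER TEZ HIVE PIG SPARK AMBARI_METRICS"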

Use the cluster

SSH

  • You'll have immediate SSH access with: ./bdutil shell
  • Or update your SSH config with: gcloud compute config-ssh
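
After config-ssh runs, each node is addressable by a generated alias of the form <instance>.<zone>.<project>; for example (using the example project and the default zone from the configuration above):
  ./bdutil shell                        ## shell on the master node
  gcloud compute config-ssh             ## add SSH host aliases for every instance
  ssh hadoop-m.us-central1-a.hdp-00     ## direct SSH to the master via its alias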

Access Ambari & other services

a. With a local socks proxy:
  ./bdutil socksproxy             # opens a socks proxy to the cluster at localhost:1080

  # A browser proxy extension (e.g. 'Proxy SwitchySharp') can switch to the proxy automatically for cluster hostnames
  open http://hadoop-m:8080/      # internal hostnames such as hadoop-m resolve through the socks proxy
b. Or a local SSH tunnel
  gcloud compute config-ssh                  # updates our SSH config for direct SSH access to all nodes
  ssh -L 8080:127.0.0.1:8080 hadoop-m<TAB>   # quick tunnel to Apache Ambari (tab-complete the full hostname added by config-ssh)
  open http://localhost:8080/                # open Ambari in your browser
c. Or open a firewall rule from the Google Cloud Platform control panel
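
For option c, the rule can also be created from the command line. A sketch only (the rule name is arbitrary); restrict --source-ranges to your own address, since this exposes the Ambari port:
  gcloud compute firewall-rules create allow-ambari \
      --allow tcp:8080 \
      --source-ranges <YOUR_IP>/32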

Run jobs on the cluster

You now have a full HDP cluster. If you are new to Hadoop, check the tutorials at http://hortonworks.com/.
For command-line based jobs, 'bdutil' provides methods for passing commands through to the cluster: https://cloud.google.com/hadoop/running-a-mapreduce-job
For example: ./bdutil shell < ./extensions/google/gcs-validate-setup.sh
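
Another illustrative run, assuming the usual HDP layout under /usr/hdp/current on the master (the jar path may differ on your installation):
  echo 'hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi 10 1000' | ./bdutil shell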

Questions

Can I set/override Hadoop configurations during deployment?

To add or override Hadoop configurations, update configuration.json and then use the extension as documented. Contribute back if you think the defaults should be changed.
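
A purely hypothetical illustration of the kind of entry that might be added; copy the exact schema from the existing entries in configuration.json, which feed the generated Ambari Blueprint:
  {
    "mapred-site": {
      "mapreduce.map.memory.mb": "2048"
    }
  }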

Can I deploy HDP manually using Ambari and/or use my own Ambari Blueprints?

Yes. Set ambari_manual_env.sh as your environment (with the -e switch) instead of ambari_env.sh. That will configure Ambari across the cluster & handle all HDP prerequisites, but not trigger the Ambari Blueprints which install HDP.
After manually deploying your cluster, you can use ./bdutil <YOUR_FLAGS> -e platforms/hdp/ambari_manual_post_deploy_env.sh run_command_steps to configure HDFS directories and install the GCS connector. Note it uses run_command_steps instead of deploy.
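
A sketch of the manual flow described above (add the same extra switches to both commands that you would normally pass to deploy):
  ## bring up the machines with Ambari installed, but without triggering the HDP blueprint
  ./bdutil -e platforms/hdp/ambari_manual_env.sh deploy

  ## ... install HDP yourself through the Ambari UI or your own blueprint ...

  ## then configure HDFS directories and install the GCS connector
  ./bdutil -e platforms/hdp/ambari_manual_post_deploy_env.sh run_command_steps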

Can I re-use the attached persistent disk(s) across deployments?

bdutil supports keeping persistent disks (aka ATTACHED_PDS) online when deleting machines. It can then deploy a new cluster using the same disks without loss of data, assuming the number of workers is the same.
The basic commands are below. Find more detail in TEST.md.
## deploy the cluster & create disks
./bdutil -e ambari deploy

## delete the cluster but don't delete the disks
export DELETE_ATTACHED_PDS_ON_DELETE=false
./bdutil -e ambari delete

## create with existing disks
export CREATE_ATTACHED_PDS_ON_DEPLOY=false
./bdutil -e ambari deploy
Another option is to use gs:// (Google Cloud Storage) instead of hdfs:// in your Hadoop jobs, even setting it as the default, or to back up HDFS to Google Cloud Storage before cluster deletion.
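A hypothetical backup before deleting the cluster, using the example bucket from earlier (paths are illustrative):
  ## run on the cluster, e.g. via ./bdutil shell
  hadoop distcp hdfs:///user/myuser/data gs://hdp-00/backups/data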
Note: Hortonworks can't guarantee the safety of data throughout this process. You should always take care when manipulating disks and have backups where necessary.

What are the built-in storage options?

By default, HDFS is on attached disks ('pd-standard' or 'pd-ssd').
  • the size and type can be set in ambari.conf
The rest of the system resides on the local boot disk, unless configured otherwise.
Google Cloud Storage is also available with gs://. It can be used anywhere that hdfs:// is available, such as but not limited to mapreduce & hadoop fs operations.
  • Note: Adding an additional slash (gs:///) will allow you to use the default bucket (defined at cluster build) without needing to specify it.
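For example, both forms work with standard hadoop fs commands (the bucket name assumes the example project from earlier):
  hadoop fs -ls gs://hdp-00/     ## an explicit bucket
  hadoop fs -ls gs:///           ## the default bucket defined at cluster build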

Can I deploy in the Google Cloud Platform Free Trial ?

You may use bdutil with HDP by lowering the machine type & count below the recommended specifications. To use the default configuration, upgrade the account from a free trial.
  • In 'platforms/hdp/ambari.conf':
    • GCE_MACHINE_TYPE='n1-standard-2'
    • NUM_WORKERS=3 # or fewer
  • Or at the command-line provide these switches to the 'deploy' & 'delete':
    • Deploy cluster: -n 3 -m n1-standard-2
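
For example, pairing those switches with the ambari environment from the Quick start:
    ./bdutil -e ambari -n 3 -m n1-standard-2 deploy
    ./bdutil -e ambari -n 3 -m n1-standard-2 delete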

Known Issues

Feedback & Issues

from https://github.com/GoogleCloudPlatform/bdutil/blob/master/platforms/hdp/README.md