Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark.
Dr. Elephant is a performance monitoring and tuning tool for Hadoop and Spark. It automatically gathers all the metrics, runs analysis on them, and presents them in a simple way for easy consumption. Its goal is to improve developer productivity and increase cluster efficiency by making it easier to tune the jobs. It analyzes the Hadoop and Spark jobs using a set of pluggable, configurable, rule-based heuristics that provide insights on how a job performed, and then uses the results to make suggestions about how to tune the job to make it perform more efficiently.
For more information on Dr. Elephant, check the wiki pages here.
For quick setup instructions: Click here
Developer guide: Click here
Administrator guide: Click here
User guide: Click here
Engineering Blog: Click here
from https://github.com/linkedin/dr-elephant
----------
Step 1: Create an account on GitHub and fork the Dr. Elephant project.
Step 2: Check out the code.
$> git clone https://github.com/<username>/dr-elephant
$> cd dr-elephant*
Step 3: Prerequisites:
- You must have the play or activator command installed. Download the activator zip from https://downloads.typesafe.com/typesafe-activator/1.3.12/typesafe-activator-1.3.12.zip, unzip it, and add the activator command to your $PATH. For older versions of Play, add the play command instead of activator.
export ACTIVATOR_HOME=/path/to/unzipped/activator
export PATH=$ACTIVATOR_HOME/bin:$PATH
- Dr. Elephant stores the analyzed results in a MySQL database. Please install and set up MySQL if you do not have it yet (version 5.5+ recommended).
- (Optional, but recommended) In order to use the new Dr. Elephant UI, you need to install npm and dependencies
sudo yum install npm
sudo npm install -g bower
cd web; bower install; cd ..
- Lastly, you should already have Hadoop and/or Spark set up.
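A quick way to confirm the Step 3 prerequisites is to check that each tool resolves on the PATH. The sketch below is a hypothetical helper, not part of Dr. Elephant: the function name missing_commands is an assumption, and the tool list in the example comes from the prerequisites above (npm and bower only matter if you want the new UI).

```shell
# Report which of the given commands are not on PATH.
# Prints one missing command name per line; returns 0 only if none are missing.
missing_commands() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            rc=1
        fi
    done
    return "$rc"
}

# Example (uncomment to run) with the tools named in Step 3:
# missing_commands activator mysql npm bower hadoop
```

If the function prints nothing, every listed command was found. Use play in place of activator on older versions of Play.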
Step 4: (Optional, Beta Phase) Please follow the below steps if you wish to try out the auto-tuning feature. (More details: https://github.com/linkedin/dr-elephant/wiki/Auto-Tuning)
- Enable it by setting the value of the property autotuning.enabled to true in app-conf/AutoTuningConf.xml
- Install python with version 2.6+
- If you want to use a python installation other than the one set in environment:
  - Either set PYTHON_PATH to the path of the desired python executable:
    $> export PYTHON_PATH=/path/to/python/executable
  - Or, uncomment and set the value of the optional property python.path to the path of the desired python executable in app-conf/AutoTuningConf.xml
- Install the inspyred package by executing: sudo pip install inspyred
- If pip is missing, it can be installed from https://pip.pypa.io/en/stable/installing/
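The Python requirements for auto-tuning can be checked from the shell before enabling the feature. This is a minimal sketch under stated assumptions: the helper names python_cmd and check_autotuning_python are hypothetical, and the fallback to a plain python on the PATH when PYTHON_PATH is unset is an assumption about how "the one set in environment" is resolved. It does not read python.path from AutoTuningConf.xml.

```shell
# Pick the interpreter: PYTHON_PATH if set, otherwise whatever "python" resolves to.
# (Assumed fallback; Dr. Elephant's own lookup order is not shown in this guide.)
python_cmd() {
    echo "${PYTHON_PATH:-python}"
}

# Succeed only if the chosen interpreter is version 2.6 or newer
# and can import the inspyred package.
check_autotuning_python() {
    py=$(python_cmd)
    "$py" -c 'import sys; sys.exit(0 if sys.version_info >= (2, 6) else 1)' &&
    "$py" -c 'import inspyred'
}

# Example (uncomment to run):
# check_autotuning_python && echo "python is ready for auto-tuning"
```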
Step 5: Compile the Dr. Elephant code and generate the zip. The compile.sh script optionally takes a configuration file that specifies the versions of Hadoop and Spark to compile with. For instructions, see the Developer Guide.
$> ./compile.sh [./compile.conf]
After compiling, the distribution is created under the dist directory.
$> ls dist
dr-elephant*.zip
Step 6: Copy the distribution file to the machine where you want to deploy Dr. Elephant.
Step 7: On the machine where you want to deploy Dr. Elephant, make sure the below env variables are set.
$> export HADOOP_HOME=/path/to/hadoop/home
$> export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
$> export SPARK_HOME=/path/to/spark/home
$> export SPARK_CONF_DIR=/path/to/conf
Add hadoop to the system path because Dr. Elephant uses 'hadoop classpath' to load the right classes.
$> export PATH=$HADOOP_HOME/bin:$PATH
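Before starting Dr. Elephant it can help to confirm that the variables from Step 7 are actually set in the shell that will run it. The sketch below is a hypothetical helper (the name unset_vars is an assumption, not something shipped with Dr. Elephant) that prints every listed variable that is unset or empty.

```shell
# Print the name of every listed variable that is unset or empty.
# Returns 0 only if all of them are set to a non-empty value.
unset_vars() {
    rc=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
            rc=1
        fi
    done
    return "$rc"
}

# Example (uncomment to run):
# unset_vars HADOOP_HOME HADOOP_CONF_DIR SPARK_HOME SPARK_CONF_DIR
# hadoop classpath >/dev/null && echo "hadoop is on the PATH"
```

The last commented line relies on the same 'hadoop classpath' command that Dr. Elephant uses, so it fails early if hadoop is not on the PATH.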
Step 8: You also need a backend to save the data. Configure the mysql database in the elephant.conf file.
# Database configuration
db_url=localhost
db_name=drelephant
db_user=root
db_password=""
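To double-check what is configured before starting, the database settings can be read back from the file. This is a rough sketch that assumes elephant.conf uses plain key=value lines as shown above; conf_get is a hypothetical helper name, and it does no escaping of special characters in the key.

```shell
# Print the value of KEY from a key=value file, stripping surrounding double quotes.
# Commented lines (starting with '#') never match; if KEY appears more than once,
# the last occurrence wins.
conf_get() {
    key=$1
    file=$2
    sed -n "s/^${key}=//p" "$file" | tail -n 1 | sed -e 's/^"//' -e 's/"$//'
}

# Example (uncomment to run):
# conf_get db_url  app-conf/elephant.conf
# conf_get db_name app-conf/elephant.conf
```

With the sample configuration above, db_url and db_name should correspond to the host and database in the JDBC URL that appears in dr.log after startup.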
Step 9: (Optional) If you want to run Dr. Elephant with SSL enabled, add these settings to elephant.conf
# SSL related configuration
# https_port can be any port you choose
https_port=8090
https_keystore_location="/path/to/keystore"
# https_keystore_type is the type of the keystore, for instance JKS
https_keystore_type=TYPE_OF_KEYSTORE
https_keystore_password="password_for_keystore"
Step 10: If your cluster is kerberised, then update the keytab user and the keytab file location in the elephant.conf file.
Step 11: If you are running Dr. Elephant for the first time, you need to enable evolutions. To do so, append -Devolutionplugin=enabled and -DapplyEvolutions.default=true to jvm_props in the elephant.conf file (uncommenting jvm_props if necessary). This will automatically create the MySQL tables for you. Remember to disable evolutions the next time you restart Dr. Elephant.
$> vim ./app-conf/elephant.conf
jvm_props=" -Devolutionplugin=enabled -DapplyEvolutions.default=true"
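Disabling evolutions after the first run can be done by hand in the same file. As one possible sketch, the hypothetical helper below (strip_evolution_flags is not part of Dr. Elephant) removes the two flags from the jvm_props line and leaves any other JVM options in place. It assumes that a jvm_props value without these flags is acceptable to the start script; commenting the line out again is an alternative.

```shell
# Write a copy of the given conf file to stdout with the two evolution flags
# removed from the jvm_props line. All other lines are passed through unchanged.
strip_evolution_flags() {
    sed -e '/^jvm_props=/s/ *-Devolutionplugin=enabled//' \
        -e '/^jvm_props=/s/ *-DapplyEvolutions\.default=true//' "$1"
}

# Example (uncomment to run): rewrite the file via a temporary copy.
# strip_evolution_flags app-conf/elephant.conf > elephant.conf.new &&
#     mv elephant.conf.new app-conf/elephant.conf
```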
Step 12: To start dr-elephant, run the start script specifying a path to the application's configuration files.
$> bin/start.sh /path/to/app-conf/directory
To verify if Dr. Elephant started correctly, check the dr.log file.
$> less $DR_RELEASE/dr.log
...
play - database [default] connected at jdbc:mysql://localhost/drelephant?characterEncoding=UTF-8
application - Starting Application...
play - Application started (Prod)
play - Listening for HTTP on /0:0:0:0:0:0:0:0:8080
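The same check can be scripted instead of reading the log by eye. The helpers below are a hypothetical sketch (app_started and listen_address are assumed names); they only look for the two log lines shown in the sample output above, so they may need adjusting if your version logs different text.

```shell
# Succeed if the log file contains the Play "Application started" line.
app_started() {
    grep -q 'Application started' "$1"
}

# Print the address from the last "Listening for HTTP on" line, if any.
listen_address() {
    sed -n 's/.*Listening for HTTP on //p' "$1" | tail -n 1
}

# Example (uncomment to run):
# app_started "$DR_RELEASE/dr.log" && listen_address "$DR_RELEASE/dr.log"
```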
To verify that Dr. Elephant is analyzing jobs correctly, check the dr_elephant.log file.
$> less $DR_RELEASE/../logs/elephant/dr_elephant.log
Step 13: Once the application starts, you can open the UI at ip:port (localhost:8080)
Step 14: To stop dr-elephant run
$> bin/stop.sh
from https://github.com/linkedin/dr-elephant/wiki/Quick-Setup-Instructions-(Must-Read)