
Sunday, 16 June 2024

dr-elephant

Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark.

Dr. Elephant is a performance monitoring and tuning tool for Hadoop and Spark. It automatically gathers all the metrics, runs analysis on them, and presents them in a simple way for easy consumption. Its goal is to improve developer productivity and increase cluster efficiency by making it easier to tune the jobs. It analyzes the Hadoop and Spark jobs using a set of pluggable, configurable, rule-based heuristics that provide insights on how a job performed, and then uses the results to make suggestions about how to tune the job to make it perform more efficiently.

Documentation

For more information on Dr. Elephant, check the wiki pages here.

For quick setup instructions: Click here

Developer guide: Click here

Administrator guide: Click here

User guide: Click here

Engineering Blog: Click here

from https://github.com/linkedin/dr-elephant

----------

Quick Setup Instructions (Must Read)

Step 1: Create an account on GitHub and fork the Dr. Elephant project.

Step 2: Check out the code.

$> git clone https://github.com/<username>/dr-elephant
$> cd dr-elephant*

Step 3: Prerequisites:

  1. You must have the play or activator command installed. Download the activator zip from https://downloads.typesafe.com/typesafe-activator/1.3.12/typesafe-activator-1.3.12.zip, unzip it, and add the activator command to your $PATH. For older versions of Play, add the play command instead of activator.
export ACTIVATOR_HOME=/path/to/unzipped/activator
export PATH=$ACTIVATOR_HOME/bin:$PATH
  2. Dr. Elephant stores the analyzed results in a MySQL database. Install and set up MySQL if you do not have it yet (version 5.5+ recommended).
  3. (Optional, but recommended) To use the new Dr. Elephant UI, install npm and its dependencies:
sudo yum install npm
sudo npm install -g bower
cd web; bower install; cd ..
  4. Lastly, you should have Hadoop and/or Spark already set up.
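Before moving on, it can help to confirm that the prerequisite commands are actually on $PATH. The following is a minimal sketch, not part of Dr. Elephant; missing_commands is a hypothetical helper name, and the list of commands to check is an assumption based on the prerequisites above.

```shell
# Print every command from the argument list that is not found on $PATH.
# Returns 0 when all commands are present, 1 otherwise.
missing_commands() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, missing_commands activator mysql npm bower hadoop lists whichever of those tools still need to be installed (use play in place of activator on older Play versions).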

Step 4: (Optional, Beta Phase) Please follow the below steps if you wish to try out the auto-tuning feature. (More details: https://github.com/linkedin/dr-elephant/wiki/Auto-Tuning)

  • Enable it by setting the value of property autotuning.enabled to true in app-conf/AutoTuningConf.xml
  • Install python with version 2.6+
  • If you want to use a Python installation other than the default one in your environment:
    • Either set PYTHON_PATH to the path of the desired Python executable: $> export PYTHON_PATH=/path/to/python/executable
    • Or uncomment the optional property python.path in app-conf/AutoTuningConf.xml and set it to the path of the desired Python executable
  • Install inspyred package by executing: sudo pip install inspyred
  • If pip is missing, it can be installed from https://pip.pypa.io/en/stable/installing/
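The Python side of the auto-tuning setup can be checked before starting Dr. Elephant. This is a minimal sketch under the assumption that, when PYTHON_PATH is unset, the interpreter used is whatever python resolves to on $PATH; resolve_python and has_inspyred are hypothetical helper names, not part of Dr. Elephant.

```shell
# Print the interpreter to use: PYTHON_PATH if it is set and non-empty,
# otherwise plain "python" (assumed to be looked up on $PATH).
resolve_python() {
    echo "${PYTHON_PATH:-python}"
}

# Succeed only if the chosen interpreter can import the inspyred package.
has_inspyred() {
    "$(resolve_python)" -c 'import inspyred' >/dev/null 2>&1
}
```

For example, has_inspyred || echo "inspyred is not importable with $(resolve_python)" reports a missing package before auto-tuning is enabled.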

Step 5: Compile the Dr. Elephant code and generate the zip. The compile.sh script optionally takes a configuration file that specifies the versions of Hadoop and Spark to compile against. For instructions, check the Developer Guide.

$> ./compile.sh [./compile.conf]

After compiling, the distribution is created under the dist directory.

$> ls dist
dr-elephant*.zip

Step 6: Copy the distribution file to the machine where you want to deploy Dr. Elephant.

Step 7: On the machine where you want to deploy Dr. Elephant, make sure the following environment variables are set.

$> export HADOOP_HOME=/path/to/hadoop/home
$> export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
$> export SPARK_HOME=/path/to/spark/home
$> export SPARK_CONF_DIR=/path/to/conf

Add hadoop to the system path because Dr. Elephant uses 'hadoop classpath' to load the right classes.

$> export PATH=$HADOOP_HOME/bin:$PATH
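A quick way to catch a missing variable before starting Dr. Elephant is sketched below. require_env is a hypothetical helper name; it only reports variables that are unset or empty and does not check that the paths exist.

```shell
# Print the name of every listed variable that is unset or empty.
# Returns 0 when all of them are set, 1 otherwise.
require_env() {
    rc=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not set: $name"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, require_env HADOOP_HOME HADOOP_CONF_DIR SPARK_HOME SPARK_CONF_DIR && hadoop classpath >/dev/null succeeds only when all four variables are set and the hadoop command mentioned above can be run.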

Step 8: You also need a backend to save the data. Configure the mysql database in the elephant.conf file.

# Database configuration
db_url=localhost
db_name=drelephant
db_user=root
db_password=""
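The configuration above names a database but does not show how it is created. The following is a sketch only, assuming the database does not exist yet and that the configured user is allowed to create it; create_db_sql is a hypothetical helper, not part of Dr. Elephant.

```shell
# Print the SQL statement that creates the named database if it is absent.
# Names containing anything other than letters, digits, or underscores are rejected.
create_db_sql() {
    case "$1" in
        ''|*[!A-Za-z0-9_]*)
            echo "invalid database name: $1" >&2
            return 1
            ;;
    esac
    printf 'CREATE DATABASE IF NOT EXISTS %s;\n' "$1"
}
```

With the sample settings above, the statement could be piped to the standard client as create_db_sql drelephant | mysql -h localhost -u root (add -p if the user has a password).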

Step 9: (Optional) If you want Dr. Elephant to run with SSL enabled, add these settings to elephant.conf.

# SSL related configuration
# https_port can be any port you choose
https_port=8090
https_keystore_location="/path/to/keystore"
# https_keystore_type is the type of the keystore, for instance JKS
https_keystore_type=TYPE_OF_KEYSTORE
https_keystore_password="password_for_keystore"

Step 10: If your cluster is kerberised, then update the keytab user and the keytab file location in the elephant.conf file.

Step 11: If you are running Dr. Elephant for the first time, you need to enable evolutions. To do so, append -Devolutionplugin=enabled and -DapplyEvolutions.default=true to jvm_props in the elephant.conf file (uncomment jvm_props if it is commented out). This automatically creates the MySQL tables for you. Remember to disable evolutions the next time you restart Dr. Elephant.

$> vim ./app-conf/elephant.conf
jvm_props=" -Devolutionplugin=enabled -DapplyEvolutions.default=true"
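Because evolutions should be switched off again after the first run, it can be handy to check the current state of the configuration file. This is a minimal sketch; evolutions_enabled is a hypothetical helper that only looks for an uncommented jvm_props line containing the applyEvolutions flag.

```shell
# Succeed if the given elephant.conf has an active (uncommented) jvm_props
# line that turns on -DapplyEvolutions.default=true.
evolutions_enabled() {
    grep -Eq '^[[:space:]]*jvm_props=.*-DapplyEvolutions\.default=true' "$1"
}
```

For example, evolutions_enabled ./app-conf/elephant.conf && echo "evolutions are still enabled" serves as a reminder before a restart.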

Step 12: To start dr-elephant, run the start script specifying a path to the application's configuration files.

$> bin/start.sh /path/to/app-conf/directory

To verify if Dr. Elephant started correctly, check the dr.log file.

$> less $DR_RELEASE/dr.log
...
play - database [default] connected at jdbc:mysql://localhost/drelephant?characterEncoding=UTF-8
application - Starting Application...
play - Application started (Prod)
play - Listening for HTTP on /0:0:0:0:0:0:0:0:8080
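The same check can be scripted instead of reading the log by eye. The sketch below assumes the log lines look like the sample above; started_ok and http_port are hypothetical helper names, not part of Dr. Elephant.

```shell
# Succeed if the log contains the "Application started" line.
started_ok() {
    grep -q 'Application started' "$1"
}

# Print the port from the "Listening for HTTP on ..." line, if present.
http_port() {
    sed -n 's/.*Listening for HTTP on .*:\([0-9][0-9]*\)$/\1/p' "$1"
}
```

For example, started_ok "$DR_RELEASE/dr.log" && http_port "$DR_RELEASE/dr.log" prints the port the UI is listening on once startup has completed.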

To verify that Dr. Elephant is analyzing jobs correctly, check the dr_elephant.log file.

$> less $DR_RELEASE/../logs/elephant/dr_elephant.log

Step 13: Once the application starts, you can open the UI at ip:port (for example, localhost:8080).

Step 14: To stop dr-elephant, run:

$> bin/stop.sh

from https://github.com/linkedin/dr-elephant/wiki/Quick-Setup-Instructions-(Must-Read)

 

 
