Tuesday 13 September 2016

Deploying Distributed PHP Programs with Apache ZooKeeper

Original article: Distributed application in PHP with Apache Zookeeper
URL: http://systemsarchitect.net/distributed-application-in-php-with-apache-zookeeper/
This article is so good that I could not resist translating it. I hope it is useful to everyone.
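Apache ZooKeeper is the coolest technology I have come across recently. I discovered it while researching Solr Cloud features, and Solr's distributed computing impressed me deeply. You only have to start a new instance and it automatically finds its place in the Solr Cloud. It assigns itself to a shard and determines whether it is a leader or a replica. Shortly afterwards you can query it through any of your servers, and the service keeps working even if some servers go down. Very dynamic, clever, and cool.
Running multiple applications as one logical program is nothing new. In fact, I wrote similar software several years ago. This kind of architecture is confusing and laborious to work with, which is why Apache ZooKeeper provides a set of tools for managing such software.
Why Zoo? “Because coordinating distributed systems is a zoo.”
In this article I will explain how to install Apache ZooKeeper and integrate it with PHP. We will use the service to coordinate independent PHP scripts and let them agree on which one becomes the leader (hence the name leader election). When the leader exits (or crashes), the workers can detect it and elect a new leader.
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used by distributed applications. Every time they are implemented, a great deal of work goes into fixing bugs and race conditions. Because these services are hard to write, they are usually neglected, which makes the application hard to manage when things change. Even when done properly, different implementations of these services make deployed applications hard to manage.
ZooKeeper is a Java application, but it can also be used from C. There is a PHP extension, created by Andrei Zmievski in 2009 and maintained by him since. You can download it from PECL or get PHP-ZooKeeper directly from GitHub.
To use the extension you first need to install ZooKeeper. It can be downloaded from the official site.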
$ tar zxfv zookeeper-3.4.5.tar.gz
$ cd zookeeper-3.4.5/src/c
$ ./configure --prefix=/usr/
$ make
$ sudo make install
This installs ZooKeeper's library and header files. You are now ready to compile the PHP extension.

$ cd
$ git clone https://github.com/andreiz/php-zookeeper.git
$ cd php-zookeeper
$ phpize
$ ./configure
$ make
$ sudo make install
Add “zookeeper.so” to the PHP configuration.

$ vim /etc/php5/cli/conf.d/20-zookeeper.ini
I only edited the CLI configuration because I do not need to run this in a web server environment. Copy the line below into the ini file.

extension=zookeeper.so
Use the following command to check that the extension is working.

$ php -m | grep zookeeper
zookeeper
Now it is time to run ZooKeeper. The only thing still missing is configuration. Create a directory to hold all of the service's data.

$ mkdir /home/you-account/zoo
$ cd
$ cd zookeeper-3.4.5/
$ cp conf/zoo_sample.cfg conf/zoo.cfg
$ vim conf/zoo.cfg
Find the property named “dataDir” and point it to the “/home/you-account/zoo” directory.

$ bin/zkServer.sh start
$ bin/zkCli.sh -server 127.0.0.1:2181
[zk: 127.0.0.1:2181(CONNECTED) 14] create /test 1
Created /test
[zk: 127.0.0.1:2181(CONNECTED) 19] ls /
[test, zookeeper]
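At this point you have connected to ZooKeeper and created a znode named “/test” (we will use it shortly). ZooKeeper stores data in a tree structure. It is much like a file system, except that “folders” (translator's note: this means non-leaf nodes) also behave like files. A znode is an entity stored by ZooKeeper. The word “node” is easily confused with other things, so the term znode is used to avoid confusion.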

Leave the client connected because we will use it again shortly. Open a new window and create a zookeeperdemo1.php file.

<?php
class ZookeeperDemo extends Zookeeper {

  public function watcher( $i, $type, $key ) {
    echo "Insider Watcher\n";

    // Watcher gets consumed so we need to set a new one
    $this->get( '/test', array($this, 'watcher' ) );
  }

}

$zoo = new ZookeeperDemo('127.0.0.1:2181');
$zoo->get( '/test', array($zoo, 'watcher' ) );

while( true ) {
  echo '.';
  sleep(2);
}
Now run the script.

$ php zookeeperdemo1.php
It should print a dot every 2 seconds. Now switch to the ZooKeeper client and update the value of “/test”.

[zk: 127.0.0.1:2181(CONNECTED) 20] set /test foo
This immediately triggers the “Insider Watcher” message in the PHP script. How does that happen?

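ZooKeeper provides watchers that can be attached to znodes. When a watched znode changes, the service immediately notifies all interested clients. That is how the PHP script learns about the change. The second parameter of the Zookeeper::get method is a callback. The watcher is consumed when the event fires, so we have to set it again inside the callback.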

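Now you are ready to create a distributed application. The challenge is to let these independent programs decide which one coordinates their work (the leader) and which ones carry it out (the workers). This process is called leader election, and you can read about how to implement it in ZooKeeper Recipes and Solutions.
Put simply, each process (or server) keeps an eye on the process (or server) next to it. If a watched process exits or crashes, the process watching it checks whether it is now the oldest one, and if so it becomes the leader.
In a real application the leader would assign tasks to workers, monitor progress, and store results. I have skipped those parts here for simplicity.
ZooKeeper is a powerful piece of software with a clean and simple API. Because the documentation and examples are well done, anyone can write distributed software with little difficulty. Give it a try, it is fun.
Related links:
Yahoo research on ZooKeeper. A very good read with examples from real applications. If you read only one thing about ZooKeeper, this should be it.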

PHP ZooKeeper

PHP ZooKeeper API

PHP ZooKeeper example
---------------------

Apache ZooKeeper is the coolest technology I have come across recently. I found it while researching Solr Cloud features, and I was very impressed by Solr’s distributed computing. You literally just have to fire up a new instance and it will automatically find its place in “the cloud”. It will assign itself to a particular shard and decide whether to become a leader or a replica. Later you can query any of the available servers and it will find all the required data for you, even if that data is not on that server. If some of the servers fail, the service will continue to work. Very dynamic, very clever, very cool.

Running multiple applications as one logical program is nothing new. In fact, creating such software was one of my first jobs many years ago. This type of architecture is confusing and very tricky to work with. Apache ZooKeeper tries to provide a generic set of tools to manage such software.
Why Zoo? “Because Coordinating Distributed Systems is a Zoo”.
In this post I’m going to show how to install Apache ZooKeeper and integrate it with PHP. We will use the service to coordinate independent PHP scripts and let them agree on which one is going to be the leader (so-called leader election). When the leader exits (or crashes), the workers should detect it and elect a new one.
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which makes them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.
ZooKeeper is a Java application, but it also comes with C bindings. There is a PHP extension that Andrei Zmievski created in 2009 and has maintained since. You can download it from PECL or get it directly from GitHub: PHP-ZooKeeper.
To get started with the extension you need to install ZooKeeper. Download it from the official site.
$ tar zxfv zookeeper-3.4.5.tar.gz
$ cd zookeeper-3.4.5/src/c
$ ./configure --prefix=/usr/
$ make
$ sudo make install
That will install ZooKeeper’s library and headers. Now you are ready to compile the PHP extension.
$ cd
$ git clone https://github.com/andreiz/php-zookeeper.git
$ cd php-zookeeper
$ phpize
$ ./configure
$ make
$ sudo make install
Add “zookeeper.so” to the PHP configuration.
$ vim /etc/php5/cli/conf.d/20-zookeeper.ini
I edited only the CLI config because I won’t need the extension in a web server context. Paste the line below into the ini file.
extension=zookeeper.so
Make sure the extension is working. 
$ php -m | grep zookeeper
zookeeper
It’s a good time to run ZooKeeper. The only missing thing is configuration. Create a directory for the service where it can keep all its data. 
$ mkdir /home/you-account/zoo
$ cd
$ cd zookeeper-3.4.5/
$ cp conf/zoo_sample.cfg conf/zoo.cfg
$ vim conf/zoo.cfg
Find the attribute called “dataDir” and point it to your “/home/you-account/zoo” directory.
$ bin/zkServer.sh start
$ bin/zkCli.sh -server 127.0.0.1:2181
[zk: 127.0.0.1:2181(CONNECTED) 14] create /test 1
Created /test
[zk: 127.0.0.1:2181(CONNECTED) 19] ls /
[test, zookeeper]
That will connect you to the service and create a “/test” znode (we will use it in a second). ZooKeeper stores data in a tree structure. It’s very similar to a file system, with the difference that “directories” can simultaneously behave like files. Every entity stored by ZooKeeper is called a znode. “Node” is an ambiguous word in this context, so the system uses a different name to avoid confusion.
Leave the client connected because we will use it in a second. Open a new window and create a zookeeperdemo1.php file.
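<?php
class ZookeeperDemo extends Zookeeper {

  public function watcher( $i, $type, $key ) {
    echo "Insider Watcher\n";

    // Watcher gets consumed so we need to set a new one
    $this->get( '/test', array($this, 'watcher' ) );
  }

}

$zoo = new ZookeeperDemo('127.0.0.1:2181');
$zoo->get( '/test', array($zoo, 'watcher' ) );

while( true ) {
  echo '.';
  sleep(2);
}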
Now run the script.
$ php zookeeperdemo1.php
It should produce a dot every 2 seconds. Now switch to the ZooKeeper client and update the “/test” value.
[zk: 127.0.0.1:2181(CONNECTED) 20] set /test foo
That should immediately trigger the “Insider Watcher” message in the PHP script. How did that happen?
ZooKeeper provides watchers which can be attached to znodes. If a watched znode changes, the service will instantly inform all interested clients about it. This is how the PHP script knew about the change. The second parameter of the Zookeeper::get method is a callback. The watcher gets consumed when the event is triggered, so we need to set it again in the callback.
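The one-shot behaviour is easy to miss, so here is a minimal sketch of it that runs without a ZooKeeper server. FakeZnode is a hypothetical in-memory stand-in written for this illustration, not part of the php-zookeeper extension; it only imitates the rule that a watcher is consumed when it fires.

```php
<?php
// Hypothetical in-memory stand-in for one watched znode. This is NOT the
// php-zookeeper API; it only imitates the one-shot watch behaviour.
class FakeZnode {
    private $value;
    private $watchers = array();

    public function __construct($value) {
        $this->value = $value;
    }

    // Return the value and optionally register a one-shot watcher.
    public function get($watcher = null) {
        if ($watcher !== null) {
            $this->watchers[] = $watcher;
        }
        return $this->value;
    }

    // Change the value and fire each pending watcher exactly once.
    public function set($value) {
        $this->value = $value;
        $pending = $this->watchers;
        $this->watchers = array();   // watchers are consumed before they fire
        foreach ($pending as $watcher) {
            call_user_func($watcher, $this);
        }
    }
}

$node = new FakeZnode('1');

// A watcher that does not re-register sees only the first change.
$onceCount = 0;
$node->get(function ($n) use (&$onceCount) {
    $onceCount++;
});

// A watcher that sets itself again, as watcher() in zookeeperdemo1.php does.
$alwaysCount = 0;
$rearm = function ($n) use (&$alwaysCount, &$rearm) {
    $alwaysCount++;
    $n->get($rearm);
};
$node->get($rearm);

$node->set('foo');
$node->set('bar');

printf("without re-registering: %d, with re-registering: %d\n", $onceCount, $alwaysCount);
```

After two updates the first watcher has fired once and the second twice, which is why the demo script calls get() again inside its callback.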
Now you are ready to create a distributed application. The challenge is to let independent programs decide which one should coordinate them (the leader) and which should do the job (the workers). The process is called leader election, and you can read about its implementation in ZooKeeper Recipes and Solutions.
In a nutshell, each process watches the process next to it, that is, the one registered immediately before it. If a watched process exits or crashes, the watching process checks whether it is now the oldest process. If it is, it becomes the leader.
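The rule above can be sketched as a small pure-PHP function. electionRole() and the worker names are assumptions made for illustration; the function works on a plain array of names and does not talk to ZooKeeper.

```php
<?php
// Hypothetical helper: given all registered worker names and our own
// name, report whether we lead and, if not, which worker to watch.
function electionRole(array $workers, $self) {
    sort($workers, SORT_STRING);   // oldest (lowest sequence) first
    $index = array_search($self, $workers, true);
    if ($index === false) {
        throw new InvalidArgumentException("Worker $self is not registered");
    }
    if ($index === 0) {
        // Nobody is older than us, so we are the leader.
        return array('leader' => true, 'watch' => null);
    }
    // Otherwise watch the worker registered immediately before us.
    return array('leader' => false, 'watch' => $workers[$index - 1]);
}

$workers = array('w-0000000003', 'w-0000000001', 'w-0000000002');
$before = electionRole($workers, 'w-0000000002');

// The leader w-0000000001 crashes and its entry disappears.
$remaining = array('w-0000000003', 'w-0000000002');
$after = electionRole($remaining, 'w-0000000002');
$third = electionRole($remaining, 'w-0000000003');

var_export(array($before, $after, $third));
echo "\n";
```

Because each worker watches only its immediate predecessor, a crash wakes up one worker rather than all of them.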
In a real-life application the leader would allocate tasks to workers, monitor progress and store results. I will skip this part for the sake of simplicity.
Create a new PHP file and call it worker.php.
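<?php

class Worker extends Zookeeper {

  // Parent znode under which every worker registers ('/cluster' is an assumed path)
  const CONTAINER = '/cluster';

  protected $acl = array(
                    array(
                      'perms' => Zookeeper::PERM_ALL,
                      'scheme' => 'world',
                      'id' => 'anyone' ) );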

  private $isLeader = false;

  private $znode;

  public function __construct( $host = '', $watcher_cb = null, $recv_timeout = 10000 ) {
    parent::__construct( $host, $watcher_cb, $recv_timeout );
  }

  public function register() {
    if( ! $this->exists( self::CONTAINER ) ) {
      $this->create( self::CONTAINER, null, $this->acl );
    }

    $this->znode = $this->create( self::CONTAINER . '/w-', 
                                  null, 
                                  $this->acl, 
                                  Zookeeper::EPHEMERAL | Zookeeper::SEQUENCE );

    $this->znode = str_replace( self::CONTAINER .'/', '', $this->znode );

    printf( "I'm registred as: %s\n", $this->znode );

    $watching = $this->watchPrevious();

    if( $watching == $this->znode ) {
      printf( "Nobody here, I'm the leader\n" );
      $this->setLeader( true );
    }
    else {
      printf( "I'm watching %s\n", $watching );
    }
  }

  public function watchPrevious() {
    $workers = $this->getChildren( self::CONTAINER );
    sort( $workers );
    $size = sizeof( $workers );
    for( $i = 0 ; $i < $size ; $i++ ) {
      if( $this->znode == $workers[ $i ] ) {
        if( $i > 0 ) {
          $this->get( self::CONTAINER . '/' . $workers[ $i - 1 ], array( $this, 'watchNode' ) );
          return $workers[ $i - 1 ];
        }

        return $workers[ $i ];
      }
    }

    throw new Exception(  sprintf( "Something went very wrong! I can't find myself: %s/%s", 
                          self::CONTAINER, 
                          $this->znode ) );
  }

  public function watchNode( $i, $type, $name ) {
    $watching = $this->watchPrevious();
    if( $watching == $this->znode ) {
      printf( "I'm the new leader!\n" );
      $this->setLeader( true );
    }
    else {
      printf( "Now I'm watching %s\n", $watching );
    }
  }

  public function isLeader() {
    return $this->isLeader;
  }

  public function setLeader($flag) {
    $this->isLeader = $flag;
  }

  public function run() {
    $this->register();

    while( true ) {
      if( $this->isLeader() ) {
        $this->doLeaderJob();
      }
      else {
        $this->doWorkerJob();
      }

      sleep( 2 );
    }
  }

  public function doLeaderJob() {
    echo "Leading\n";
  }

  public function doWorkerJob() {
    echo "Working\n";
  }

}

$worker = new Worker( '127.0.0.1:2181' );
$worker->run();
Open at least 3 terminals and run the script in each of them.
# term1

$ php worker.php 
I'm registred as: w-0000000001
Nobody here, I'm the leader
Leading

# term2

$ php worker.php 
I'm registred as: w-0000000002
I'm watching w-0000000001
Working

# term3

$ php worker.php 
I'm registred as: w-0000000003
I'm watching w-0000000002
Working
Now simulate a crash of the leader. Exit the first script with Ctrl+C or any other method. Nothing will change for a few seconds, and the workers will happily continue working. Eventually ZooKeeper will detect the session timeout and a new leader will be elected.
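How long those few seconds last depends on the session timeout. As far as I understand the ZooKeeper defaults, the server clamps the timeout requested by the client (the $recv_timeout constructor argument, 10000 ms in worker.php) to between 2 and 20 times its tickTime, which is 2000 ms in zoo_sample.cfg. The sketch below shows only that arithmetic; negotiatedTimeout() is a hypothetical helper, and the bounds are an assumption that holds when minSessionTimeout and maxSessionTimeout are left unset.

```php
<?php
// Hypothetical helper: estimate the session timeout the server grants,
// assuming the default bounds of 2x and 20x tickTime.
function negotiatedTimeout($requestedMs, $tickTimeMs = 2000) {
    $min = 2 * $tickTimeMs;    // assumed default minSessionTimeout
    $max = 20 * $tickTimeMs;   // assumed default maxSessionTimeout
    return max($min, min($max, $requestedMs));
}

// Print the granted timeout for a too-small, an in-range and a too-large request.
foreach (array(1000, 10000, 60000) as $requested) {
    printf("requested %d ms, granted %d ms\n", $requested, negotiatedTimeout($requested));
}
```

With the values used in this post the request of 10000 ms falls inside the allowed range, so a crashed leader's znode should disappear roughly ten seconds after its last heartbeat.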
The script is easy to understand, but the ZooKeeper flags it uses deserve a comment.
$this->znode = $this->create( self::CONTAINER . '/w-', 
                              null, 
                              $this->acl, 
                              Zookeeper::EPHEMERAL | Zookeeper::SEQUENCE );
Every worker znode is created with the EPHEMERAL and SEQUENCE flags.
EPHEMERAL means that the znode will be removed when the client disconnects. This is how the PHP script knew about the timeout: once the leader's session expired, its znode was deleted and the watcher on it fired. SEQUENCE means that a sequence string is going to be appended to every znode name. We used these strings as unique identifiers for workers.
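To see why the sequence suffix also works as an ordering key, here is a small sketch that needs no server. It assumes the suffix is a zero-padded 10-digit counter, as the w-0000000001 names in the output above suggest; makeName() and sequenceOf() are hypothetical helpers, not extension functions.

```php
<?php
// Hypothetical helper: imitate a SEQUENCE name, assuming the suffix is
// a zero-padded 10-digit counter like the ones in the worker output.
function makeName($prefix, $counter) {
    return sprintf('%s%010d', $prefix, $counter);
}

// Hypothetical helper: read the counter back from the last 10 characters.
function sequenceOf($name) {
    return (int) substr($name, -10);
}

$padded = array(makeName('w-', 10), makeName('w-', 9), makeName('w-', 2));
sort($padded, SORT_STRING);    // string order matches creation order

$unpadded = array('w-10', 'w-9', 'w-2');
sort($unpadded, SORT_STRING);  // without padding, 'w-10' sorts before 'w-2'

echo implode(', ', $padded), "\n";
echo implode(', ', $unpadded), "\n";
echo sequenceOf($padded[0]), "\n";
```

The padding is what lets worker.php call sort() on the child names and treat the first entry as the oldest worker.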
Be aware that there are some problems on the PHP side. The extension is in beta, and if you do not follow certain patterns it’s quite easy to get segmentation faults. For example, I wasn’t able to pass an ordinary function as a callback; it has to be a method. The good news is that if something is working it should remain in that state. I hope more people from the PHP community will get excited about Apache ZooKeeper and the extension will receive more support.
ZooKeeper is great software with a clean and simple API. Thanks to quality documentation, examples and recipes, anybody can start writing distributed software. Give it a go, it’s fun!
Useful links:
– Yahoo research on ZooKeeper. Very good read with examples of real life applications. If you had to read only one thing about the service that would be it.
– PHP ZooKeeper
– PHP ZooKeeper API
– PHP ZooKeeper example
from https://systemsarchitect.net/2013/03/31/distributed-application-in-php-with-apache-zookeeper/