Sunday, 26 June 2016

A Go-based crawler program: go_spider

An awesome concurrent crawler (spider) framework written in Go. The crawler is flexible and modular: you can easily extend it into a customized crawler, or use only the default crawl components.
A crawler for vertical communities, implemented in Go.
Latest stable Release: Version 1.2 (Sep 23, 2014).
  • go_spider discussion group, QQ group number: 337344607

Features

  • Concurrent
  • Fit for vertical communities
  • Flexible, Modular
  • Native Go implementation
  • Can be easily extended into a customized crawler
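
To make the "concurrent" and "modular" points concrete, here is a minimal sketch of how such a crawler can be structured using only the Go standard library. It is not go_spider's actual API: the names Page, Downloader, Processor, mapDownloader, upperProcessor and Crawl are all hypothetical, and the downloader serves canned strings instead of making HTTP requests so that the example runs offline.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// Page is a hypothetical container for one fetched document.
type Page struct {
	URL  string
	Body string
}

// Downloader fetches a page. A real implementation would use net/http.
type Downloader interface {
	Download(url string) (Page, error)
}

// Processor extracts result items from a downloaded page.
type Processor interface {
	Process(p Page) []string
}

// mapDownloader serves canned bodies so the sketch needs no network.
type mapDownloader map[string]string

func (m mapDownloader) Download(url string) (Page, error) {
	body, ok := m[url]
	if !ok {
		return Page{}, fmt.Errorf("no such page: %s", url)
	}
	return Page{URL: url, Body: body}, nil
}

// upperProcessor emits one item per page: the URL and the upper-cased body.
type upperProcessor struct{}

func (upperProcessor) Process(p Page) []string {
	return []string{p.URL + " => " + strings.ToUpper(p.Body)}
}

// Crawl downloads and processes urls using a fixed pool of worker goroutines.
// Swapping the Downloader or Processor changes behavior without touching Crawl.
func Crawl(urls []string, d Downloader, p Processor, workers int) []string {
	if workers < 1 {
		workers = 1 // at least one worker, otherwise sending on jobs would block forever
	}
	jobs := make(chan string)
	var (
		mu      sync.Mutex
		results []string
		wg      sync.WaitGroup
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				page, err := d.Download(u)
				if err != nil {
					continue // skip pages that fail to download
				}
				items := p.Process(page)
				mu.Lock()
				results = append(results, items...)
				mu.Unlock()
			}
		}()
	}
	for _, u := range urls {
		jobs <- u
	}
	close(jobs)
	wg.Wait()
	sort.Strings(results) // sort so the output order is deterministic
	return results
}

func main() {
	d := mapDownloader{"a": "hello", "b": "world"}
	for _, item := range Crawl([]string{"a", "b", "missing"}, d, upperProcessor{}, 2) {
		fmt.Println(item)
	}
}
```

The downloader and processor are plain interfaces, so either can be replaced independently, which is the kind of modularity the feature list refers to. The worker pool is one common way to get concurrency in Go; the framework itself may schedule requests differently.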

Requirements

  • Go 1.2 or higher

Documentation

Installation

go get github.com/hu17889/go_spider
go get github.com/PuerkitoBio/goquery
go get github.com/bitly/go-simplejson
go get golang.org/x/net/html/charset
This project is based on simplejson and goquery.
If you are in China, you can download the packages from http://gopm.io/.

Use example

Here is an example that crawls GitHub content. You can run it to try out the crawl process.
  • go install github.com/hu17889/go_spider/example/github_repo_page_processor
  • ./bin/github_repo_page_processor
More examples are available in the repository's example directory.
from https://github.com/hu17889/go_spider