An awesome concurrent crawler (spider) framework written in Go. The crawler is flexible and modular: you can easily extend it into a customized crawler, or use only the default crawl components.
A crawler for vertical communities, implemented in Go.
Latest stable release: version 1.2 (Sep 23, 2014).
Features
- Concurrent
- Suited to vertical communities
- Flexible, Modular
- Native Go implementation
- Easily extended into a customized crawler
Requirements
- Go 1.2 or higher
Documentation
Installation
go get github.com/hu17889/go_spider
go get github.com/PuerkitoBio/goquery
go get github.com/bitly/go-simplejson
go get golang.org/x/net/html/charset
This project is built on simplejson and goquery.
If you are in China, you can download the packages from http://gopm.io/.
Usage example
Here is an example that crawls GitHub content. Run it to try out the crawl process.
go install github.com/hu17889/go_spider/example/github_repo_page_processor
./bin/github_repo_page_processor
More examples: see the examples in the repository.
from https://github.com/hu17889/go_spider