Monday, 13 June 2016

webmagic - a Java-based crawler framework

A scalable web crawler framework for Java.

Readme in Chinese
User Manual (Chinese)
WebMagic is a scalable crawler framework. It covers the whole crawler lifecycle: downloading, URL management, content extraction, and persistence. This simplifies the development of a specific crawler.
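To make the four lifecycle stages concrete, here is a minimal plain-Java sketch. It is not WebMagic's API: the class `CrawlLifecycleSketch`, the method `crawl`, and the in-memory `site` map that stands in for HTTP downloads are all hypothetical names chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java illustration of the four crawler stages; NOT WebMagic's API.
class CrawlLifecycleSketch {

    private static final Pattern LINK = Pattern.compile("href=\"([^\"]+)\"");
    private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");

    // Crawls the in-memory "site" starting at seed and returns url -> page title.
    static Map<String, String> crawl(Map<String, String> site, String seed) {
        Deque<String> queue = new ArrayDeque<>();           // URL management: pending URLs
        Set<String> seen = new HashSet<>();                 // URL management: de-duplication
        Map<String, String> store = new LinkedHashMap<>();  // stand-in for persistence

        queue.add(seed);
        seen.add(seed);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            String html = site.get(url);                    // "downloading": a map lookup, not HTTP
            if (html == null) {
                continue;                                   // unknown page: nothing to extract
            }
            Matcher title = TITLE.matcher(html);            // content extraction
            if (title.find()) {
                store.put(url, title.group(1));             // persistence
            }
            Matcher link = LINK.matcher(html);              // discover follow-up URLs
            while (link.find()) {
                String next = link.group(1);
                if (seen.add(next)) {                       // enqueue only URLs not seen before
                    queue.add(next);
                }
            }
        }
        return store;
    }

    public static void main(String[] args) {
        Map<String, String> site = new LinkedHashMap<>();
        site.put("/a", "<title>A</title><a href=\"/b\">b</a><a href=\"/a\">self</a>");
        site.put("/b", "<title>B</title><a href=\"/missing\">x</a>");
        System.out.println(crawl(site, "/a"));
    }
}
```

A framework such as WebMagic handles these stages for you, so a specific crawler mostly needs to supply the extraction logic.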


  • Simple core with high flexibility.
  • Simple API for HTML extraction.
  • Annotate POJOs to customize a crawler, with no configuration required.
  • Multi-threading and distributed crawling support.
  • Easy to integrate.


Add dependencies to your pom.xml:
WebMagic uses slf4j with the slf4j-log4j12 implementation. If you use a different slf4j implementation, exclude slf4j-log4j12.
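As a sketch of what the dependency declarations and the exclusion might look like in a pom.xml: the coordinates below (`us.codecraft`, `webmagic-core`, `webmagic-extension`) are the ones the project publishes under as far as I know, and the version number is an assumption, so check the project page for the current release before copying.

```xml
<!-- Version 0.5.3 is an assumption; use the current release. -->
<dependency>
    <groupId>us.codecraft</groupId>
    <artifactId>webmagic-core</artifactId>
    <version>0.5.3</version>
</dependency>
<dependency>
    <groupId>us.codecraft</groupId>
    <artifactId>webmagic-extension</artifactId>
    <version>0.5.3</version>
    <!-- Only needed if you bring your own slf4j implementation. -->
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```

The `<exclusions>` element is standard Maven syntax: it keeps the transitive slf4j-log4j12 binding off the classpath so it does not conflict with another binding.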