Readme in Chinese
User Manual (Chinese)
A scalable crawler framework. It covers the whole lifecycle of a crawler: downloading, URL management, content extraction, and persistence. It can simplify the development of a specific crawler.
Features:
- Simple core with high flexibility.
- Simple API for HTML extraction.
- Annotated POJOs to customize a crawler, with no configuration required.
- Multi-threaded and distributed crawling support.
- Easy to integrate.
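To make the four lifecycle stages named in the description concrete (downloading, URL management, content extraction, persistence), here is a minimal sketch that uses only the Java standard library. It is not WebMagic's API: the class `MiniCrawler`, the `crawl` method, and the in-memory "site" are hypothetical names chosen for illustration, and the downloader is a plain function so the example runs without network access.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, stdlib-only sketch of a crawl loop. Not WebMagic's API. */
final class MiniCrawler {
    // Deliberately naive patterns; a real crawler would use an HTML parser.
    private static final Pattern LINK = Pattern.compile("href=\"([^\"]+)\"");
    private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");

    /**
     * Crawls from startUrl. The downloader maps a URL to its HTML, or null
     * if the page is unavailable. Returns url -> title in visit order.
     */
    static Map<String, String> crawl(String startUrl, Function<String, String> downloader) {
        Queue<String> pending = new ArrayDeque<>();           // URL management: work queue
        Set<String> seen = new HashSet<>();                   // URL management: de-duplication
        Map<String, String> results = new LinkedHashMap<>();  // stand-in for persistence

        pending.add(startUrl);
        seen.add(startUrl);
        while (!pending.isEmpty()) {
            String url = pending.poll();
            String html = downloader.apply(url);              // downloading
            if (html == null) {
                continue;                                     // skip pages that failed to download
            }

            Matcher title = TITLE.matcher(html);              // content extraction
            results.put(url, title.find() ? title.group(1) : "");

            Matcher link = LINK.matcher(html);                // discover follow-up URLs
            while (link.find()) {
                String next = link.group(1);
                if (seen.add(next)) {                         // add() is false for known URLs
                    pending.add(next);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // In-memory pages stand in for a real HTTP downloader (assumption).
        Map<String, String> site = new HashMap<>();
        site.put("http://example.test/",
                "<title>Home</title><a href=\"http://example.test/about\">About</a>");
        site.put("http://example.test/about",
                "<title>About</title><a href=\"http://example.test/\">Home</a>");

        crawl("http://example.test/", site::get)
                .forEach((url, title) -> System.out.println(url + " -> " + title));
    }
}
```

In a framework such as this one, each stage would typically be a separate, replaceable component, which is why the sketch keeps the downloader as a parameter instead of hard-coding HTTP access.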
Install:
Add dependencies to your pom.xml:
<dependency>
<groupId>us.codecraft</groupId>
<artifactId>webmagic-core</artifactId>
<version>0.5.3</version>
</dependency>
<dependency>
<groupId>us.codecraft</groupId>
<artifactId>webmagic-extension</artifactId>
<version>0.5.3</version>
</dependency>
WebMagic uses SLF4J with the slf4j-log4j12 binding. If you use a different SLF4J implementation, exclude slf4j-log4j12 by placing the following inside the relevant <dependency> element:
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
</exclusions>
from https://github.com/code4craft/webmagic
https://github.com/webmagic-io/docs
https://github.com/webmagic-io/webmagic-io.github.io