A Powerful Spider (Web Crawler) System in Python.
- Write scripts in Python
- Powerful WebUI with script editor, task monitor, project manager and result viewer
- MySQL, MongoDB, Redis, SQLite, or PostgreSQL (via SQLAlchemy) as the database backend
- RabbitMQ, Beanstalk, Redis, or Kombu as the message queue
- Task priority, retries, periodic crawls, recrawl by age, and more
- Distributed architecture, crawling of JavaScript pages, Python 2 and 3 support, and more
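
To make the scheduling features in the list above more concrete (task priority, retries, recrawl by age), here is a minimal standard-library sketch. It is not pyspider's code or API: `ToyScheduler`, `flaky_fetch`, and the example.com URLs are hypothetical names invented for illustration, and timestamps are passed in by hand instead of read from a clock.

```python
import heapq
import itertools


class ToyScheduler:
    """Hypothetical in-memory scheduler; an illustration, not pyspider's implementation."""

    def __init__(self, max_retries=2):
        self.max_retries = max_retries
        self._heap = []                # entries: (-priority, seq, url, retries_used)
        self._seq = itertools.count()  # tie-breaker: FIFO order within one priority
        self._last_crawled = {}        # url -> time of the last successful fetch

    def add(self, url, priority=0, age=0, now=0.0):
        """Queue url unless it was fetched successfully less than `age` seconds ago."""
        last = self._last_crawled.get(url)
        if last is not None and now - last < age:
            return False
        heapq.heappush(self._heap, (-priority, next(self._seq), url, 0))
        return True

    def run(self, fetch, now=0.0):
        """Pop tasks highest priority first; requeue failures until retries run out."""
        results = {}
        while self._heap:
            neg_priority, _, url, retries_used = heapq.heappop(self._heap)
            try:
                results[url] = fetch(url)
                self._last_crawled[url] = now
            except Exception:
                if retries_used < self.max_retries:
                    entry = (neg_priority, next(self._seq), url, retries_used + 1)
                    heapq.heappush(self._heap, entry)
        return results


attempts = {}


def flaky_fetch(url):
    """Stand-in for a network fetch: the '/flaky' URL fails on its first attempt."""
    attempts[url] = attempts.get(url, 0) + 1
    if url.endswith("/flaky") and attempts[url] < 2:
        raise IOError("temporary failure")
    return "body of " + url


scheduler = ToyScheduler(max_retries=2)
scheduler.add("http://example.com/low", priority=1)
scheduler.add("http://example.com/flaky", priority=5)
results = scheduler.run(flaky_fetch, now=100.0)

# Recrawl by age: a page fetched at t=100 is skipped until `age` seconds have passed.
recrawl_too_soon = scheduler.add("http://example.com/low", age=3600, now=200.0)
recrawl_after_age = scheduler.add("http://example.com/low", age=3600, now=4000.0)
```

In this sketch the higher-priority URL is fetched first, its single failure is retried, and the later `add` calls show a page being skipped while it is still fresh and accepted again once the age window has passed.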
Installation
- pip install pyspider
- run command pyspider, visit http://localhost:5000/
Quickstart: http://docs.pyspider.org/en/latest/Quickstart/
from https://github.com/binux/pyspider