Solr only indexes and searches text; it does not include a crawler, since crawling is outside the project's scope. However, take a look at Nutch, which is a crawler and is not too hard to set up initially.
Nutch and Solr can be integrated if you need some Solr-specific feature to search the index.
Installing Solr
o) sudo apt-get install python-setuptools
o) From the Apache Solr website, download the latest tgz source archive (3.6.1)
o) cd to the Apache download directory
o) sudo apt-get install ant
o) sudo apt-get install ivy-bootstrap
o) JDK + JRE (6+): sudo apt-get install openjdk-6-jdk
o) sudo ant compile
o) sudo ant test
o) sudo ant example
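After the build steps above, the example server is normally started with java -jar start.jar from the example directory and listens on port 8983. A minimal Python 3 sketch for checking that it is up, assuming the default local URL and the stock /admin/ping handler from the example configuration; the function names are hypothetical and only for illustration:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the example server runs locally on Solr's default port.
SOLR_URL = "http://localhost:8983/solr"


def ping_url(base_url=SOLR_URL):
    """Build the URL of the ping handler, asking for a JSON reply."""
    return base_url.rstrip("/") + "/admin/ping?" + urlencode({"wt": "json"})


def is_ok(ping_body):
    """Return True if a JSON ping response body reports status OK."""
    return json.loads(ping_body).get("status") == "OK"


def solr_is_up(base_url=SOLR_URL, timeout=5):
    """Ping Solr; return False if it is unreachable or replies with non-JSON."""
    try:
        with urlopen(ping_url(base_url), timeout=timeout) as resp:
            return is_ok(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        return False
```

If solr_is_up() returns False right after the build, the server is most likely not started yet or is running on a different port.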
*********
Bake your web search with Sunburnt, Solr
Search is an integral part of a web project, and developers have long needed a high-performance, scalable and robust search solution. Apache Solr is an open-source, community-driven search solution with REST APIs. Sunburnt is a Pythonic interface to Solr.
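Because Solr exposes search over HTTP, a query is just a GET request against the select handler. The sketch below deliberately uses only the Python 3 standard library rather than Sunburnt, to show roughly what goes over the wire. It assumes the default example URL and the JSON response writer (wt=json). The helper names and the title field are illustrative, not taken from any particular schema:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Solr's example server on its default local port.
SOLR_URL = "http://localhost:8983/solr"


def select_url(query, rows=10, base_url=SOLR_URL):
    """Build a select URL for the given query string, returning JSON."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def parse_docs(body):
    """Extract (total hit count, list of documents) from a JSON response body."""
    response = json.loads(body)["response"]
    return response["numFound"], response["docs"]


def search(query, rows=10, base_url=SOLR_URL):
    """Run the query against a live Solr instance and parse the result."""
    with urlopen(select_url(query, rows, base_url)) as resp:
        return parse_docs(resp.read().decode("utf-8"))
```

With a running server and a schema that has a title field, search("title:solr") would return the hit count and the matching documents. A client library such as Sunburnt hides the URL building and response parsing behind Python objects.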
Agenda
The problem
• Big Data
• Scalability
• Reliability
• Performance
What is Solr?
• Index
• Search
What it is not
Integrating Solr to your web project
Using Sunburnt to query
Aspirational Nach?
Whoosh
Haystack