Movie cAiYBD4BQeE showed installation
Movie Th5Scvlyt-E showed Nutch web crawl
Extract data from Solr
Extract to xml or csv
Show aim to load into data warehouse
Set 'Core Selector' = collection1
Click 'Query'
In Query window set fl field = url
Click Execute Query
How To Extract
In admin console via query
Via http solr select
Via curl -o call using solr http select
Xml
Comma separated values (csv)
How To Extract
tstamp, url
We want to extract as csv (csv in the call below could be xml)
We want to extract to a file
So we will use an http call
http://localhost:8983/solr/select?q=*:*&fl=tstamp,url&wt=csv
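The call above has three parameters: q (the query, here everything), fl (the field list) and wt (the output format). A minimal sketch of building that URL for either format is shown here. The helper name solr_select_url and its argument order are assumptions for illustration and are not part of Solr.

```shell
# Build a Solr select URL; defaults reproduce the call shown above.
# solr_select_url is a hypothetical helper name, not part of Solr.
solr_select_url() {
    fields="${1:-tstamp,url}"                       # fl: comma-separated field list
    format="${2:-csv}"                              # wt: csv or xml
    base="${3:-http://localhost:8983/solr/select}"  # select handler on the local Solr
    printf '%s?q=*:*&fl=%s&wt=%s\n' "$base" "$fields" "$format"
}

solr_select_url                   # the csv call
solr_select_url tstamp,url xml    # same query, xml output
```

Changing the first argument is one way to choose more fields later on.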
How To Extract
./solr_url_extract.bash
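The contents of solr_url_extract.bash are not shown on the slides. A script of this kind could look roughly like the sketch below, which uses curl -o to write the csv to a date-stamped file. The function name extract_to_file, the variable SOLR_URL and the default file prefix are assumptions.

```shell
#!/bin/bash
# Hypothetical sketch of a script like solr_url_extract.bash.
# The original script is not shown, so every name here is an assumption.

SOLR_URL='http://localhost:8983/solr/select?q=*:*&fl=tstamp,url&wt=csv'

# Fetch the URL into <prefix>.<YYYYMMDD>.<HHMMSS> and print the file name.
extract_to_file() {
    url="$1"
    prefix="${2:-result.csv}"
    outfile="${prefix}.$(date +%Y%m%d.%H%M%S)"
    # -s: no progress meter, -f: fail on HTTP errors, -o: write body to file
    curl -s -f -o "$outfile" "$url" || return 1
    echo "$outfile"
}

# Example call (needs a running Solr instance):
#   extract_to_file "$SOLR_URL" result.csv
```

The date stamp keeps each run in its own file, so an earlier extract is not overwritten.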
Check Output
result.csv.20130506.124857
Check the content: wc -l shows 11 lines (a header line plus the 10 rows that Solr returns by default)
Check the content: head -2 shows the first two lines of the file
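The two checks can be wrapped in one small function, sketched here. The name check_extract is an assumption; it only reports the line count and the first two lines, and fails if the file is missing or empty.

```shell
# Report the line count and first two lines of an extract file.
# check_extract is a hypothetical helper name.
check_extract() {
    file="$1"
    # Stop early if the extract was not written or is empty.
    [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
    # tr strips the padding some wc implementations add.
    echo "lines: $(wc -l < "$file" | tr -d ' ')"
    # Same as head -2.
    head -n 2 "$file"
}

# Example call, using the output file name shown earlier:
#   check_extract result.csv.20130506.124857
```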
Congratulations, you have extracted data from Solr
It's in CSV format, ready to be loaded into a data warehouse
Choose more fields to extract from data
Allow Nutch crawl to go deeper
Allow Nutch crawl to collect a lot more data
Look at facets in Solr data
Load CSV files into Data Warehouse Staging schema
Next movie will show next step in progress
Contact Us
www.semtech-solutions.co.nz info@semtech-solutions.co.nz
We offer IT project consultancy
We are happy to hear about your problems
You can pay for just the hours you need to solve your problems