Tuesday, March 9, 2010

Enterprise crawler

To read a document's data from storage:
  • crawleradmin --offline --getdata aa2: http://cmb1u123.cmb1.fast.no:30000/niran/sitemap/sitemap.xml
To read data from meta storage:

crawlerdbtool -m viewraw -d data/store/aa2/data/6

Key (raw): 'http://cmb1u123.cmb1.fast.no:30000/niran/sitemap/c.html'

(0, 0, {1: 1268106394, 2: 0, 3: 'text/html', 4: 'ykA\xd1_\xaf;\xba@r\x94\xa2\xc8o\x8f~', 5: None, 6: ('aa2/data/6', 113539, 65539), 7: [('http://cmb1u123.cmb1.fast.no:30000/niran/sitemap/sitemap.xml', 1073741824)], 8: 'HTTP/1.1 200 OK\r\nDate: Tue, 09 Mar 2010 08:12:26 GMT\r\nServer: Apache/2.2.9 (Unix) PHP/5.2.6\r\nLast-Modified: Tue, 06 Oct 2009 14:11:21 GMT\r\nETag: "2bbf9-59780-47544cdfd5440"\r\nAccept-Ranges: bytes\r\nContent-Length: 366464\r\nConnection: close\r\nContent-Type: text/html', 9: 'deflate', 11: 'Tue, 06 Oct 2009 14:11:21 GMT', 12: '"2bbf9-59780-47544cdfd5440"', 13: 0, 14: None, 15: None, 16: (1, 1, 1), 17: 0, 18: [], 19: None, 20: 0, 21: None, 22: 0, 23: {'sm': {'priority': '0.8', 'lastmod': '2005-01-01', 'changefreq': 'monthly'}}}, 0)
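The raw record above looks like a Python literal, so it can probably be loaded with ast.literal_eval for a closer look. The sketch below is only that: it uses an abridged copy of the record (fields 1, 3, 8 and 23), and what each numbered field means is my guess from the values, not something taken from the crawler documentation.

```python
import ast
from datetime import datetime, timezone

# Abridged copy of the record printed by crawlerdbtool (fields 1, 3, 8, 23 only).
raw = r"""(0, 0, {1: 1268106394, 3: 'text/html', 8: 'HTTP/1.1 200 OK\r\nContent-Length: 366464\r\nContent-Type: text/html', 23: {'sm': {'priority': '0.8', 'lastmod': '2005-01-01', 'changefreq': 'monthly'}}}, 0)"""

# literal_eval only accepts literals, so it is safe on untrusted text.
record = ast.literal_eval(raw)
fields = record[2]  # third element is the dict of numbered fields

# Assumption: field 1 is a Unix timestamp (seconds since the epoch).
timestamp = datetime.fromtimestamp(fields[1], tz=timezone.utc)

# Assumption: field 3 is the MIME type, field 8 the raw HTTP response headers.
mime_type = fields[3]
header_lines = fields[8].split("\r\n")
status_line = header_lines[0]
headers = dict(line.split(": ", 1) for line in header_lines[1:])

# Assumption: field 23 -> 'sm' holds the values read from sitemap.xml.
sitemap_info = fields[23]["sm"]

print("timestamp  :", timestamp.isoformat())
print("mime type  :", mime_type)
print("status     :", status_line)
print("length     :", headers["Content-Length"])
print("sitemap    :", sitemap_info)
```

The 'sm' entry is the interesting part here: it shows that priority, lastmod and changefreq for c.html were stored with the document.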