Configure robots.txt to reduce server load from crawlers such as the Google Search Appliance (GSA)
The GSA is quite a greedy crawler and will crawl every user page, feed, revision and special page it can get its hands on. In our case this increased the server load, at times to abnormal levels. You can of course configure the GSA in its graphical front end, but it is also a good idea to configure robots.txt.
Place robots.txt in your /var/www/dekiwiki directory (the web root of the MindTouch installation), since crawlers only look for it at the root of the site.
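A small sketch of why the location matters: whatever page a crawler wants, it looks up the rules at `/robots.txt` on the same scheme, host and port. The function name `robots_url` and the host `wiki.example.com` are hypothetical, chosen only for illustration.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler consults before fetching page_url.

    The absolute path "/robots.txt" replaces the page's path, query string
    and fragment, while the scheme, host and port are kept.
    """
    return urljoin(page_url, "/robots.txt")


if __name__ == "__main__":
    # Placeholder host; any page on the wiki maps to the same robots.txt.
    print(robots_url("http://wiki.example.com/User:Anna?revision=3"))
```

So a robots.txt placed in a subdirectory of the web root is never consulted.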
Please note that the asterisk (*) wildcard and the Crawl-delay directive are not supported by all user agents.
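To see what "not supported" can mean in practice, here is a sketch using Python's standard-library `urllib.robotparser`. As far as I know, it reads Crawl-delay but compares Disallow values as plain path prefixes, so a `*` inside a rule is taken literally instead of as a wildcard. The rules are a cut-down subset of the example in this article; the user agent "ExampleBot" and the host `wiki.example.com` are placeholders.

```python
import urllib.robotparser

# Subset of the article's rules, chosen to show one prefix rule and one wildcard rule.
RULES = """\
User-agent: *
Crawl-delay: 240
Disallow: /phpmailer/
Disallow: /*User:*
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(RULES.splitlines())

# Plain prefix rule: honored.
print(parser.can_fetch("ExampleBot", "http://wiki.example.com/phpmailer/class.phpmailer.php"))
# Wildcard rule: "*" is matched literally here, so this user page is NOT blocked.
print(parser.can_fetch("ExampleBot", "http://wiki.example.com/User:Anna"))
# Crawl-delay is parsed as whole seconds between requests.
delay = parser.crawl_delay("ExampleBot")
print(delay)
# A crawler that honors 240 s can make at most 86400 / 240 requests per day.
print(24 * 60 * 60 // delay)
```

This prints `False`, `True`, `240` and `360`: the prefix rule works, the wildcard rule is silently ignored, and a Crawl-delay of 240 caps a compliant crawler at 360 requests per day.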
The folder "Arbetsmaterial" is just a folder we use for temporary storage, so I don't want the GSA to even try to crawl it. We also use WordPress, so I don't want it to crawl wp-includes and wp-content either.
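A minimal sketch of rules for these folders, assuming Arbetsmaterial, wp-includes and wp-content all sit directly under the site root (their actual paths are not stated here, so adjust them to your installation). Because they are plain prefix rules without wildcards, even a simple parser such as Python's `urllib.robotparser` honors them, which the check uses. "ExampleBot" and `wiki.example.com` are placeholders.

```python
import urllib.robotparser

# Hypothetical paths: assumed to live directly under the site root.
EXTRA_RULES = """\
User-agent: *
Disallow: /Arbetsmaterial/
Disallow: /wp-includes/
Disallow: /wp-content/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(EXTRA_RULES.splitlines())

# Report which sample paths a compliant crawler would skip.
for path in ("/Arbetsmaterial/draft.doc", "/wp-content/uploads/logo.png", "/Main_Page"):
    allowed = parser.can_fetch("ExampleBot", "http://wiki.example.com" + path)
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```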
# /robots.txt example for MindTouch with Google Search Appliance
User-agent: *
Crawl-delay: 240
Disallow: /phpmailer/
Disallow: /deki/cp/
Disallow: /skins/
Disallow: /*Template:*
Disallow: /*Admin:*
Disallow: /*User:*
Disallow: /*Special:*
Disallow: /*revision=*
Disallow: /*Talk:*
Disallow: /*.js$
Disallow: /*.bmp$
Disallow: /*.css$
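Wildcard rules are easy to get subtly wrong, so it can help to test them offline. This sketch translates each Disallow pattern of the example robots.txt into a regular expression using the semantics Google documents for Googlebot: `*` matches any run of characters, a trailing `$` anchors the end of the URL, and everything else is a prefix match against the path plus query string. It only handles Disallow lines, which is all this file uses. Whether the GSA matches in exactly the same way is an assumption, and the names `pattern_to_regex` and `is_blocked` are hypothetical.

```python
import re

# Disallow patterns copied from the example robots.txt.
DISALLOW = [
    "/phpmailer/", "/deki/cp/", "/skins/",
    "/*Template:*", "/*Admin:*", "/*User:*", "/*Special:*",
    "/*revision=*", "/*Talk:*",
    "/*.js$", "/*.bmp$", "/*.css$",
]


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regular expression.

    '*' becomes '.*', a trailing '$' becomes an end-of-string anchor, and
    all other characters are escaped so they match literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


RULES = [pattern_to_regex(p) for p in DISALLOW]


def is_blocked(path_and_query: str) -> bool:
    """True if any Disallow pattern matches starting at the beginning of the path."""
    return any(rule.match(path_and_query) for rule in RULES)


if __name__ == "__main__":
    samples = ("/Template:Home", "/Main_Page?revision=4", "/skins/common/style.css",
               "/Main_Page", "/deki/scripts/app.js?v=2")
    for path in samples:
        print(f"{path}: {'blocked' if is_blocked(path) else 'allowed'}")
```

Note the last sample: because `$` anchors the pattern at the very end of the URL, `/*.js$` does not block a versioned asset such as `/deki/scripts/app.js?v=2`. The sample paths are made up for illustration.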
Copyright © 2011 MindTouch, Inc.