
Configuring robots.txt to reduce server load

    Configure robots.txt to reduce the server load caused by crawlers, using the Google Search Appliance (GSA) as an example.

    Google Search Appliance (GSA)

    The GSA is quite the greedy crawler and will crawl every user page, feed, revision and special page it can get its hands on. In our case this led to increased server load, at times abnormally high. You can of course configure the GSA in its graphical front end, but it is also a good idea to configure robots.txt.

    Robots.txt for GSA

    Place robots.txt in your /var/www/dekiwiki directory, the web root of the MindTouch installation, since crawlers only request /robots.txt from the root of the site.
    Please note that the * wildcard and the Crawl-delay directive are not supported by all user agents.
    The folder "Arbetsmaterial" is just a folder we use for temporary storage, so I don't want the GSA to even try to crawl it. We also use WordPress, so I don't want it to crawl wp-includes and wp-content either.

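    # /robots.txt example for MindTouch with Google Search Appliance
    
    # The rules below apply to all crawlers
    User-agent: *
    # Seconds to wait between requests (interpreted differently, or ignored, by some crawlers)
    Crawl-delay: 240
    # Application directories: mailer, control panel, skins
    Disallow: /phpmailer/
    Disallow: /deki/cp/
    Disallow: /skins/
    # Template, admin, user, special and talk pages, plus old revisions
    Disallow: /*Template:*
    Disallow: /*Admin:*
    Disallow: /*User:*
    Disallow: /*Special:*
    Disallow: /*revision=*
    Disallow: /*Talk:*
    # Script, image and stylesheet files
    Disallow: /*.js$
    Disallow: /*.bmp$
    Disallow: /*.css$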
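
    The wildcard rules above only help with crawlers that implement pattern matching; a parser without it may treat `*` and `$` as literal characters (Python's own urllib.robotparser, for instance, only does prefix matching). The sketch below shows how such matching can work, assuming Google-style semantics in which `*` matches any run of characters and a trailing `$` anchors the end of the URL. The helpers `pattern_to_regex` and `is_disallowed` are hypothetical names for illustration, and Allow rules and per-user-agent groups are not modelled.

```python
import re

# The Disallow patterns from the example robots.txt above.
DISALLOW = [
    "/phpmailer/", "/deki/cp/", "/skins/",
    "/*Template:*", "/*Admin:*", "/*User:*", "/*Special:*",
    "/*revision=*", "/*Talk:*",
    "/*.js$", "/*.bmp$", "/*.css$",
]


def pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate one robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the
    pattern to the end of the URL; otherwise the rule is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn each escaped '*' into '.*'.
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(regex + ("$" if anchored else ""))


# Compile every pattern once up front.
COMPILED = [pattern_to_regex(p) for p in DISALLOW]


def is_disallowed(url_path: str) -> bool:
    """Return True if any Disallow rule matches url_path (path plus query).

    re.match anchors at the start of the string, mirroring how rules are
    matched from the beginning of the path. Allow rules are not modelled.
    """
    return any(rx.match(url_path) for rx in COMPILED)


if __name__ == "__main__":
    for path in ["/User:Admin", "/index.php?title=Foo&revision=3",
                 "/skins/common/style.css", "/Main_Page", "/skins.js?v=2"]:
        print(path, "->", "blocked" if is_disallowed(path) else "allowed")
```

    Running it reports the first three paths as blocked and the last two as allowed. Note that /skins.js?v=2 slips through: the `$` anchor requires the URL to end in .js, so a query string after the extension defeats that rule.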
    
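    The Crawl-delay value is worth checking before copying it. Assuming a crawler that honors it as the minimum number of seconds between requests (crawlers that support the directive do not all interpret it the same way), a delay of 240 seconds limits that crawler to:

```latex
\frac{86\,400\ \text{s/day}}{240\ \text{s/request}} = 360\ \text{requests/day}
```

    At that rate a full pass over a hypothetical wiki of 3,000 pages would take a little over eight days (3,000 / 360 ≈ 8.3), so a smaller value may be preferable if new content should show up in search results quickly.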

    Copyright © 2011 MindTouch, Inc.