per-instance-indexing

    Table of contents
    1. Problems
    2. Solution

    Problems

    • Lucene assumes it has the same apikey as the API server, which requires all API servers to share one apikey
    • Lucene subscribes to deki services, not instances, which means one cluster of API servers can only ever use one Lucene indexer
    • Lucene only learns which API servers to subscribe to by being told, and API servers only tell Lucene when they start up
    • It isn't clear that the API server start-up subscription is deterministic and not just accidental

    Solution

    In single-tenant all is fine, since Lucene and the single tenant are hardwired to each other. The complication only exists in multi-tenant.

    • Instead of telling Lucene at deki start-up, each instance tells Lucene about its existence and periodically tells it again (a heartbeat)
      • Lucene will subscribe to the pubsub service named by the instance and, if told again, just reconfirm that it has a subscription
      • This way, if Lucene is restarted, it will re-subscribe at the next heartbeat
    • It should be possible to explicitly configure which IP the heartbeat tells Lucene to subscribe to, so that we can guarantee that it's the API server's IP, not the load-balanced CNAME
    • The per-instance subscription will communicate an auth cookie for talking to the instance, which the indexer uses instead of the apikey
    • The Lucene server for the instance will optionally come from the instance settings
    • Pubsub will allow for persistent subscriptions
      • A subscriber can put a subscription at a location of its choosing, so it can find the subscription again
      • A subscriber can tell pubsub the auth key for that location, so it can access the subscription again
      • The message queue for that location will be stored in a persistent queue
      • The subscription will have a TTL. If messages cannot be delivered within that TTL, the subscription will be dropped, so that queues don't build up indefinitely
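
The heartbeat behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not Deki code: the `Heartbeat` and `Indexer` names, their fields, and the example URI are all assumptions made for the sketch. It shows the one property the design relies on: handling a heartbeat is idempotent, so a repeated heartbeat only reconfirms, while a restarted indexer re-subscribes on the next one.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Heartbeat:
    """What an instance tells the indexer on every beat (hypothetical fields)."""
    instance_id: str
    pubsub_uri: str    # explicit API server address, not the load-balanced CNAME
    auth_cookie: str   # used by the indexer instead of a shared apikey


@dataclass
class Indexer:
    """Stand-in for the Lucene indexer; subscriptions are held in memory only."""
    subscriptions: dict = field(default_factory=dict)

    def on_heartbeat(self, beat: Heartbeat) -> bool:
        """Subscribe if needed; return True only if a new subscription was made."""
        # Keying on (instance, pubsub) lets one instance served by several
        # API servers end up with one subscription per API server.
        key = (beat.instance_id, beat.pubsub_uri)
        is_new = key not in self.subscriptions
        # Always keep the most recent cookie, so a changed cookie is picked up.
        self.subscriptions[key] = beat.auth_cookie
        return is_new


# Hypothetical instance name, address and cookie, for illustration only.
beat = Heartbeat("wiki-a", "http://10.0.0.5:8081/deki/pubsub", "cookie-1")

indexer = Indexer()
first = indexer.on_heartbeat(beat)    # creates the subscription
repeat = indexer.on_heartbeat(beat)   # only reconfirms it

indexer = Indexer()                   # simulate an indexer restart: state is lost
after_restart = indexer.on_heartbeat(beat)  # next heartbeat re-subscribes
```

Because the indexer never has to discover instances on its own, the only state that matters for recovery is what the next heartbeat carries.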

    This should address all scenarios:

    • Instances can use Lucene servers regardless of the API server they are on
    • Instances on multiple API servers can get their Lucene server to subscribe to all of them
    • If the API server is bounced or a new one is added, subscriptions will be set up
    • If an instance moves deployments, subscriptions will be set up and old ones will expire
    • If a Lucene server goes down, messages will continue to queue for it because its subscriptions are persistent, and once it is back it will re-subscribe any dropped ones via heartbeat
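
The last scenario depends on the persistent subscriptions proposed for pubsub. A minimal sketch of that idea, assuming an in-memory queue in place of real persistent storage and a caller-supplied clock: the `PubSub` and `PersistentSubscription` classes, their method names, and the location string are hypothetical and are not the Deki pubsub API. The TTL is measured here from the oldest undelivered message, which is one possible reading of the rule above.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PersistentSubscription:
    location: str       # chosen by the subscriber so it can find it again
    auth_key: str       # needed to access the subscription again
    ttl_seconds: float
    # Stand-in for a persistent queue; holds (enqueued_at, message) pairs.
    queue: deque = field(default_factory=deque)


class PubSub:
    def __init__(self):
        self.subscriptions = {}

    def subscribe(self, location, auth_key, ttl_seconds):
        """Create the subscription, or return the existing one at that location."""
        sub = self.subscriptions.get(location)
        if sub is None:
            sub = PersistentSubscription(location, auth_key, ttl_seconds)
            self.subscriptions[location] = sub
        elif sub.auth_key != auth_key:
            raise PermissionError("wrong auth key for " + location)
        return sub

    def expire(self, now):
        """Drop subscriptions whose oldest undelivered message outlived the TTL."""
        expired = [loc for loc, sub in self.subscriptions.items()
                   if sub.queue and now - sub.queue[0][0] > sub.ttl_seconds]
        for loc in expired:
            del self.subscriptions[loc]

    def publish(self, message, now):
        self.expire(now)
        for sub in self.subscriptions.values():
            sub.queue.append((now, message))

    def drain(self, location, auth_key, now):
        """Deliver queued messages, or return None if the subscription is gone."""
        self.expire(now)
        sub = self.subscriptions.get(location)
        if sub is None:
            return None  # dropped or unknown: the subscriber must re-subscribe
        if sub.auth_key != auth_key:
            raise PermissionError("wrong auth key for " + location)
        messages = [message for _, message in sub.queue]
        sub.queue.clear()
        return messages


pubsub = PubSub()
pubsub.subscribe("lucene/wiki-a", "secret", ttl_seconds=60)

pubsub.publish("page 1 changed", now=10)   # indexer is down; message is queued
recovered = pubsub.drain("lucene/wiki-a", "secret", now=30)   # back within TTL

pubsub.publish("page 2 changed", now=40)
dropped = pubsub.drain("lucene/wiki-a", "secret", now=200)    # TTL exceeded
```

In the second case the subscriber gets nothing back, which is where the heartbeat takes over: the next heartbeat from the instance causes the indexer to subscribe again.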

    Copyright © 2011 MindTouch, Inc.