Table of contents
    1. 1. Status
    2. 2. Current versioning
    3. 3. Old by rev # Looking by an old page rev by revision# is done by Old_GetOlds which is ordered by old_id asc and has a subquery to calculate the revision of each page by doing: SELECT COUNT(*) FROM old old2 WHERE old_namespace = ?NAMESPACE AND old_title = ?TITLE  AND old2.old_id <= old.old_id. We're making basic assumption here that revisions are entered into the old table in chronological order with old_id being a reliable means of revisioning. This same concept is used in the old table revision update query.   Old -> Archive Upon page delete, old revs are moved to the archive followed by the head page. This is done by Archive_MovePagesTo without any sorting of the olds first and thus not guaranteeing that archive's primary key ar_id can be used to tell the revision order reliably.     Archive -> Old Restoring a page relies on Archive_GetRevisionsByPageIds which returns all revisions for 1 or more pages and it's ordered by "ar_last_page_id desc, ar_timestamp asc". This is slightly ambiguous since revision timestamps are not guaranteed to be unique. Within archive we really have no guarantee which revision came before others for pages with the same timestamp (ie, created by unit tests). This is why the archive revision update query below has this logic: and ((b.ar_timestamp < a.ar_timestamp) OR (b.ar_timestamp = a.ar_timestamp AND b.ar_id < a.ar_id)) in order to try to revision by timestamp but when thats not helpful then revert to the ar_id. This makes a best effort to ensure every revision is unique and in the correct order.   Queries to remove: Pages_GetOldRevisionsCount: just use the revision number on page instead   Queries to modify: Old_GetOldByTimestamp: calculates revision number for each old returned Old_GetOlds: calculates revision number for each old returned Old_Insert: calculates revision number Old_GetOldsIncludingHeadRev: sums rows to calculate revision number   Index changes: This index will be added: alter table `old` add unique `ns_title_rev` (`old_namespace`, `old_title`, `old_revision`); While we're at it
    4. 4. Upgrade SQL script

    This is description of the work involved in adding revision columns for pages rather than the current approach of computing them.

    Tracking bug

    Status

    Initial research complete. API changes should be made before upgrade script is pushed out.

    Current versioning

    Old by rev #
    Looking by an old page rev by revision# is done by Old_GetOlds which is ordered by old_id asc and has a subquery to calculate the revision of each page by doing: SELECT COUNT(*) FROM old old2 WHERE old_namespace = ?NAMESPACE AND old_title = ?TITLE  AND old2.old_id <= old.old_id.
    We're making basic assumption here that revisions are entered into the old table in chronological order with old_id being a reliable means of revisioning. This same concept is used in the old table revision update query.
     
    Old -> Archive
    Upon page delete, old revs are moved to the archive followed by the head page. This is done by Archive_MovePagesTo without any sorting of the olds first and thus not guaranteeing that archive's primary key ar_id can be used to tell the revision order reliably.  
     
    Archive -> Old
    Restoring a page relies on Archive_GetRevisionsByPageIds which returns all revisions for 1 or more pages and it's ordered by "ar_last_page_id desc, ar_timestamp asc". This is slightly ambiguous since revision timestamps are not guaranteed to be unique. Within archive we really have no guarantee which revision came before others for pages with the same timestamp (ie, created by unit tests). This is why the archive revision update query below has this logic:
    and ((b.ar_timestamp < a.ar_timestamp) OR (b.ar_timestamp = a.ar_timestamp AND b.ar_id < a.ar_id))
    in order to try to revision by timestamp but when thats not helpful then revert to the ar_id. This makes a best effort to ensure every revision is unique and in the correct order.
     
    Queries to remove:
    Pages_GetOldRevisionsCount: just use the revision number on page instead
     
    Queries to modify:
    Old_GetOldByTimestamp: calculates revision number for each old returned
    Old_GetOlds: calculates revision number for each old returned
    Old_Insert: calculates revision number
    Old_GetOldsIncludingHeadRev: sums rows to calculate revision number
     
    Index changes:
    This index will be added:
    alter table `old` add unique `ns_title_rev` (`old_namespace`, `old_title`, `old_revision`);

    While we're at it

    Page id in old? 
    While making these changes it may be a good time to add a page id column to the old table as well. Every select on the old table operates by title+namespace which would be faster if replaced by an indexed integer lookup. The unique title/namespace/rev index above would be changed to old_page_id/old_revision.
     

    Upgrade SQL script



    alter table `pages` add column `page_revision` int UNSIGNED NOT NULL;

    update `pages` p
    set p.page_revision =
    (    SELECT     count(*) + 1
        FROM     `old` o
        WHERE     o.old_title = p.page_title
        AND    o.old_namespace = p.page_namespace
    );

    alter table `old` add column `old_revision` int UNSIGNED NOT NULL;
    alter table `old` add unique `ns_title_rev` (`old_namespace`, `old_title`, `old_revision`);

    update old c
    join (
        select a.old_id, a.old_title, a.old_namespace, a.old_timestamp,
        (    select count(*)
            from old b
            where b.old_title = a.old_title
            and b.old_namespace = a.old_namespace       
            and b.old_id <= a.old_id
        ) as rev
    from old a) d
    on c.old_id = d.old_id
    set c.old_revision = d.rev;

    alter table `archive` add column `ar_revision` int UNSIGNED NOT NULL;

    update archive c
    join (   
        select a.ar_id, a.ar_title, a.ar_namespace, a.ar_timestamp, a.ar_last_page_id,
        (    select count(*) + 1
            from archive b
            where b.ar_title = a.ar_title
            and b.ar_namespace = a.ar_namespace
            and b.ar_last_page_id = a.ar_last_page_id
            and ((b.ar_timestamp < a.ar_timestamp) OR (b.ar_timestamp = a.ar_timestamp AND b.ar_id < a.ar_id))
        ) as rev   
    from archive a) d
    on c.ar_id = d.ar_id
    set c.ar_revision = d.rev;

    Tag page
    You must login to post a comment.

    Copyright © 2011 MindTouch, Inc. Powered by