Stage 2: Download Statistics Update

    Introduction

    This feature allows for the processing of a log and it's upload to a MySQL database. Normally, a log file is formatted in a way that can not be directly inserted into a MySQL database. This feature, combined with a parser, will allow for log files to be transfered into a MySQL database.

    Purpose

    This feature will, when given the correct  parser, allow for the translation from a log file into its MySQL database representation which will allow for fast searching and indexing of the log data.

    Intended Audience

    Anybody who has log files that contain information that need to be sorted and analyzed.

    Status

    Testing.

    Use Cases

    A user will have a log file, parser for said log file and a MySQL database to store the parsed data.

    Functional Specification

    The program can be access via command line or it's API.

    CommandLine:

    DownloadStatisticsUpdate [LOGTYPE(APACHE|S3)] [LOGPATH]

    *TODO: cfg file for login info*

    Non-goals

    It does not parse the log files, it only reads it in. A parser handles the parsing. 

    Technical Specification

    Database

    TODO: on delete commands

     CREATE TABLE `downloads` (
        `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
        # IP Address of person who downloaded
        `ip` INT UNSIGNED NOT NULL ,
        # Date/time of download
        `date_time` DATETIME NOT NULL ,
        # id number of the row in product_versions that corresponds to the version of the product downloaded in this entry
        `version_id` INT UNSIGNED NOT NULL ,
        # id number of the batch this entry was uploaded with, keeps track of which log it comes from
        `batch_id` INT UNSIGNED NOT NULL,
        # Unparsed line of from the log
        `raw` TEXT NOT NULL ,
        INDEX ( `date_time` , `ip` ) ,
        INDEX ( `version_id` ) ,
        INDEX ( `batch_id`)
    ) ENGINE=INNODB;
    
    CREATE  TABLE  `versions` (
        `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
        #filename of file that created this entry
        `filename` VARCHAR( 128 ) NOT NULL ,
        #Version number of the downloaded file
        `version` DOUBLE NOT NULL ,
        #type of the download, enterprise or core
        `edition` ENUM('enterprise', 'core', 'desktop_suite') NOT NULL,
        #Intended platform of the download
        `platform` ENUM('vmware', 'msi', 'source', 'package') NOT NULL,
        # Target operating system of MindTouch
        `os` ENUM('windows', 'linux', 'other') NOT NULL ,
        # Operating system flavor
        `flavor` ENUM('vista', 'server 2008', 'server 2003', 'vmware', 'ubuntu', 'SLE', 'RHEL', 'debian', 'cent_os', 'fedora', 'opensuse', 'other') NOT NULL ,
        INDEX (`filename`) ,    
        UNIQUE (`filename`, `os`, `flavor`)
    ) ENGINE=INNODB;
    
    CREATE  TABLE  `batches` ( 
        `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
        #date of upload 
        `upload_date_time` DATETIME NOT NULL ,
        #hash code that is generated from each set of log files entered, used for tracking which batch a log file came from
        `hash` VARCHAR( 32 ) NOT NULL ,
        #name of the log file, assuming it is properly named,and if not shame on the person who uploaded the log
        `source_file` VARCHAR ( 32 ) NOT NULL ,
        #server that the log files came from
        `source_server` ENUM ('apache', 's3') NOT NULL ,
        #is only switched to Y after the whole log has been uploaded, ensures that complete data set has been uploaded
        `finishedupload` ENUM('Y', 'N') NOT NULL , 
        UNIQUE (`hash`)
    ) ENGINE=INNODB;
    
    ALTER TABLE downloads
    ADD FOREIGN KEY (`version_id`) REFERENCES versions(id);
    
    ALTER TABLE downloads
    ADD FOREIGN KEY (`batch_id`) REFERENCES batches(id);
    
    
     

    Program:

     

    Wrapper functions are used to enable easier commandline usage. This can be seen below in the interface section. Aside from these wrappers which lead into the main program, serveral helper functions also exist:

    • GetMD5(string fileName)
      • Creates a md5 hash from a file.
    • InsertIntoDB()
      • Inserts a parsedEntry into the db.
    • GetVersionID()
      • Retrieves the version id corresponding to the parsedEntry's version
    • GetBatchID()
      • Retrieves the batch id number corresponding to the generated hash

     

    Main Process:

    • Open log file
    • Load version dictionary
    • Create hash md5 of the log file
    • Check to see if md5 hash exist in database
      • Successfully uploaded
        • Return
    • Loop( until log is done )
      • Get line from log
      • Parse line (send off to parser)
      • Get version_id number for entry
        • If does not exist, insert
        • Query for id number
      • Insert into db
    • Exit

    Interface:

    Apache:
    public static void UpdateWithApacheLogs(string logPath, string connectionString);
    
    S3:
    public static void UpdateWithS3Logs(string logPath, string connectionString);
    
    Generic:
    public static void Update(Parser parser, string logPath, string connectionString);
    

    API also supports calls to the Update function directly so that other types of parsers that are not included can be used.

    API requirements

    Currently, only one log type is built in and can be called.

    public static void updateWithApacheLogs(string logPath, string connectionString);
    

    The Updater can also be called with a custom Parser:

    public static void Update(Parser parser, string logPath, string conn);
    Tag page
    You must login to post a comment.

    Copyright © 2011 MindTouch, Inc. Powered by