The goal of this stage is to take data from its various sources and translate it into a database-friendly format that can then be written to the database via Stage 2.
The purpose of this stage is solely to take in a line from a log and produce a consistent output that can be processed by Stage 2 and uploaded to a MySQL database.
Anybody who needs to load log information into a database.
Requires different versions depending on what types of logs need to be parsed.
Planning.
Sources
The user does not interact with it directly. It is an object passed to an update function inside Stage 2, where it is used to parse logs.
It does not interact with anything but the log line that is passed to it.
TODO: upgrade date parsing to use DateTime.ParseExact().
TODO: use regex to do some of the parsing.
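The two TODO items above could be combined roughly as follows. This is only a sketch, not the planned implementation: the class name `LogTimestamp`, the method `TryParseUtc`, and the choice of UTC as the target time zone are assumptions made for illustration. It uses `DateTimeOffset.TryParseExact` (a close relative of `DateTime.ParseExact`) so that the offset in the log line is preserved while parsing.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts and parses a bracketed log timestamp.
public static class LogTimestamp
{
    // Matches the first bracketed value, e.g. [15/Apr/2008:13:16:31 -0500].
    private static readonly Regex Bracketed = new Regex(@"\[(?<stamp>[^\]]+)\]");

    // Returns the timestamp shifted to UTC (an assumed target zone),
    // or null when the line has no parsable timestamp.
    public static DateTime? TryParseUtc(string line)
    {
        Match m = Bracketed.Match(line);
        if (!m.Success) return null;

        DateTimeOffset parsed;
        bool ok = DateTimeOffset.TryParseExact(
            m.Groups["stamp"].Value,
            "dd/MMM/yyyy:HH:mm:ss zzz",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out parsed);

        return ok ? parsed.UtcDateTime : (DateTime?)null;
    }
}
```

Using an exact format string means a malformed timestamp is rejected outright instead of being guessed at, which keeps bad lines out of the database.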
All parsers implement the interface Parser. Each parser will need a version dictionary to ensure product versions match up.
public interface Parser
{
    ParsedEntry parseEntry(string entry);
}
The interface contains a single required function, which is called by Stage 2 to parse one line of the log. This function will return a ParsedEntry:
// Date and time have to be stored in the format: %d-%M-%Y %H:%i:%s
public class ParsedEntry
{
    // All private strings have getters and setters.
    private string _ip;
    private string _datetime;

    // "Windows", "Linux", "Other" (SRC and VM are considered "Other")
    private string _os;

    // "vista", "server 2003", "server 2008", "ubuntu", "debian", "suse", "centos", "rhel", "fedora", "other"
    private string _flavor;

    // "vmware", "msi", "package", "source"
    private string _platform;

    // core, enterprise, desktop_suite
    private string _version_edition;
    private string _version_number;

    public ParsedEntry();
    public override string ToString();
}
This ensures that all parsers will return a consistent object.
The purpose of this class is to cope with the non-uniform file naming scheme. While most file names can be parsed efficiently, some do not follow the convention. For this reason, version information is extracted in two stages, and the version dictionary is part of the first stage. It acts as an override: when a file name is listed in the dictionary, its version information is taken from the pre-written table instead of being parsed from the name.
This is a helper class that will query the database table `versions` for the correct id corresponding to the given information.
The class will support checking for the existence of a filename in the dictionary and retrieving the data that corresponds to a filename.
public bool HasEntry(string filePath);
public ParsedEntry GetEntry(string filePath);
public ParsedEntry GetEntry(string filePath, string os, string flavor);
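As a rough illustration of the override lookup, here is a minimal in-memory sketch. It is not the actual class: `VersionDictionarySketch`, `ParsedEntrySketch`, and the `Add` method are hypothetical names, the backing store is a plain `Dictionary` rather than a pre-written table, and the three-argument `GetEntry` overload is left out because its behavior is not specified here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for ParsedEntry, reduced to the version fields.
public class ParsedEntrySketch
{
    public string VersionEdition;
    public string VersionNumber;
}

// Hypothetical in-memory version of the override table.
public class VersionDictionarySketch
{
    private readonly Dictionary<string, ParsedEntrySketch> _overrides =
        new Dictionary<string, ParsedEntrySketch>();

    // Registers an override for a file path (assumed helper, not in the spec).
    public void Add(string filePath, ParsedEntrySketch entry)
    {
        _overrides[filePath] = entry;
    }

    // True when the file path has a pre-written override.
    public bool HasEntry(string filePath)
    {
        return _overrides.ContainsKey(filePath);
    }

    // Returns the override, or null when the file path is not listed.
    public ParsedEntrySketch GetEntry(string filePath)
    {
        ParsedEntrySketch entry;
        return _overrides.TryGetValue(filePath, out entry) ? entry : null;
    }
}
```

A parser would call `HasEntry` first and fall back to parsing the file name only when it returns false.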
*version information is sent through versionDictionary first*
The Apache log style that will be parsed:
75.72.174.217, -, -, [15/Apr/2008:13:16:31 -0500], "GET /CentOS_5/noarch/dekiwiki-1.9.0.8743-10.1.noarch.rpm HTTP/1.1", 200, 9759429, /CentOS_5/noarch/dekiwiki-1.9.0.8743-10.1.noarch.rpm
Fields: IP, DateTime, Command, HTTP code, file size, file downloaded
Parsing starts by delimiting the string into components via Split():
ip: Does not require parsing.
datetime: The datetime is processed by feeding it into a C# DateTime object. It is then time zone shifted before using DateTime's ToString() method to output.
command: Currently not used for anything.
http access code: Used as a check to see if the request is a download. All codes other than 200 are ignored.
file size: Used to determine if there was a download. If no file size is present, entry is ignored.
file path: Used to find the OS of the downloader along with the file that they downloaded.
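The field handling above can be sketched as follows. This is an illustration under stated assumptions rather than the real parser: the names `ApacheLineSketch`, `Fields`, and `Split` are hypothetical, the separator is assumed to be a comma followed by a space as in the sample line, and no field is assumed to contain that separator itself.

```csharp
using System;
using System.Globalization;

// Hypothetical sketch of the Split()-based field extraction.
public static class ApacheLineSketch
{
    // Hypothetical holder for the fields the parser keeps.
    public sealed class Fields
    {
        public string Ip;
        public string RawDateTime;
        public string FilePath;
    }

    // Returns null for lines that do not describe a completed download.
    public static Fields Split(string line)
    {
        string[] parts = line.Split(new[] { ", " }, StringSplitOptions.None);
        if (parts.Length < 8) return null;

        // http access code: all codes other than 200 are ignored.
        if (parts[5] != "200") return null;

        // file size: a missing (non-numeric) size means no download.
        long size;
        if (!long.TryParse(parts[6], NumberStyles.None,
                           CultureInfo.InvariantCulture, out size))
        {
            return null;
        }

        return new Fields
        {
            Ip = parts[0],                        // used as-is
            RawDateTime = parts[3].Trim('[', ']'), // still needs date parsing
            FilePath = parts[7]                   // input for OS/version lookup
        };
    }
}
```

Rejecting a line early on status code and size keeps the later, more expensive steps (date conversion and version lookup) limited to real downloads.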
*version information is sent through versionDictionary first*
Logs are stored on the server in a bucket. Entries are grouped by IP address and stored in separate log files that are named with the date, time, and a hash:
mindtouch_log-access_log-2009-06-30-22-25-56-73BA57277AE4D23A
Inside, they are formatted:
a06bf2c208121e27e28a5fb7dec12fc6655efbb53aa7ba776a755b900e9620fb mindtouch [30/Jun/2009:21:49:50 +0000] 155.239.227.254 65a011a29cdf8ec533ec3d1ccaae921c 734BE96FA34BEAFB REST.GET.OBJECT MindTouch_2009_VM.zip "GET /mindtouch/MindTouch_2009_VM.zip HTTP/1.1" 206 - 4341304 379507974 307452 37 "http://s3.amazonaws.com/mindtouch/MindTouch_2009_VM.zip" "Mozilla/4.0 (compatible; MSIE 5.00; Windows 98)"
Lines of the log are delimited by spaces. Double quotes (" ") enclose values that form a single entry and should not be split.
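A quote-aware split of this kind could look like the sketch below. The class and method names (`SpaceTokenizer`, `Tokenize`) are hypothetical. Treating a square-bracketed value as a single token is an extra assumption, made because the timestamp in the sample line contains a space.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical tokenizer: splits on spaces, but keeps quoted and
// bracketed values together as single tokens.
public static class SpaceTokenizer
{
    public static List<string> Tokenize(string line)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        bool inBrackets = false;
        bool started = false; // true once the current token has begun

        foreach (char c in line)
        {
            // Quotes toggle quoted mode and are not part of the token.
            if (c == '"' && !inBrackets) { inQuotes = !inQuotes; started = true; continue; }

            // A bracket only opens a group at the start of a token.
            if (c == '[' && !inQuotes && !started) { inBrackets = true; started = true; continue; }
            if (c == ']' && inBrackets) { inBrackets = false; continue; }

            // A space outside quotes and brackets ends the current token.
            if (c == ' ' && !inQuotes && !inBrackets)
            {
                if (started) { tokens.Add(current.ToString()); current.Clear(); started = false; }
                continue;
            }

            current.Append(c);
            started = true;
        }

        if (started) tokens.Add(current.ToString());
        return tokens;
    }
}
```

With this approach the request string, the referrer, and the user agent each arrive as one token, so the fields can be addressed by position.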
*Filling in the ParsedEntry is fairly straightforward. Version information has to be deduced from the file name, provided the file names are correct.
None.
public interface Parser
{
    ParsedEntry parseEntry(string entry, VersionDictionary dictionary);
}
Copyright © 2011 MindTouch, Inc.