
Fletch is an avid IRC user. He uses the XChat IRC client, which logs every message. He runs the same client on several computers and operating systems, and the logs use the same format on all of them. He now wants to merge the log files from all of these systems. However, the logs run to thousands of lines, so merging them by hand would take far too long. The files span different years and months. All the logs for a single channel are saved in one file.


This is an example of some lines from one of those log files:

Mar 08 20:05:34  Hey
Mar 08 20:05:39  Hey I love your programs!
Mar 08 20:06:27  This is an awesome challenge!

Each time Fletch starts a private conversation/joins a channel, this text with the date stamp is entered into the logs:

**** BEGIN LOGGING AT Sun Mar 27 19:48:27 2011
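Because the message lines carry no year, the year has to come from the most recent "BEGIN LOGGING" header. The following is only a minimal sketch of that idea, assuming the header always ends with a four-digit year and that every message line starts with a 15-character "MMM dd HH:mm:ss" stamp as in the example above. The class and method names (`HeaderYear`, `yearOf`, `timestampOf`) are made up for illustration and are not part of the challenge.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class HeaderYear {
    // Assumed layout of a message stamp once the year is prepended: "2011 Mar 08 20:05:34"
    static final DateTimeFormatter LINE_FORMAT =
            DateTimeFormatter.ofPattern("uuuu MMM dd HH:mm:ss", Locale.ENGLISH);

    // Assumption: the year is the last space-separated token of the header line
    static int yearOf(String headerLine) {
        String trimmed = headerLine.trim();
        return Integer.parseInt(trimmed.substring(trimmed.lastIndexOf(' ') + 1));
    }

    // Combine the remembered year with the first 15 characters of a message line
    static LocalDateTime timestampOf(int year, String messageLine) {
        return LocalDateTime.parse(year + " " + messageLine.substring(0, 15), LINE_FORMAT);
    }

    public static void main(String[] args) {
        int year = yearOf("**** BEGIN LOGGING AT Sun Mar 27 19:48:27 2011");
        // prints 2011-03-08T20:05:34
        System.out.println(timestampOf(year, "Mar 08 20:05:34  Hey"));
    }
}
```

A reader would keep the last year seen while scanning a file and apply it to each following message line until the next header appears.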

He collects the logs from all his computers onto one system and stores them in /home/Fletch/irc/
The logs from each system are stored in their own directory:

System 0
System 1
System 2
System 3

Each directory has a different number of files.
The files are named as such:

esper-#channel1.log
esper-#channel2.log
esper-fletch_to_99.log

With "esper" being the network, and "#channel1" being the channel.
EACH DIRECTORY HAS MANY NETWORK AND CHANNEL LOG FILES.

This is an example of the final output:

/home/Fletch/irc/merged/esper-#channel1.log
/home/Fletch/irc/merged/esper-#channel2.log
/home/Fletch/irc/merged/esper-fletch.log

Rules:

  • Your solution should take NO longer than 10 minutes to merge 100 MB of logs.
  • Devise a solution that will insert Fletch's logs, ordered by message timestamp, into a merged directory.
  • Your program should only merge files that end with .log
  • Your program should keep only the message lines and remove any unneeded lines such as joins/parts.
  • If two log lines have the same timestamp and the same message, only ONE of them should be kept.
  • Fletch has a few other systems he wishes to add in the future; your program should be able to merge their log files with the current ones easily.
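The ordering and duplicate rules can be sketched together. This is only one possible approach, not a reference solution: it assumes each message has already been parsed into a timestamp plus its text, and it uses a `TreeSet` whose comparator looks at both fields, so two entries collapse into one only when both match. The `Entry` type and the `mergeUnique` name are hypothetical.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class MergeSketch {
    // Hypothetical value type: one message line plus its full timestamp
    static final class Entry {
        final LocalDateTime time;
        final String text;

        Entry(LocalDateTime time, String text) {
            this.time = time;
            this.text = text;
        }
    }

    // Order by timestamp, then by text: entries compare equal only if both match
    static final Comparator<Entry> ORDER =
            Comparator.comparing((Entry e) -> e.time).thenComparing(e -> e.text);

    // Merge the per-system lists into one sorted list without duplicates
    static List<Entry> mergeUnique(List<List<Entry>> perSystem) {
        TreeSet<Entry> merged = new TreeSet<>(ORDER);
        for (List<Entry> entries : perSystem) {
            merged.addAll(entries);
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        LocalDateTime first = LocalDateTime.of(2011, 3, 8, 20, 5, 34);
        LocalDateTime second = LocalDateTime.of(2011, 3, 8, 20, 5, 39);
        List<Entry> system0 = Arrays.asList(
                new Entry(second, "Hey I love your programs!"), new Entry(first, "Hey"));
        List<Entry> system1 = Arrays.asList(new Entry(first, "Hey"));
        for (Entry e : mergeUnique(Arrays.asList(system0, system1))) {
            System.out.println(e.time + " " + e.text);
        }
    }
}
```

One trade-off of this sketch: different messages that share the same second come out in alphabetical order of their text rather than in their original order. A stable sort by timestamp alone, with duplicates removed separately, would preserve the original order within a second.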

Sample test data: https://mega.co.nz/#!FNcUjZhL!CqdQGRQ97lYh0XYby0C03WfJmSdicf5va3C5Sbrxkao

There are 10 types of people in this world, those who can read binary and those who can't.

There are 10 types of people in this world, those who can read hexadecimal and F the rest.

~Fletch


I've updated the OP with the test data :)


I'm not exactly sure what you want here. Do you want us to merge the .log files into one, or just grab all the files from multiple directories, check the first line of each file for the timestamp, then order them and throw them into a final directory?

 

The objective is to merge each of the individual files with the same name into one complete file. For example:

/home/Fletch/irc/system 0/esper-#channel1.log
/home/Fletch/irc/system 1/esper-#channel1.log

would become

/home/Fletch/irc/merged/esper-#channel1.log

Inside the merged file would be all of the messages, ordered by date, with duplicates and useless information removed. All that should be kept is the messages and their timestamps.
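Grouping the inputs by file name could look roughly like the sketch below. It assumes the layout described in this thread (one sub-directory per system directly under a root directory, with a "merged" output directory next to them). `GroupByName` and `groupLogs` are names invented for this example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GroupByName {
    // Map each log file name to every copy of it found in the system directories
    static Map<String, List<File>> groupLogs(File root) {
        Map<String, List<File>> groups = new TreeMap<>();
        File[] systems = root.listFiles(File::isDirectory);
        if (systems == null) {
            return groups; // root is missing or not a directory
        }
        for (File system : systems) {
            if (system.getName().equals("merged")) {
                continue; // never read back our own output
            }
            // Only files ending in .log take part in the merge
            File[] logs = system.listFiles((dir, name) -> name.endsWith(".log"));
            if (logs == null) {
                continue;
            }
            for (File log : logs) {
                groups.computeIfAbsent(log.getName(), k -> new ArrayList<>()).add(log);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Root directory taken from the thread; pass a different one as the first argument
        File root = new File(args.length > 0 ? args[0] : "/home/Fletch/irc");
        for (Map.Entry<String, List<File>> group : groupLogs(root).entrySet()) {
            System.out.println(group.getKey() + " <- " + group.getValue());
        }
    }
}
```

Each map entry then corresponds to one output file in the merged directory, and adding a new system is just a matter of dropping another directory under the root.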

 

Let me know if that helps you understand better :)



That makes more sense. Your sample data looks the same as your final data so I wasn't exactly sure what you wanted. May do this later. 


this looks like a project conveniently disguised as a challenge...hehe good try tho.

 

What language do you prefer? do you want it as an APP already? just curious...this is really very easy and not really that challenging. 

 

I would agree with Vlad; this looks more like something I would see for a school programming assignment.


Actually both of you are wrong :p This is in no way related to my school. My solution, written in Java, completes the task in less than a second. I will post the code when I get home, as I'm on my mobile currently.

Feel free to write it in any language of your choosing; PHP could probably complete this task ridiculously fast. This is all just for fun ;)


Here's my solution :)

All files have dumped to: D:\Folders\Programming\Recursion\log files\merged\
It took 0:0:0:1.341 seconds to merge 0TB 0GB 72MB 137KB 1012B of log files.

package me.fletchto99.merger;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Collections;
import java.util.Comparator;
import java.util.Date;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LogfileMerger {

    private static void merge(final ArrayList<File> files, final File output) {
        // collect into a set so identical lines from different systems are
        // kept only once (this relies on LogMessage.equals/hashCode)
        final LinkedHashSet<LogMessage> unique = new LinkedHashSet<LogMessage>();
        for (final File f : files) {
            try {
                unique.addAll(FileUtils.read(f));
            } catch (final Exception e) {
                System.out.println("Error merging file: " + f.getAbsolutePath());
            }
        }
        final ArrayList<LogMessage> lines = new ArrayList<LogMessage>(unique);
        Collections.sort(lines, new LogMessageComparator());
        FileUtils.write(output, lines);
    }

    public static void main(final String[] args) {
        final long starttime = System.currentTimeMillis();
        String path = "";
        if (args.length < 1) {
            System.out.println("No directory or file specified!");
            return;
        } else if (args.length > 1) {
            // a path containing spaces arrives as several arguments; rejoin them
            for (final String str : args) {
                path += str + " ";
            }
            path = path.substring(0, path.length() - 1);
        } else {
            path = args[0].replace("\\", File.separator);
        }
        // build the output directory with File so it works on any OS
        final File outputDir = new File(path.trim(), "merged");
        final HashSet<File> files = FileUtils.getFiles(new File(path.trim()));
        final HashMap<String, ArrayList<File>> matches = new HashMap<String, ArrayList<File>>();
        if (!files.isEmpty()) {
            // group the files from the different systems by their shared name
            for (final File f : files) {
                if (!matches.containsKey(f.getName())) {
                    matches.put(f.getName(), new ArrayList<File>());
                }
                matches.get(f.getName()).add(f);
            }
            if (!matches.isEmpty()) {
                try {
                    // merge each group on its own worker thread
                    final ArrayList<Callable<Object>> futures = new ArrayList<Callable<Object>>();
                    final ExecutorService service = Executors
                            .newFixedThreadPool(Runtime.getRuntime().availableProcessors());
                    for (final String s : matches.keySet()) {
                        futures.add(Executors.callable(new Runnable() {
                            public void run() {
                                merge(matches.get(s), new File(outputDir, s));
                            }
                        }));
                    }
                    service.invokeAll(futures);
                    service.shutdown();
                } catch (final InterruptedException e) {
                    System.out.println("There was an error while executing the task! Please try again.");
                    return;
                }
            } else {
                System.out.println("No files have been found!");
                return;
            }
            System.out.println("All files have dumped to: "
                    + outputDir.getAbsolutePath() + File.separator);
        } else {
            System.out.println("No files found!");
        }
        long size = 0;
        for (final File f : files) {
            size += f.length();
        }
        System.out.println("It took " + getRuntime(starttime) + " to merge "
                + getFileSize(size) + " of log files.");
    }

    // format the elapsed time as days:hours:minutes:seconds.millis
    private static String getRuntime(final long starttime) {
        try {
            long millis = System.currentTimeMillis() - starttime;
            final long days = millis / (1000 * 60 * 60 * 24);
            millis -= days * (1000 * 60 * 60 * 24);
            final long hours = millis / (1000 * 60 * 60);
            millis -= hours * (1000 * 60 * 60);
            final long minutes = millis / (1000 * 60);
            millis -= minutes * (1000 * 60);
            final long seconds = millis / 1000;
            millis -= seconds * 1000;
            return days + ":" + hours + ":" + minutes + ":" + seconds + "."
                    + millis + " seconds";
        } catch (final Exception e) {
            return "0:0:0:0.0";
        }
    }

    // break a byte count down into TB, GB, MB, KB and B
    private static String getFileSize(final long filesize) {
        try {
            long bytes = filesize;
            final long tb = bytes / (1099511627776L);
            bytes -= tb * (1099511627776L);
            final long gb = bytes / (1073741824L);
            bytes -= gb * (1073741824L);
            final long mb = bytes / (1048576L);
            bytes -= mb * (1048576L);
            final long kb = bytes / 1024;
            bytes -= kb * 1024;
            return tb + "TB " + gb + "GB " + mb + "MB " + kb + "KB " + bytes + "B";
        } catch (final Exception e) {
            return "0B";
        }
    }
}

class FileUtils {

    // recursively collect every .log file below the given path
    public static HashSet<File> getFiles(final File path) {
        final HashSet<File> files = new HashSet<File>();
        if (!path.exists()) {
            // warn the user that they specified an invalid path
            System.out.println("The file specified does not exist: "
                    + path.getAbsolutePath());
            return files;
        }
        if (path.getAbsolutePath().contains("merged")) {
            System.out.println("Ignoring merged dir!");
            return files;
        }
        if (path.isFile()) {
            if (!path.getName().endsWith(".log")) {
                System.out.println(path.getAbsolutePath()
                        + " is not a valid log file!");
                return files;
            }
            files.add(path);
            return files;
        } else if (path.isDirectory()) {
            final File[] listOfFiles = path.listFiles();
            if (listOfFiles != null) {
                for (final File listOfFile : listOfFiles) {
                    files.addAll(getFiles(listOfFile));
                }
            }
        }
        return files;
    }

    public static HashSet<LogMessage> read(final File file) throws IOException {
        // LinkedHashSet keeps the original line order within the file
        final HashSet<LogMessage> lines = new LinkedHashSet<LogMessage>();
        final BufferedReader br = new BufferedReader(new InputStreamReader(
                new FileInputStream(file)));
        // one formatter per call (SimpleDateFormat is not thread-safe);
        // Locale.ENGLISH keeps "Mar" parseable whatever the system locale is
        final SimpleDateFormat format = new SimpleDateFormat(
                "yyyy MMM dd HH:mm:ss", Locale.ENGLISH);
        String s = null;
        int date = Calendar.getInstance().get(Calendar.YEAR);
        try {
            while ((s = br.readLine()) != null) {
                // skip non message lines
                if (!s.contains("<")) {
                    // the "**** BEGIN LOGGING AT ..." headers carry the year
                    if (s.startsWith("**")) {
                        try {
                            date = Integer.parseInt(
                                    s.substring(s.indexOf(":") + 7).trim());
                        } catch (final Exception e) {
                            date = Calendar.getInstance().get(Calendar.YEAR);
                        }
                    }
                    continue;
                }
                try {
                    // prefix the year so the first 20 characters form a full date
                    final String message = date + " " + s;
                    lines.add(new LogMessage(
                            format.parse(message.substring(0, 20)), message));
                } catch (final ParseException e) {
                    System.out.println("Line invalid, continuing to next line!");
                }
            }
        } finally {
            br.close();
        }
        return lines;
    }

    public static void write(final File file, final ArrayList<LogMessage> lm) {
        try {
            // create the log file and directories
            if (!file.exists()) {
                file.getParentFile().mkdirs();
                file.createNewFile();
            }
            final BufferedWriter output = new BufferedWriter(new FileWriter(file));
            // write the lines
            for (final LogMessage l : lm) {
                output.write(l.getMessage().trim() + "\r\n");
            }
            output.close();
        } catch (final IOException e) {
            System.out.println("There was an error writing the file: "
                    + file.getAbsolutePath());
            e.printStackTrace();
        }
    }
}

class LogMessage {
    private final Date date;
    private final String message;

    public LogMessage(final Date date, final String message) {
        this.date = date;
        this.message = message;
    }

    public String getMessage() {
        return message;
    }

    public Date getDate() {
        return date;
    }

    // two messages are duplicates when both timestamp and text match;
    // without equals/hashCode the hash sets above cannot detect them
    @Override
    public boolean equals(final Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof LogMessage)) {
            return false;
        }
        final LogMessage other = (LogMessage) o;
        return date.equals(other.date) && message.equals(other.message);
    }

    @Override
    public int hashCode() {
        return 31 * date.hashCode() + message.hashCode();
    }
}

class LogMessageComparator implements Comparator<LogMessage> {
    public int compare(final LogMessage l1, final LogMessage l2) {
        return l1.getDate().compareTo(l2.getDate());
    }
}

