Sep 05, 2008 11:51
First off, a warning: there will probably be lots of geek-speak in this post, as well as some really bad puns (starting with the one in the subject). Now to explain the pun above: I have been playing with some statistics from the processing jobs we have been running on this trip. Though every blade on this boat is an IBM HS20 (not a cutting edge, except perhaps on my first trip to this boat a couple of years ago, but a dual-processor thin slice of computing power), not all blades were created equal. I have been analyzing the performance of our system, which has 24 dual-processor blades, giving us a total of 48 executive shells, or nodes, to work with. One of the Key Performance Indicators I am supposed to meet for my job is to maximize efficiency onboard. Another of my assigned tasks is to implement a piece of software called "scheduler" that, in theory, is supposed to improve our ability to process the data. It works by checking the queue of jobs waiting to run against the available nodes, and the disk space each job requires against what is free. If the jobs will not overflow the disk space and nodes are available, the scheduler launches the jobs based on priority and node availability. The thing does not care which node runs what job.
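For the geeks, here is a rough Python sketch of the launch logic as I described it above. This is not the real scheduler's code; the names (Job, launch_jobs), the gigabyte units, and the "higher number means more urgent" priority rule are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int    # assumption: a higher number means more urgent
    disk_gb: float   # assumption: disk space the job needs, in gigabytes


def launch_jobs(queue, free_nodes, free_disk_gb):
    """Return (job name, node) pairs for every job that can start right now."""
    launched = []
    nodes = list(free_nodes)
    # Walk the queue from highest to lowest priority.
    for job in sorted(queue, key=lambda j: j.priority, reverse=True):
        if not nodes:
            break                      # no free nodes left
        if job.disk_gb > free_disk_gb:
            continue                   # would overflow the disk; stays queued
        node = nodes.pop(0)            # any free node will do; speed is ignored
        free_disk_gb -= job.disk_gb
        launched.append((job.name, node))
    return launched


if __name__ == "__main__":
    queue = [Job("a", 1, 50), Job("b", 3, 80), Job("c", 2, 100)]
    print(launch_jobs(queue, ["n1", "n2"], 150))
```

The line to notice is nodes.pop(0): the sketch grabs whichever node is free and never asks how fast that node is for the job at hand.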
Now comes the part where those two tasks get in the way of each other. I went through the log files for a metric shitload of jobs which have run so far (with lots more to go) to analyze the throughput we have been getting (because I am anal-ytical like that) and have discovered proof of my statement that not all blades are created equal. The blades are supposed to be identical: same internal disk space, same number of processors running at the same clock speed, with the same amount of memory. So why is it that when I look at the sadistics, err, statistics, I find that the nodes are not processing the same amount of data at the same rate? Furthermore, the discrepancies seem to depend on which of the three major steps of the processing is running. Some nodes seem to handle I/O better, while the speed of others seems to depend on how CPU-intensive the work is. For the I/O-intensive jobs, some blades run quickly while others don't do so well. The other two major processing steps are much more compute-intensive, and for each of them it is a different set of blades that is more efficient and a different set that becomes Zippy the Wonder Slug. The worst case is that for the longest-running stage of our processing, there are two blades that only run at about 80% of the average speed of all the blades for that particular type of job. Those jobs run on average for 35 hours, so running at 80% speed means that sometimes jobs will take 44 hours instead. Not a good thing when the processing is time critical!
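For anyone who wants to check the 35-versus-44-hour arithmetic, a tiny Python sketch follows. The function name is made up for illustration, and it assumes run time scales inversely with a node's relative speed.

```python
def slowed_runtime(avg_hours, relative_speed):
    """Run time on a node working at relative_speed times the average speed.

    Assumption: run time scales inversely with speed.
    """
    return avg_hours / relative_speed


if __name__ == "__main__":
    hours = slowed_runtime(35, 0.80)
    # 35 / 0.80 = 43.75, which rounds to the 44 hours quoted above.
    print(f"{hours:.2f} hours, {hours - 35:.2f} more than average")
```

So a node running at 80% speed does not cost 20% more time, it costs 25% more: nearly nine extra hours on a 35-hour job.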
So the issue with running the scheduler is that it will use those slow nodes whenever they are available, regardless of the type of job. Whereas, being the good student of arithmetic that I am, it is easy for me to see that I and my processing crew (if they will do it) can assign the nodes by hand so that no final step is held up because a slow node is running an important prerequisite job for the final deliverables. My boss is a big fan of statistics, spreadsheets and improved system performance, so I have been making a spreadsheet, complete with bar graphs, showing why the mandate to use scheduler onboard is a really bad idea.
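What the crew and I would do by hand looks roughly like the Python sketch below: rank the free nodes by how fast they are for the stage in question and give the time-critical jobs the fastest ones. The function name, the node names, and the speed figures are all made up for illustration; they are not numbers from my spreadsheet.

```python
def pick_nodes(stage_speed, free_nodes, n_jobs):
    """Pick the n_jobs fastest free nodes for one processing stage.

    stage_speed maps node name -> speed relative to the average (1.0)
    for that stage.
    """
    ranked = sorted(free_nodes, key=lambda n: stage_speed[n], reverse=True)
    return ranked[:n_jobs]


if __name__ == "__main__":
    # Hypothetical relative speeds for the longest-running stage.
    stage_speed = {"node01": 1.05, "node02": 0.80, "node03": 1.00, "node04": 0.80}
    print(pick_nodes(stage_speed, ["node01", "node02", "node03"], 2))
```

With these made-up numbers the two critical jobs land on node01 and node03, and the 80% node is left for work that nobody downstream is waiting on.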
In other news, last night after I had left the instrument room, we got an email with the Americans' flight details. The flights looked good except for one minor drawback. They were for the week after crew change! Fortunately, they were sorted out and now I have my flights for crew change for the correct day. This trip will be a rarity for me as well. It has been a very long time since the last time I was out of the country and managed to make it home on the same day as crew change. I suppose there is an advantage to not having an onsigning processing crew to hand over the job to. This trip will be the first one in about 6 years where I will be on the first chopper off the boat! I won't actually get home the day of crew change, but barring delayed flights I will be at the Tulsa airport at 23:40 on 17 September. So it should be relatively shortly after midnight on 18 September when I get home. Yay for the one mores! (One more Thursday and Friday to go onboard before going home.)
Speaking of choppers, it appears this has been a bad week for helicopters. I saw that a helicopter had crashed into a drilling platform in Dubai earlier this week. I also just saw a little internet headline about a Coast Guard chopper crashing off Hawaii. I hope all of the helicopter bad karma is sorted out now; I definitely would not want one to go down in the North Sea, especially not the one I will be on!
work,
processing hardware,
helicopters,
travel,
statistics