Thursday, September 24, 2009

SQL Transactions - Faster Faster!

So I went into the office yesterday and built a good bit of the stuff from the last post. Still some work to do, but it should be a really useful tool. Here's some of the stuff I learned along the way.

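The big one: SQL transactions rock. You tell the console "start transaction", and that tells the SQL server (MySQL 5.something in this case) "I'm about to run a bunch of queries. Treat them as one unit, and don't make any of it permanent until I say so." With autocommit on, which is the default, every statement is its own little transaction and gets flushed to disk on its own (at least with a transactional engine like InnoDB). Inside one big transaction, that flush happens once, at the end.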

Once you're done doing your thing, you can "commit" or "rollback". That makes it a nice testing tool as well.

In bootstrapping myself to a properly working system, I ended up with a 74MB flat text file full of SQL INSERTs that needed to be processed. Stupidly, I just told the console "source [my-file]". It took four hours. Tried the same thing again with transactions, and it was done in less than a minute.
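Here's a minimal sketch of the same idea in Python. Big assumptions: it uses the built-in sqlite3 module instead of MySQL so it runs anywhere, and the samples table and its columns are made up for illustration. In the MySQL console the equivalent would be typing START TRANSACTION; before the source command and COMMIT; after it.

```python
import sqlite3

def load_in_one_transaction(conn, rows):
    """Insert every row inside a single transaction and commit once at the end."""
    cur = conn.cursor()
    cur.execute("BEGIN")
    try:
        cur.executemany("INSERT INTO samples (server, iops) VALUES (?, ?)", rows)
        cur.execute("COMMIT")
    except Exception:
        cur.execute("ROLLBACK")  # nothing from this batch is kept
        raise

# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN/COMMIT/ROLLBACK statements are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE samples (server TEXT, iops REAL)")  # hypothetical table

rows = [("srv%d" % (i % 4), float(i)) for i in range(10000)]
load_in_one_transaction(conn, rows)
row_count = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]

# Rollback as a testing tool: try something destructive, then undo it.
conn.execute("BEGIN")
conn.execute("DELETE FROM samples")
conn.execute("ROLLBACK")
count_after_rollback = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]

print(row_count, count_after_rollback)
```

The second half is the "testing tool" part: the DELETE really runs, but the rollback puts every row back.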

What else did I learn? Well, I picked up a few things about the GD library. It's surprisingly easy. I couldn't find a graphing frontend to it that worked on Windows to my satisfaction, so I just started writing a few graphing functions of my own, and they work pretty well.

Oh yeah, I had never used VIEWs in SQL before, but they seem to be pretty powerful. They solved a problem with a subquery that MySQL didn't support, which was real nice. I was trying to grab the top x% of a set of values, and do operations on those (i.e., min/max, average, etc).
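For the record, here's one way a view can do the top-x% trick. This is a sketch, not the query I actually used: it runs against SQLite through Python's sqlite3 module, the samples table and top_samples view are hypothetical names, and the 10% cutoff is hard-coded. The view keeps a row only if fewer than 10% of all rows have a larger value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (iops REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO samples (iops) VALUES (?)",
    [(float(i),) for i in range(1, 101)],  # values 1.0 through 100.0
)

# A row is in the top 10% when (rows with a larger value) * 10 < (total rows).
# Integer math keeps the cutoff exact.
conn.execute("""
    CREATE VIEW top_samples AS
    SELECT s.iops
    FROM samples AS s
    WHERE (SELECT COUNT(*) FROM samples AS t WHERE t.iops > s.iops) * 10
          < (SELECT COUNT(*) FROM samples)
""")

# Once the view exists, the aggregates are a plain query against it.
top_min, top_max, top_avg = conn.execute(
    "SELECT MIN(iops), MAX(iops), AVG(iops) FROM top_samples"
).fetchone()
print(top_min, top_max, top_avg)
```

The correlated count compares every row against every other row, so it would crawl on a big table. It's only here to show the shape of the thing.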

That happens to me a lot with programming. I was a double E in school, so my formal training in things like database theory is pretty much nil. Everything I know I picked up from having to solve specific problems. Maybe I should go see if I can audit a class or something. I might learn a thing or two.

Wednesday, September 23, 2009

Reinventing the Wheel - Perfmon Analyzer Notes

Can't sleep. Maybe if I take some notes on the app I've been thinking about, it'll leave my head.

So I spent some time today writing Perl scripts to help me out with some perfmon data. Customer sent me about 200 files with pretty much every Windows counter, each file about 70MB. That's a metric fuckton of stuff to wade through. So the scripts I wrote:
  1. Call relog to get rid of everything but the counters I want
  2. Call relog to consolidate all the records that appear to come from one server into a single file (okay, 1 and 2 really happen in the same loop. Who's counting?)
  3. Convert over to a per-server CSV file for easy Excelling
  4. Collapse into a couple of Excel files with some extra columns to make it easy to do PivotTable-type stuff
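Steps 1 through 3 boil down to building relog command lines. My scripts are Perl, but here's a rough Python sketch of the same loop. Assumptions: log files are named SERVER_something.blg (a made-up convention for guessing which server a file came from), and counters.txt is a hypothetical file listing the counters to keep. The relog switches are the real ones: -cf for a counter file, -f for the output format, -o for the output path, -y to skip prompts.

```python
from collections import defaultdict

def group_by_server(filenames):
    """Group log files by the text before the first underscore (assumed naming)."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        server = name.split("_", 1)[0]
        groups[server].append(name)
    return dict(groups)

def relog_command(input_files, counter_file, output_csv):
    """Build one relog call that filters counters and merges the inputs to CSV."""
    return ["relog", *input_files,
            "-cf", counter_file,   # keep only the counters listed in this file
            "-f", "CSV",           # write CSV instead of binary
            "-o", output_csv,
            "-y"]                  # answer yes to any prompts

logs = ["SQL01_0901.blg", "SQL01_0902.blg", "WEB01_0901.blg"]  # hypothetical files
commands = [
    relog_command(files, "counters.txt", server + ".csv")
    for server, files in group_by_server(logs).items()
]
for command in commands:
    print(" ".join(command))
```

This only prints the commands. On a Windows box with the real files in place, each list could be handed to subprocess.run(command, check=True).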
All of that is cool. But there's still things I want to do with the data. For each server, I want to:
  • Find average IOPs
  • Find peak (top value, average of top 2%, 4%, etc) - Note: There's probably a statistical operation I'm thinking of here. Get out your damned statistics book.
  • I think maybe the other thing I'm doing is trying to figure out what percentage of operations are x number of standard deviations above average. Maybe I just want some metric of how "bursty" the system is.
  • All the other basic stuff - What percent is reads versus writes, what will that look like after a RAID penalty, how big are my average reads and writes. No reason it couldn't try to fit that data to some patterns and maybe guess at an IO profile.
  • Kill the fly that's found my monitor in the dark. Stupid horse farm.
  • Generate a pretty graph of the above
  • Do all of the above for both on-hours and off-hours work, and maybe separately for a backup window
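Most of the per-server numbers are a few lines of math each. A sketch in Python, with made-up sample data. The function names are mine, "burstiness" here is just one possible definition (the fraction of samples more than k standard deviations above the mean), and the RAID write penalties are the usual rules of thumb, not measurements.

```python
from statistics import mean, pstdev

# Common rule-of-thumb back-end writes per host write (assumed, not measured).
RAID_WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}

def top_fraction_mean(values, fraction):
    """Average of the highest `fraction` of samples (at least one sample)."""
    ordered = sorted(values, reverse=True)
    count = max(1, round(len(ordered) * fraction))
    return mean(ordered[:count])

def burstiness(values, k=2.0):
    """Fraction of samples more than k standard deviations above the mean."""
    sigma = pstdev(values)
    if sigma == 0:
        return 0.0
    threshold = mean(values) + k * sigma
    return sum(1 for v in values if v > threshold) / len(values)

def backend_iops(read_iops, write_iops, write_penalty):
    """Disk-side IOPS once each host write is multiplied by the RAID penalty."""
    return read_iops + write_iops * write_penalty

# Hypothetical server: mostly quiet, with two big spikes.
iops = [100] * 98 + [1000, 2000]
print(mean(iops))                      # average IOPS
print(top_fraction_mean(iops, 0.02))   # average of the top 2%
print(burstiness(iops, 2.0))           # share of samples above mean + 2 sigma
print(backend_iops(700, 300, RAID_WRITE_PENALTY["RAID5"]))
```

Splitting on-hours from off-hours would just mean filtering the samples by timestamp before calling these.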
For the entire set of servers, and any subset of servers I choose (say, a SQL cluster), I want to
  • Find the various IOP values above. There's probably a way to apply Erlang-style analysis and say "I want only a 1% chance of having peak IOPs above this given SAN capacity"
  • Associate the servers above with amounts of storage, and graph IOPs versus server and metaLUN size. Ideally, this results in a pretty 1/x graph and helps me easily identify flash and SATA candidates
  • I killed that fly. Woohoo!
  • Heck, there's no reason that the system couldn't take a swag at trying to devise a basic LUN layout.
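For the "only a 1% chance of exceeding capacity" idea, a crude stand-in is an empirical percentile over the combined load. To be clear, this is not an Erlang formula. It just adds up the servers sample by sample and finds the level that at most 1% of observed samples go over. It assumes every server has the same number of samples taken at the same timestamps, and the data and names below are made up.

```python
import math

def combined_load(per_server_samples):
    """Sum IOPS across servers, sample by sample (assumes aligned timestamps)."""
    return [sum(step) for step in zip(*per_server_samples.values())]

def capacity_for_exceedance(samples, exceed_probability):
    """Smallest observed level that at most `exceed_probability` of samples exceed."""
    ordered = sorted(samples)
    allowed_above = math.floor(len(ordered) * exceed_probability)
    return ordered[len(ordered) - 1 - allowed_above]

# Hypothetical subset of servers, 100 samples each.
cluster = {
    "SQL01": list(range(1, 101)),  # ramps from 1 to 100 IOPS
    "SQL02": [10] * 100,           # flat 10 IOPS
}
total = combined_load(cluster)
needed = capacity_for_exceedance(total, 0.01)
print(max(total), needed)
```

The gap between the absolute peak and that 99% level is the part of the SAN you'd be buying just to cover the worst 1% of samples.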
All of the above is entirely possible with Excel, but it would take an extraordinary amount of time. There's no reason it couldn't be automated. For that matter, all of the above assumes that the data has been ingested into a SQL database, which means I could normalize between perfmon, iostat, and whatever other stuff may be out there.

Since it's all in a big ol' database, that opens the doors to larger sets of statistics over time. No reason I wouldn't keep EVERYTHING in there.

Some of this stuff - the LUN layout, the IO profile - could take some work. Most of it is just combing some datasets and doing basic math.

So the thing is - Surely this has been done to death a thousand times before. Where is this application?

Wednesday, September 16, 2009

Today was a Good Day

So there's this big ol' flowchart on Geekologie about Ice Cube's song. And it made me realize something.
No barkin' from the dog, and no smog
And momma cooked the breakfast with no hog

I've listened to this song for years, and never realized that Ice Cube lives with his mom.

Dude, there's nothing gangsta about that.

Update:
It turns out I didn't understand. Reverend Moyers corrected me:
It's so that he can have "everything in his mamma's name" and when his drugs and drug money and possessions get confiscated, they cain't touch mamma's shit - and you still got it when the police leave!

Monday, September 14, 2009

Running ESX4 in VMware Workstation 6

It's like a mystery, wrapped in an enigma, wrapped in crispy delicious bacon.

That is, running an ESX 4 server inside VMware Workstation. You know, for when you left your high-power blade server in your other pants.

Here's the big stuff. Build your VM. Get into your VMX file with your favorite text editor. Here are a few key settings you'll need.
monitor_control.restrict_backdoor = "TRUE"
monitor.virtual_mmu = "software"
monitor.virtual_exec = "hardware"
monitor_control.vt32 = "true"

Also, make sure the VM is set up to use your processor's virtualization functions (i.e., Intel VT or AMD-V).

Anyway, I'm waiting for my portable hard drive to come in so I can really get some good VM sprawl going. This poor 80GB laptop drive just ain't cutting it.

In other news... I spent the weekend playing The Beatles Rock Band. I'm already looking forward to getting home and playing the drum roll from Come Together again.