Okay, let's see.
There is an imminent terrorist attack.
It is in Washington, DC.
What do YOU think the target is?
Monday, March 9, 2009
Are the characters on 24 the stupidest people ever?
Monday, January 12, 2009
New site
I have set up a new website for hockey statistics.
The data is from the hockey-databank yahoo group. You can use my tool to query the statistics and generate custom reports on hockey statistics.
Here is the URL: http://www.hockeyhunter.com
Monday, December 22, 2008
Passwords
I want to try to bridge a gap here, so to speak. I feel that, as someone interested in education, I should attempt to educate in some ways by communicating the things I know and learn in simple language, and try to make uninteresting things sound interesting by highlighting their importance and pertinence to the layman's life.
So let's talk about passwords.
I have two goals with this post:
1) To explain the technologies used by password-cracking software.
2) To demonstrate the insanely high speed at which modern computers can operate.
Part I) The Explanation
Everybody uses passwords, and many people have some understanding of how they work. A lot of work has been done trying to educate the public about how to have secure passwords, and yet a lot of people still choose very poor ones. I think that there is still a lack of understanding among the general public about how passwords get broken. It is my goal in this post to explicate password-cracking in an effort to enlighten the public and exterminate the risk of my friends getting hacked.
IT professionals and just general geeks like me know about these things and want to hear about hashes, salts and rainbow tables. If you do not have a clue what I am talking about, then this post is for you. I am going to ignore those for this post and focus on generalities.
Basically, there are two methods to cracking passwords:
1) Wordlist
2) Brute Force
While these techniques have been around for as long as I have, the big change over time has been computer speed. Moore's law dictates that computing power doubles every other year, which means that a password that might have taken 24 hours to break in 1998 can probably be broken now in about 45 minutes, without even changing the techniques used.
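To make that arithmetic explicit, here is a quick sketch in Python. The function name `crack_time_after` and the two-year doubling period are just my shorthand for the rule of thumb above, not a precise model of real hardware.

```python
def crack_time_after(hours_then, years_elapsed, doubling_period_years=2):
    """Scale an old cracking time by the speed-up from repeated doubling."""
    speedup = 2 ** (years_elapsed / doubling_period_years)
    return hours_then / speedup


# A 24-hour job in 1998, re-run ten years later in 2008.
minutes_now = crack_time_after(24, 2008 - 1998) * 60
print(f"{minutes_now:.0f} minutes")
```

Ten years is five doublings, so the job runs 32 times faster.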
Method #1: Wordlists
Ever sat down at a friend's computer and tried to guess their password? Maybe it's the name of their dog! No wait, the name of their cat! This is, in essence, the methodology used here. Of course, a computer can guess passwords a hell of a lot faster than you can. Essentially, the software is fed a "list" of words to try, and it tries them all, one by one, until it gets it right. This list might be a concise list of popular passwords, or it might be a list of every word in the dictionary.
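If you like to see things in code, the idea fits in a few lines of Python. This is only a toy sketch: `is_correct` is a stand-in I made up for "whatever check tells you the guess was right," and the wordlist is invented. Real cracking tools work against stored password data, which I am skipping on purpose.

```python
def wordlist_attack(is_correct, wordlist):
    """Try each word in order; return (password, attempts) or (None, attempts)."""
    attempts = 0
    for word in wordlist:
        attempts += 1
        if is_correct(word):
            return word, attempts
    return None, attempts


# Made-up example: the "secret" is the third word on a tiny list.
words = ["password", "fluffy", "apple", "banana"]
found, tries = wordlist_attack(lambda guess: guess == "apple", words)
print(found, tries)
```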
Method #2: Brute Force
Ever thought about how long it would take to guess your PIN in case you forgot it? First you would try 0000, then 0001, then 0002, etc. and eventually you would get to your PIN. This is brute force. It literally means trying every combination possible until you get one right. Of course, if you were actually trying to guess a PIN, you wouldn't start at 0000 would you? Who has a PIN of 00XX? A smarter way would be to start in the middle, and work your way out. Modern day cracking software works in about the same fashion. It will try every possible combination, but it does so in an intelligent order to reduce the time it will likely take.
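Here is the naive version of that PIN example as a Python sketch. It starts at 0000 and counts up, so it does not include the "intelligent order" that real software uses. The secret PIN and the `is_correct` check are made up for the illustration.

```python
from itertools import product
import string


def brute_force(is_correct, alphabet, length):
    """Try every combination of the given length, in order, until one matches."""
    attempts = 0
    for combo in product(alphabet, repeat=length):
        attempts += 1
        guess = "".join(combo)
        if is_correct(guess):
            return guess, attempts
    return None, attempts


secret_pin = "0417"  # invented for the example
guess, attempts = brute_force(lambda g: g == secret_pin, string.digits, 4)
print(guess, attempts)
```

A 4-digit PIN has only 10,000 possibilities, so even this dumb loop finishes instantly.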
Part II) The Demonstration
The fact is, it might shock you to learn how fast these programs can work. It shocked me. One of the most popular, renowned, and oldest password-cracking tools available is called "John the Ripper." I downloaded John and put it to work on my own machine so that I could test it out a bit. I have read most of the documentation and obtained knowledge on how it functions, and I intend to explain that here, as best as I can.
I made a bunch of fake Windows accounts to test some passwords. I made 3 to start, with what I figured were weak, medium, and strong passwords.
My weak password was "apple." I guessed this would take a couple seconds to break.
My medium password is a password we use at work that involves a word and one number.
My strong password is a password that I personally have used for about 8 years, that involves letters and numbers in a way that does not resemble any sort of word.
I sic'd John on these accounts and had barely even left the enter key when it responded with answers to the first two. John only gives granularity down to seconds, so I can't even say exactly how long it took, other than that it was less than one second. Literally, my "medium" password that we use at work took less than a second to crack. I was pretty amazed.
I was even more amazed when it cracked my "strong" password just 2 hours later.
It was at this point I had to learn more and started looking more deeply into how JtR operates. Luckily, it is a very open and well-documented software with a lot of options and configuration files. I was able to learn what I needed very quickly.
John has three different "modes." By default it tries all three in order, but it contains options to only try each one individually.
Mode 1: Single mode
Basically, this is where it uses a wordlist of just one word, but tries many variations of that word. It calls these variations "mangles." There are a LOT of mangles that it tries on that word, everything from "replace every o with a 0" to "try the word backwards" to "reverse the word and capitalize every other character." What is the word that it is trying? The username.
For example, suppose your username is "admin." This mode will try "admin," "nimda," "admin1," "adm1n," etc. and a whole lot more. A LOT more. So if you think you're being cute by using a backwards-username as a password, you're not. Any variation that you can think of, JtR probably also tries. So don't use any mangle of your username as your password.
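To give a feel for what a mangle is, here is a tiny Python sketch that produces a handful of variations of one word. To be clear, this is my own illustration and not JtR's actual rule set, which is far larger and written in its own rule syntax; the function name `mangles` is made up.

```python
def mangles(word):
    """Return a few illustrative variations of a word, without duplicates."""
    reversed_word = word[::-1]
    candidates = [
        word,                    # as-is
        reversed_word,           # backwards
        word + "1",              # number tacked on the end
        word.replace("i", "1"),  # swap i for 1
        word.replace("o", "0"),  # swap o for 0
        word.upper(),            # all caps
        # backwards, with every other character capitalized
        "".join(c.upper() if i % 2 else c for i, c in enumerate(reversed_word)),
    ]
    unique = []
    for candidate in candidates:
        if candidate not in unique:
            unique.append(candidate)
    return unique


print(mangles("admin"))
```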
Mode 2: Wordlist mode
This is probably the most efficient and effective. JtR comes with a list of 3,706 common passwords and tries them all one-by-one. It also has the option to try "mangles" as above. However, because it is trying thousands of words instead of just one, the default amount of mangles it tries is far less. With that said, JtR is extensible so any user can add more mangles to the list with ease.
Here is a link to the wordlist that JtR comes with:
WordList
The first thing it does is try every word on that list, a process which takes less than a second. After that, it goes through the list again, trying every word in lowercase. Then, every word in uppercase. All told, there are 26 different mangles that are tried by default. I will not list them all here for fear of tedium, but I will post a link.
Let me put it bluntly: If your "password" is a word in the dictionary, you are vulnerable. Adding a number (e.g. "apple1" instead of "apple") will not help. A computer can break it in seconds (or less.)
I wanted to really underscore the performance aspect involved here, so I did some testing. First of all, I built my own wordlist. Well, not really. I found a wordlist that is supplied for spell-checking programs like the one in MS Word. This list is essentially every word in the dictionary. I then added another list which contains a bunch of popular abbreviations. Then another list that contains a bunch of "slang" words and popular misspellings. All told, my list had 86,542 words in it to try.
I ran JtR using the custom wordlist (no mangling), and it completed in less than a second. Slightly annoyed, I tried to slow it down by enabling the mangling options. Remember, it is trying every word in a list that is 86,542 words long, then going back to the top and trying every word backwards, and so on. There are 26 mangle options which means there are 86,542 X 26 = 2,250,092 different passwords to try.
How long did JtR take to try over 2 million different passwords? One second. One measly second. Barely detectable, really. And this is while I was running another JtR session in the background, along with about 40 other things (BitTorrent, MSN, Avast!, Skype, etc.). And this is on my computer, which is hardly a supercomputer.
However, none of the 2M+ passwords matched the new ones I had put into my password file, so I moved on to the next technique...
Mode 3: Incremental mode
This essentially means Brute Force. This mode absolutely guarantees that it will crack any password ... eventually. Depending on the length of the password and size of the pool it could take seconds or years.
It's all about probability. If you have a password that must be exactly 8 characters, and you are limited to only using letters, and case doesn't matter (m = M), then there are 26^8 = 208,827,064,576 different combinations.
Like all numbers, this one requires context. Two hundred and eight billion is sure a large number. Far bigger than the measly 2 million I tried in my previous demonstration. There are, however, some things to keep in mind:
1) It will not be necessary to try all combinations, only the amount it takes until it gets the right one.
2) The password cracker will not, likely, do "brute force" as we would intrinsically think of it. That is, it has an intelligent order with which it tries passwords. This will decrease the time it takes to get your password.
3) Computers are fast. To test, I configured my computer to try every alpha password up to 7 chars in length, case insensitive. The number of passwords to try here is:
26^7 + 26^6 + ... + 26^1 = 8,353,082,582. It ran through this exercise in about 50 minutes, or 3,000 seconds. This means it was able to try around 2.7 million passwords per second.
Do the math. With 208 billion possible combinations and 2.7 million combinations tried every second, my computer can test every possible combination of an 8-character letters-only password in about 21 hours. Chances are, it will get to yours overnight.
4) Computers are always getting faster. Now, my computer can get every combination in 21 hours. In two years, it will be able to do it in about 10 hours. By 2014, it will be able to do it in under 3 hours.
There was a time when 8-character passwords were sufficient. When JtR first came out, it was maxed at about one thousand combos per second. But that ship has sailed, and it seems like too many consumers are left on the beach.
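If you want to check my math, here it is as a short Python sketch. The only inputs are the numbers from this post: a 26-letter pool, my 50-minute test run, and the rounded 2.7 million guesses per second. The variable names are mine.

```python
ALPHABET_SIZE = 26            # letters only, case-insensitive
GUESSES_PER_SECOND = 2.7e6    # rounded rate from my test run

# Every 8-character letters-only password.
eight_char = ALPHABET_SIZE ** 8

# Every letters-only password from 1 to 7 characters long.
up_to_seven = sum(ALPHABET_SIZE ** n for n in range(1, 8))

# Measured rate: that whole set took about 50 minutes.
measured_rate = up_to_seven / (50 * 60)

# Worst case for an 8-character password at the rounded rate.
hours_for_eight = eight_char / GUESSES_PER_SECOND / 3600

print(f"8-character combinations: {eight_char:,}")
print(f"1- to 7-character combinations: {up_to_seven:,}")
print(f"measured rate: {measured_rate:,.0f} guesses per second")
print(f"time to try all 8-character passwords: {hours_for_eight:.1f} hours")
```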
Part III) The Advice
So how do you improve your password?
1) Length. The beauty of exponents is that adding one character to your password can make a world of difference. By extending your password to 9 characters instead of 8, you would have 5,429,503,678,976 (5.4 trillion) combinations, 26 times as many! I mentioned before that my computer can try all combinations of an 8-character letters-only password in 21 hours. Guess what? To try all combinations of a 9-character letters-only password would take 23 days. Ten characters? Six hundred and five days. Hooray for exponents!
The problem, of course, is that people find longer passwords harder to remember.
Well, a big problem is that people still use the word "password." "Password," really, is a bit of a misnomer. There is no reason your password needs to be a word. It can be a phrase. A good one to use would be a song lyric or even a title. Is it really hard to remember/type "StairwayToHeaven"? Such a password is not all that secure, but it's 10 times better than "heaven" or any other word. Since this password is 16 chars long, a brute force attack would mean trying 26^16 different combinations to get all of them, and 52^16 combinations if it's case sensitive.
52^16 = 2,857,942,574,656,970,690,381,479,936.
Of course, it will be able to crack it far faster, because of other factors that I won't get into to keep this post simple. Which means you need...
2) Numbers. But don't put "1" at the end. Suppose your favorite number is 9 and song is Stairway. Why not: "Stairway9To9Heaven"? Is that hard to remember? This would leave your password impervious to the wordlist attack, and now a Brute Force attack would have to be programmed to run 62^18 different combinations.
3) Symbols. Much like numbers, you can usually put symbols into words just as easily. What I usually do is think of a number and then use whatever symbols correspond to that number. In the above example we used the number 9. If you SHIFT+9, you get "(". So our password becomes: "Stairway(To(Heaven". That is still 18 characters, so now we're talking about 85^18 different combinations.
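To put all of those exponents side by side, here is a rough Python sketch. It assumes the same 2.7 million guesses per second my machine managed, and it only measures the worst case for a pure brute-force attack, so treat the results as upper bounds and not as promises. The pool sizes (26, 52, 62, 85) are the ones used above; the function name is made up.

```python
GUESSES_PER_SECOND = 2.7e6   # the rate my machine managed earlier
SECONDS_PER_DAY = 24 * 3600


def days_to_exhaust(pool_size, length, rate=GUESSES_PER_SECOND):
    """Days needed to try every password of this length from this pool."""
    return pool_size ** length / rate / SECONDS_PER_DAY


examples = [
    ("8 letters, case-insensitive", 26, 8),
    ("9 letters, case-insensitive", 26, 9),
    ("10 letters, case-insensitive", 26, 10),
    ("StairwayToHeaven (16 chars, mixed case)", 52, 16),
    ("Stairway9To9Heaven (18 chars, plus digits)", 62, 18),
    ("Stairway(To(Heaven (18 chars, plus symbols)", 85, 18),
]

for label, pool, length in examples:
    print(f"{label:<45} {days_to_exhaust(pool, length):,.1f} days")
```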
So that's my primer, so to speak, on passwords. Please choose your passwords carefully, and keep the things you learned today in mind!
Thursday, May 29, 2008
Leadoff Walks
During last night's Jays/A's game, BJ Ryan came into the ninth inning to close out the one-run lead, and immediately issued a leadoff walk to Mark Ellis. I hate leadoff walks, and so does Rance Mulliniks. In fact, Rance remarked (paraphrasing):
"I would estimate that the leadoff walk comes around to score 60-65% of the time."
I generally like Mulliniks, but whenever a broadcaster makes a claim like this, something triggers in my brain. Rance went on to give reasons why he thinks walks come around to score so often, but I wasn't listening.
In The Book, the authors present a table of run values for all the batting events by base/out state. The unintentional walk, with the bases empty and no outs, has a run value of 0.41, so that doesn't help Rance's case. Never one to take someone at their word, I ran the numbers myself, using data from 1998-2007. Here's what I found:
BBs      R        %
-------------------------
27929    10797    .3865
So in 27,929 leadoff walks, the runner came around to score about 39% of the time. Pretty close to the value from the book, and not nearly as high as 65%.
But that's not really the issue. The assertion, see, is that leadoff walks are "special." That is, the runner is going to score more often after being walked than he would had he singled (or reached first some other way.) That's easy to check.
Let's see what the numbers are for leadoff singles:
1Bs      R        %
-------------------------
62645    24031    .3836
Oh. 38%. Basically the same. Drat.
Sorry, Rancey baby.
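For anyone who wants to redo the division, here is the comparison as a few lines of Python. The counts are straight from the two tables above; the helper name `scoring_rate` is just something I made up.

```python
def scoring_rate(runs, times_on_base):
    """Fraction of leadoff baserunners who came around to score."""
    return runs / times_on_base


walk_rate = scoring_rate(10797, 27929)    # leadoff walks
single_rate = scoring_rate(24031, 62645)  # leadoff singles

print(f"leadoff walk:   {walk_rate:.4f}")
print(f"leadoff single: {single_rate:.4f}")
print(f"difference:     {walk_rate - single_rate:+.4f}")
```

The gap is about three tenths of a percentage point, nowhere near the 60-65% claim.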
Wednesday, May 28, 2008
Juicing the bases early, part II
As a followup to my last post, I am doing another study with a few changes. First, I added 1997 and 1998 to the sample. I would have added 1999 too but RetroSheet apparently doesn't have it for some reason. Second, I fixed some minor query problems that were causing me to double-count PA's with balks and pickoffs. Third, I altered my SQL procedure to output the results broken down by innings and score differential.
So, just to recap:
We are looking for plate appearances in completed games played between 1997-2007 for which there is pitch data, where:
- There are runners on second and third, one out.
- Prior to the seventh inning.
- The batter is in one of the top-six spots of the order.
- The batter is not Barry Bonds.
We are separating PA's into two types: PA's where there is an obvious intentional walk (four straight intentional balls), and PA's where there is not. Then, we are looking at how often teams wind up winning the game when they walk the batter vs. when they let him hit.
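In code terms, the filter looks roughly like the Python sketch below. This is not my actual SQL procedure. The `PlateAppearance` record, its field names, and the simplified pitch string (where "I" means an intentional ball) are all hypothetical, just to make the criteria above concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlateAppearance:
    """Hypothetical, simplified record of one plate appearance."""
    inning: int
    outs: int
    runners_on: frozenset   # occupied bases, e.g. frozenset({2, 3})
    lineup_spot: int        # 1 through 9
    batter: str
    pitches: str            # simplified pitch string, "I" = intentional ball


def qualifies(pa):
    """Second and third, one out, before the 7th, top-six hitter, not Bonds."""
    return (
        pa.runners_on == frozenset({2, 3})
        and pa.outs == 1
        and pa.inning < 7
        and pa.lineup_spot <= 6
        and pa.batter != "Barry Bonds"
    )


def is_obvious_ibb(pa):
    """An 'obvious' IBB is exactly four straight intentional balls."""
    return pa.pitches == "IIII"


example = PlateAppearance(5, 1, frozenset({2, 3}), 4, "Some Hitter", "IIII")
print(qualifies(example), is_obvious_ibb(example))
```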
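Here are the results. The score is from the pitching team's point of view, the two W-L columns are the team's won-lost records without and with the obvious intentional walk, and "IBB Gain" is the difference in winning percentage multiplied by the number of IBB games:
| Inning | Score | No-IBB W-L | IBB W-L | No-IBB Wpct | IBB Wpct | Diff | IBB Gain |
|---|---|---|---|---|---|---|---|
| 1 | -5 | 0-0 | 0-0 | 0.000 | 0.000 | 0.000 | 0.000 |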
| 2 | -5 | 2-21 | 0-4 | 0.087 | 0.000 | -0.087 | -0.348 |
| 3 | -5 | 2-9 | 0-0 | 0.181 | 0.000 | -0.181 | 0.000 |
| 4 | -5 | 2-43 | 0-7 | 0.044 | 0.000 | -0.044 | -0.311 |
| 5 | -5 | 2-26 | 0-6 | 0.071 | 0.000 | -0.071 | -0.429 |
| 6 | -5 | 0-44 | 0-14 | 0.000 | 0.000 | 0.000 | 0.000 |
| 1 | -4 | 0-1 | 0-0 | 0.000 | 0.000 | 0.000 | 0.000 |
| 2 | -4 | 4-45 | 1-7 | 0.082 | 0.125 | 0.043 | +0.347 |
| 3 | -4 | 4-20 | 0-4 | 0.167 | 0.000 | -0.167 | -0.667 |
| 4 | -4 | 4-63 | 0-8 | 0.060 | 0.000 | -0.060 | -0.478 |
| 5 | -4 | 2-57 | 1-10 | 0.034 | 0.091 | 0.058 | +0.627 |
| 6 | -4 | 1-69 | 0-9 | 0.014 | 0.000 | -0.014 | -0.129 |
| 1 | -3 | 0-0 | 0-0 | 0.000 | 0.000 | 0.000 | 0.000 |
| 2 | -3 | 19-77 | 0-11 | 0.198 | 0.000 | -0.198 | -2.177 |
| 3 | -3 | 11-60 | 1-17 | 0.155 | 0.056 | -0.099 | -1.789 |
| 4 | -3 | 13-90 | 2-3 | 0.126 | 0.400 | 0.274 | +1.369 |
| 5 | -3 | 4-74 | 5-33 | 0.051 | 0.132 | 0.080 | +3.051 |
| 6 | -3 | 6-66 | 0-30 | 0.083 | 0.000 | -0.083 | -2.500 |
| 1 | -2 | 32-109 | 2-3 | 0.227 | 0.400 | 0.173 | +0.865 |
| 2 | -2 | 31-104 | 1-3 | 0.230 | 0.250 | 0.020 | +0.081 |
| 3 | -2 | 23-104 | 3-19 | 0.181 | 0.136 | -0.045 | -0.984 |
| 4 | -2 | 13-93 | 1-8 | 0.123 | 0.111 | -0.012 | -0.104 |
| 5 | -2 | 20-107 | 8-28 | 0.157 | 0.222 | 0.065 | +2.330 |
| 6 | -2 | 7-67 | 1-25 | 0.095 | 0.038 | -0.056 | -1.459 |
| 1 | -1 | 96-219 | 6-10 | 0.305 | 0.375 | 0.070 | +1.12 |
| 2 | -1 | 42-92 | 1-3 | 0.313 | 0.250 | -0.063 | -0.254 |
| 3 | -1 | 72-163 | 6-15 | 0.306 | 0.286 | -0.021 | -0.434 |
| 4 | -1 | 23-109 | 2-7 | 0.174 | 0.222 | 0.048 | +0.432 |
| 5 | -1 | 35-94 | 10-15 | 0.271 | 0.400 | 0.129 | +3.217 |
| 6 | -1 | 22-73 | 6-24 | 0.232 | 0.200 | -0.032 | -0.947 |
| 1 | Tied | 288-416 | 12-20 | 0.409 | 0.375 | -0.034 | -1.090 |
| 2 | Tied | 29-41 | 0-2 | 0.414 | 0.000 | -0.414 | -0.829 |
| 3 | Tied | 138-228 | 6-15 | 0.377 | 0.286 | -0.091 | -1.918 |
| 4 | Tied | 54-94 | 6-13 | 0.365 | 0.316 | -0.049 | -0.932 |
| 5 | Tied | 42-114 | 12-16 | 0.269 | 0.429 | 0.159 | +4.462 |
| 6 | Tied | 34-74 | 11-22 | 0.315 | 0.333 | 0.019 | +0.611 |
| 1 | +1 | 34-46 | 1-2 | 0.425 | 0.333 | -0.092 | -0.275 |
| 2 | +1 | 16-20 | 0-0 | 0.444 | 0.000 | -0.444 | -0.000 |
| 3 | +1 | 106-109 | 2-1 | 0.493 | 0.666 | 0.174 | +0.521 |
| 4 | +1 | 46-52 | 4-3 | 0.469 | 0.571 | 0.102 | +0.714 |
| 5 | +1 | 66-57 | 4-7 | 0.537 | 0.364 | -0.173 | -1.902 |
| 6 | +1 | 58-60 | 9-11 | 0.492 | 0.450 | -0.042 | -0.831 |
| 1 | +2 | 28-23 | 0-1 | 0.549 | 0.000 | -0.549 | -0.549 |
| 2 | +2 | 10-7 | 0-0 | 0.588 | 0.000 | -0.588 | -0.000 |
| 3 | +2 | 68-38 | 1-0 | 0.642 | 1.000 | 0.358 | +0.358 |
| 4 | +2 | 47-35 | 1-0 | 0.573 | 1.000 | 0.427 | +0.427 |
| 5 | +2 | 71-31 | 1-2 | 0.696 | 0.333 | -0.363 | -1.089 |
| 6 | +2 | 54-38 | 6-2 | 0.587 | 0.750 | 0.163 | +1.304 |
Phew! Sorry about that monstrosity. We can probably ignore the first 18 lines I guess. Frankly, when teams are behind by 5 runs in the 3rd inning, and are facing this situation, there is going to be a lot of losing regardless of what they do, and not many oIBB's are issued. It's in the middle of the table where it starts to get interesting.
In my previous study, the "-1" situation was the outlier. When down by a run, teams actually seemed to gain wins when IBBing the batter early. Here we see it broken down further. Recall that I am using a different (larger) sample size this time around, but the effect is still there. Almost all of the difference comes in the 5th inning, where teams won 10 of 25 games after issuing an IBB, which is more than 3 full wins better than the non-IBB sample. The rest of the "down by a run" scenarios more or less even out. Is this a sample size fluke, or something else I haven't considered?
Since nobody really enjoys looking at a 48-row table, let's do some grouping. I will group the six innings into three groups of 2. Let's also group the scores into: Tied, ahead by 1 or 2, and behind by 1 or 2.
Here are the Wins gained by IBB's in these situations:
| Innings | Score | IBB Gain (wins) |
|---|---|---|
| 1-2 | Behind by 1 or 2 | +1.817 |
| 1-2 | Tied | -1.919 |
| 1-2 | Ahead by 1 or 2 | -0.824 |
| 3-4 | Behind by 1 or 2 | -1.090 |
| 3-4 | Tied | -2.850 |
| 3-4 | Ahead by 1 or 2 | +2.021 |
| 5-6 | Behind by 1 or 2 | +3.140 |
| 5-6 | Tied | +5.073 |
| 5-6 | Ahead by 1 or 2 | -2.517 |
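The grouping itself is nothing fancy. Here is a Python sketch that redoes it from the rounded "IBB Gain" values in the big table (scores -2 through +2 only), so the totals can differ from mine in the last decimal place. The function names are made up, and this is a reconstruction, not the query I actually ran.

```python
from collections import defaultdict

# (inning, score, IBB gain) copied from the big table; score 0 means tied.
ROWS = [
    (1, -2, 0.865), (2, -2, 0.081), (3, -2, -0.984),
    (4, -2, -0.104), (5, -2, 2.330), (6, -2, -1.459),
    (1, -1, 1.120), (2, -1, -0.254), (3, -1, -0.434),
    (4, -1, 0.432), (5, -1, 3.217), (6, -1, -0.947),
    (1, 0, -1.090), (2, 0, -0.829), (3, 0, -1.918),
    (4, 0, -0.932), (5, 0, 4.462), (6, 0, 0.611),
    (1, 1, -0.275), (2, 1, 0.000), (3, 1, 0.521),
    (4, 1, 0.714), (5, 1, -1.902), (6, 1, -0.831),
    (1, 2, -0.549), (2, 2, 0.000), (3, 2, 0.358),
    (4, 2, 0.427), (5, 2, -1.089), (6, 2, 1.304),
]


def inning_group(inning):
    """Map innings 1..6 onto the labels '1-2', '3-4', '5-6'."""
    first = inning if inning % 2 else inning - 1
    return f"{first}-{first + 1}"


def score_group(score):
    """Collapse the score differential into three buckets."""
    if score == 0:
        return "Tied"
    return "Ahead by 1 or 2" if score > 0 else "Behind by 1 or 2"


totals = defaultdict(float)
for inning, score, gain in ROWS:
    totals[(inning_group(inning), score_group(score))] += gain

for (innings, situation), gain in sorted(totals.items()):
    print(f"{innings}  {situation:<18} {gain:+.3f}")
```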
My findings seem to actually be in line with both MGL and Peter. MGL asserted that intentional walks in this situation were not a good idea, and my findings seem to show that to be mostly true. Peter's assertion was that the results were too small to say much, and, frankly, that seems to be true as well. However, look at innings 5 and 6: with the team tied or behind by a run or two, a total of 8 wins were gained with intentional walks.
Indeed, in innings 5 and 6 with the score either tied or the pitching team behind by one or two runs, teams who did not issue the walk went 160-529, for a .232 winning%, whereas teams that took the bat out of the hitter's hands in this situation went 48-130 for a .270 winning%, thus improving their win total fairly substantially. I find this pretty interesting. Innings 1-4, however, seem to be fairly consistently against issuing the walk.
So, in summary, is it a good idea to walk the bases loaded when it's early in the game and a good hitter is at the plate and first base is open? I have no idea.
Here is the IBB_GAIN broken down by inning. Too lazy to make a fancy table like before so I'll just copy/paste from sql*plus...
    INNING SUM(IBB_GAIN)
---------- -------------
         5    10.2684563
         4    1.11693613
         1    .074129052
         2    -3.1787919
         6    -3.9504488
         3    -4.9123056
Tuesday, May 27, 2008
Juicing the bases early
Recently, MGL posted an article about intentional walks early in baseball games, and Peter Jensen followed-up with an article on TangoTiger's site. I read Peter's article and decided to try similar things, with a larger sample and more granular data. I have never really tried to do this kind of analysis before, so I hope I don't make a fool of myself.
In any case, I won't go over too much of the details because you can read Peter's article here and follow-up comments here.
I am using a very similar methodology to what Peter used, with a few changes:
1) I am using data from 2000-2007.
2) I am only counting walks that are "obvious" IBB's. That is, the result of four consecutive intentional balls.
3) I am only looking at PA's where the batter is in the top 6 spots of the order. This does two things for me. First, it controls a little bit for the quality of the batter, and second, it allows me to incorporate National League data which increases my sample size. I am, however, throwing out Barry Bonds PA's because, well, because.
4) Instead of breaking it up into two "groups," I am separating the events by the score.
All in all, I identified 6094 PA's between 2000-2007 with runners on second and third, one out, prior to the 7th inning in completed games, where the batter was in the top-six spots of the order. In 564 of the cases, an "obvious" intentional walk was issued.
Here are the oIBB counts, broken down by the relative score from the vantage point of the pitching team:
| Score | PA | oIBB | oIBB % of PA | % of all oIBB |
|---|---|---|---|---|
| Tied | 1485 | 129 | 8.69 | 22.87 |
| -2 | 720 | 99 | 13.75 | 17.55 |
| -1 | 1030 | 97 | 9.42 | 17.20 |
| -3 | 472 | 88 | 18.64 | 15.60 |
| -4 | 288 | 39 | 13.54 | 6.91 |
| +1 | 625 | 39 | 6.24 | 6.91 |
| -5 | 166 | 28 | 16.87 | 4.96 |
| -6 | 104 | 17 | 16.35 | 3.01 |
| +2 | 417 | 14 | 3.36 | 2.48 |
| -7 | 61 | 6 | 9.84 | 1.06 |
| -8 | 29 | 5 | 17.24 | 0.89 |
| -10 | 15 | 1 | 6.67 | 0.18 |
| -9 | 21 | 1 | 4.76 | 0.18 |
| +4 | 155 | 1 | 0.65 | 0.18 |
Like Peter, I didn't find any cases where an IBB was issued with a 3 run lead, but Lloyd McClendon did issue one with a 4 run lead in this game.
From here, I looked at how often teams won and lost when they issued an intentional walk versus when they "pitched away." I will go through one example to explain the data and then simply post the table. In 1485 of the 6094 overall instances, the game was tied. Managers issued an obvious intentional walk 129 times, or 8.69% of the time. In the games where they issued an oIBB, they went 46-83, for a .357 Wpct. In the games where they did not issue four straight intentional balls, they went 514-842, for a .379 Wpct. So the difference between the two is 0.022 in favor of not issuing the walk. This translates to 2.9 wins lost over the 129 games where the walks were issued.
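The same worked example as a Python sketch, in case the arithmetic is easier to follow that way. The function names `wpct` and `ibb_gain` are mine; the records are the tie-game numbers from the paragraph above.

```python
def wpct(wins, losses):
    """Winning percentage from a won-lost record."""
    return wins / (wins + losses)


def ibb_gain(no_ibb_record, ibb_record):
    """Wins gained (or lost, if negative) across the games where the oIBB was issued."""
    difference = wpct(*ibb_record) - wpct(*no_ibb_record)
    ibb_games = sum(ibb_record)
    return difference * ibb_games


# Tie game: 514-842 without the walk, 46-83 with it.
tie_game_gain = ibb_gain((514, 842), (46, 83))
print(f"{wpct(514, 842):.3f} {wpct(46, 83):.3f} {tie_game_gain:+.3f}")
```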
Here is the table for all situations that had a decent amount of oIBB's. The "IBB Gain" is the amount of wins the team "gained" by issuing the walk, based on the difference in percentage. Obviously a negative number means wins were lost.
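| Score | No-oIBB W-L | oIBB W-L | No-oIBB Wpct | oIBB Wpct | Diff | IBB Gain |
|---|---|---|---|---|---|---|
| Tie Game | 514-842 | 46-83 | 0.379 | 0.357 | -0.022 | -2.898 |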
| -2 | 117-504 | 16-83 | 0.188 | 0.162 | -0.027 | -2.652 |
| -1 | 260-673 | 31-66 | 0.279 | 0.320 | +0.041 | +3.969 |
| -3 | 48-336 | 7-81 | 0.125 | 0.080 | -0.045 | -4.000 |
| -4 | 13-236 | 2-37 | 0.052 | 0.051 | -0.001 | -0.036 |
| +1 | 284-302 | 20-19 | 0.485 | 0.513 | +0.028 | +1.099 |
| -5 | 6-132 | 0-28 | 0.043 | 0.000 | -0.043 | -1.217 |
I am not sure how to assess the "significance" of these numbers, so I will leave that to someone else, but obviously what sticks out is the number for teams down by a run. When trailing by a run in the early innings and with runners on second and third, one out, and a "heart of the order" guy at the plate, teams did 4 wins better when they intentionally walked that batter. In almost all the other situations, issuing the walk cost the team wins. I am thinking about re-running the numbers with a larger sample to draw better conclusions, but first I thought I'd let you guys criticise the methodology.
So go ahead! :-)