I am conducting a statistical report here at school where I have to look at copious amounts of lists. The lists contain past students’ names and addresses… (demographic study)
So the lists are arranged alpha by last name and I began to notice some things that simply defy the probability laws of nature.
Look here. All the records are for people completely unrelated to each other.
Last name:
Brown…6 xyz road
Brown…6 qwer road
Brown…6 xxxxx road
Brown…6 yyyyy road
How on earth could 4, sometimes 5, separate people with the same last name have the same starting number in their street address?
This has happened with quite a few others with the same last names and the same starting street number.
I thought this decidedly odd, so I began recording the instances. So far, out of 5000 names, I have found close to 100 instances of this. How could this be? Close to a hundred people with the same last names and the same starting street address, completely unrelated to one another? Maybe this should be in GQ…
Do you live in an area with “zoned” street numbers? Would number 6, perchance, be the most common number for the first house on the even-numbered side of the street?
Depends on where you are. A lot of places around here will have side streets mirror the main streets they’re parallel to. (Well, as much as two streets in Pittsburgh can be parallel, anyway.) So a street may start at the 4500 block and end at the 5800 block.
I would think the 1s are the downtown area. Depending on the city you are in, maybe that particular city has a high population center 6 blocks from downtown.
How many non-unique address #s are there in your sample? Let’s say there are 500 duplicated numbers. (6 6’s + 10 12’s + etcetera…)
How many non-unique last names are there in your sample? Let’s say there are 500 duplicated names. (6 Browns + 10 Smiths + etcetera…)
Plug in the actual numbers for your sample.
Given that 10% of the names are duplicated, and so are 10% of the numbers, there should be a 1% chance (0.10 × 0.10, assuming names and numbers are independent) that any given name-number combination is a dupe. Multiply that by the sample size, 5000, and you’re looking at 50 total dupes.
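To plug in your own numbers, here is a minimal Python sketch of that back-of-the-envelope estimate. The function name and the 500/500 placeholder counts are just for illustration, and it assumes last names and house numbers are independent, which is exactly the assumption the rough estimate makes.

```python
def expected_coincidences(sample_size, duplicated_names, duplicated_numbers):
    """Rough expected count of records whose name AND number are both shared.

    Assumes last name and house number are independent of each other.
    """
    p_name = duplicated_names / sample_size      # chance a record has a duplicated name
    p_number = duplicated_numbers / sample_size  # chance a record has a duplicated number
    return sample_size * p_name * p_number


# Placeholder counts from the post; substitute the real tallies from the lists.
print(expected_coincidences(5000, 500, 500))
```

If the real duplication rates are higher than 10%, the expected count climbs quickly, since the two rates multiply.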
Large sample sizes always lead to odd coincidences.
If it’s not a coincidence, what is it? Did aliens rearrange your list? Is it a vast, right-wing conspiracy? What else could it be other than a coincidence?
[This is a serious question. I’m not just trying to be an ass.]
I’ve noticed since I moved to New England just how many of the roads here are shorter and only have low numbers. (For houses in residential areas, anyway.)
In Houston, my house number was 11306, and the street had a lot more numbers from there. Even short streets, due to planning, had numbers to match the parallel streets, and therefore could have high numbers.
In contrast, here, it seems that every short street starts at 1 again, and goes up by ones, and the house numbers don’t get very high. I rarely see an address over 500.
If this is true for your addresses, it would create a general bias towards low numbers.
Here’s an exercise, if you feel like doing it:
Capture just the house numbers (across all names) in your data set, ignoring street names. Graph the frequency - you’ll probably get a decaying curve that looks roughly proportional to 1/x. (This, I’ll admit, is a guess.)
Within just one common last name - like “Brown”, the example you use in the OP - do the same. See if the curve matches the first one.
If they match, it’s not much of a coincidence - it’s just a property of the house numbers in the area.
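If you’d rather tally than graph, the comparison could be sketched in Python like this. The records are made up for illustration, and `number_shares` is a hypothetical helper name, not something from JMP or any library.

```python
from collections import Counter


def number_shares(records, surname=None):
    """Return {house_number: share of records}, optionally for one surname only."""
    numbers = [num for name, num in records if surname is None or name == surname]
    counts = Counter(numbers)
    total = len(numbers)
    return {num: count / total for num, count in sorted(counts.items())}


# Made-up (surname, house number) pairs, for illustration only.
records = [
    ("Brown", 6), ("Brown", 6), ("Brown", 12), ("Brown", 1),
    ("Smith", 6), ("Smith", 3), ("Jones", 1), ("Jones", 6),
]

overall = number_shares(records)
browns = number_shares(records, surname="Brown")

# Print each house number with its share overall and among the Browns.
for num in overall:
    print(num, round(overall[num], 2), round(browns.get(num, 0.0), 2))
```

With real data you would want far more than a handful of records per name before reading anything into the two columns.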
If the data for a specific last name, one with enough records to be statistically meaningful, doesn’t match the overall population, then you may have something. Possibilities:
An error in your data set (entry typos, for instance)
A family group, throwing off the statistics (especially in smaller samples)
Records deliberately or accidentally falsified
Or, just the far end of the data set - there has to be some name that is farthest from the population, and some that are close, and you may have found the one that’s most off.
Just some thoughts I had. If you do feel like comparing the results for one name to that of the population, we can talk about some statistical tests with the data that indicate whether it’s an anomalous data point or not.
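As one example of such a test (my pick for illustration, not the only option), here is a sketch of a Pearson chi-square comparison between one last name and the whole list. Binning by leading digit is an assumption made here to keep the expected count in each bin reasonably large, and the function name is hypothetical.

```python
def chi_square_vs_population(records, surname):
    """Pearson chi-square statistic: one surname's leading digits vs. everyone's.

    Returns (statistic, degrees_of_freedom). Digits that never occur in the
    population are skipped, since their expected count would be zero.
    """
    def leading_digit(number):
        return int(str(number)[0])

    population = [leading_digit(num) for _, num in records]
    group = [leading_digit(num) for name, num in records if name == surname]
    if not group:
        raise ValueError("no records for surname: " + surname)

    statistic = 0.0
    categories = 0
    for digit in range(1, 10):
        share = population.count(digit) / len(population)
        if share == 0:
            continue
        expected = share * len(group)   # count expected if the name is typical
        observed = group.count(digit)   # count actually seen for this name
        statistic += (observed - expected) ** 2 / expected
        categories += 1
    return statistic, categories - 1


# Made-up data: every Brown lives at a 6, every Smith at a 14.
records = [("Brown", 6)] * 10 + [("Smith", 14)] * 10
print(chi_square_vs_population(records, "Brown"))
```

A large statistic relative to a chi-square table for that many degrees of freedom suggests the name differs from the population. Note that the name’s own records are counted in the population here, and the test is unreliable when expected counts per bin are small.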
I actually ran the probability curve through a statistics program I have (JMP, to be exact). The findings are obtuse.
First of all the people I was looking at come from all over the globe, not just New England. We have a strong populace from the UK.
I can find no common denominator at all. City size, state, zip, nationality, race, nothing…
But people with the same last name (Brown, Hamilton, Johnson, Smith, etc.) actually tend towards having the same street number in their address. My wife looked at it, and we both figured that out of a sample of 12,000 there were 33 common last names, and among those the frequency of having the same street number was above average…
This is venturing into the Twilight Zone big time! Any more theories on this anomaly are requested.