This is actually the kind of question AI is really good at. I won’t paste the results here but just plugging your entire OP into Google gives a pretty good rundown and analysis and an only slightly squishy answer (7.5%-8.5%)
I asked Gemini the same thing and it came up with double what yours said (16%) after checking a few random news articles and the same Wikipedia page. It wasn’t referring to any authoritative data source.
Why do you believe AIs are good for this sort of factual quantitative analysis?
I believe mine because I asked it to show its work. Below is a link to its answer to me (too long and includes charts and graphics that would not work here). I gave the AI the question as posted in the OP.
C’mon, you guys, this isn’t a matter of just throwing random AIs at it until one happens to guess “more” correctly… Let’s leave the AI spam out of this and see if anyone actually has a factual answer.
(AI side discussion blurred)
Sorry, I wasn’t trying to pick a fight with you in particular, it’s just that AFAIK LLMs don’t do this sort of thing well (yet), and the OP is an interesting question worthy of an actual factual answer, not just AI best-effort hallucinations.
LLMs with RAG and tool-calling could conceivably run this sort of analysis IF they had access to some easily-parsed authoritative database of all the states’ gubernatorial elections, but I don’t know if such a thing exists. The AIs are working with the same limited information sources (in both breadth and accuracy) that anybody else is.
There is no guarantee that “showing its work” is sufficient for a factual question like this because the AI doesn’t know if it has correctly and exhaustively enumerated all the relevant races.
“Its work” in this case consists of some older reports (2-3) from various news and academic sources, the Wikipedia article, two copycat sites showing the same Wikipedia article with different styling, Grokipedia (Musk’s AI-generated encyclopedia), and one reputable-seeming Pew article from 2024, but its author doesn’t herself cite her sources or methodology.
This means that the AI is working with generally outdated sources, often copies-of-copies, with no proper evaluation of their rigor or trustworthiness. There is no guarantee that it has successfully identified all the women candidates (or even candidates and races in general), only what it was able to find in a quick cursory search of articles that were likely gamed for SEO.
I’m not trying to be a dick here (sorry if this seems argumentative), but this is a really dangerous use of AI. At best it might be able to give you a lower bound on the number of women governors it was able to find in a casual web search, but that’s not the same thing as actually exhaustively tabulating all the elections and special races and calculating it from such an authoritative source.
The OP leads with:
Well, the AIs don’t magically have access to one either. Is there such a source? If so, anybody with a spreadsheet could calculate it. If not, I think AIs would be worse at this than a person would be because they don’t even realize the flaws in their methodology.
It showed me a lot more detail than there is at that link. Apparently the enumerated list it gave me and its calculations only appear to me. I did not realize that till I tested it. I asked and it said the best it can do is make an HTML file for others to view, which I think is not a thing allowed here (for good reason).
It also showed its sources and methodology as well as an exhaustive list (for me) of everyone it felt met the criteria as well as its math. The answer was approximate. I would think it got most everyone, and if it missed one, would it skew the answer badly? I think this is a very good ballpark answer. If you need an exact answer then I think the person needs to do the legwork, which I think would be a very lengthy and tedious task: carefully scrutinizing all 50 states’ election records from 2000 till today.
| Year | Governor | State | Party | Election type | Weight |
|---|---|---|---|---|---|
| 2000 | Ruth Ann Minner | Delaware | D | Open seat | 1.0 |
| 2000 | Judy Martz | Montana | R | Open seat | 1.0 |
| 2000 | Jeanne Shaheen | New Hampshire | D | Re-elected (3rd term, 2-yr) | 0.5 |
| 2002 | Janet Napolitano | Arizona | D | Open seat | 1.0 |
| 2002 | Jennifer Granholm | Michigan | D | Open seat | 1.0 |
| 2002 | Kathleen Sebelius | Kansas | D | Open seat | 1.0 |
| 2002 | Linda Lingle | Hawaii | R | Open seat | 1.0 |
| 2003 | Kathleen Blanco | Louisiana | D | Open seat | 1.0 |
| 2004 | Christine Gregoire | Washington | D | Open seat | 1.0 |
| 2004 | Ruth Ann Minner | Delaware | D | Re-elected | 1.0 |
| 2006 | M. Jodi Rell | Connecticut | R | Incumbent elected (succeeded Rowland 2004) | 1.0 |
| 2006 | Kathleen Sebelius | Kansas | D | Re-elected | 1.0 |
| 2006 | Janet Napolitano | Arizona | D | Re-elected | 1.0 |
| 2006 | Jennifer Granholm | Michigan | D | Re-elected | 1.0 |
| 2006 | Linda Lingle | Hawaii | R | Re-elected | 1.0 |
| 2006 | Sarah Palin | Alaska | R | Open seat | 1.0 |
| 2008 | Christine Gregoire | Washington | D | Re-elected | 1.0 |
| 2008 | Beverly Perdue | North Carolina | D | Open seat | 1.0 |
| 2010 | Nikki Haley | South Carolina | R | Open seat | 1.0 |
| 2010 | Susana Martinez | New Mexico | R | Open seat | 1.0 |
| 2010 | Mary Fallin | Oklahoma | R | Open seat | 1.0 |
| 2010 | Jan Brewer | Arizona | R | Incumbent elected (succeeded Napolitano 2009) | 1.0 |
| 2012 | Maggie Hassan | New Hampshire | D | Open seat (2-yr term) | 0.5 |
| 2014 | Gina Raimondo | Rhode Island | D | Open seat | 1.0 |
| 2014 | Nikki Haley | South Carolina | R | Re-elected | 1.0 |
| 2014 | Susana Martinez | New Mexico | R | Re-elected | 1.0 |
| 2014 | Mary Fallin | Oklahoma | R | Re-elected | 1.0 |
| 2014 | Maggie Hassan | New Hampshire | D | Re-elected (2-yr term) | 0.5 |
| 2016 | Kate Brown | Oregon | D | Special election (Kitzhaber vacancy; 2-yr remainder) | |
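For anyone who wants to check the weighting by hand, here is a minimal Python sketch of how the Weight column appears to work. It assumes (inferred from the list above, not stated anywhere) that each win is weighted as term length divided by a 4-year term, so a 2-year New Hampshire term counts 0.5. The `term_years` values and the names `wins` and `weight` are my own illustration. Only the 28 rows with a stated weight are included; the 2016 Kate Brown row is left out because its weight is cut off, so the total below is a partial tally, not the final figure.

```python
FULL_TERM_YEARS = 4  # assumption: weights are normalized to a 4-year term

# (year, governor, term_years) for the rows above that have a stated weight.
# term_years is inferred from the Weight column (1.0 -> 4 years, 0.5 -> 2 years).
wins = [
    (2000, "Ruth Ann Minner", 4), (2000, "Judy Martz", 4),
    (2000, "Jeanne Shaheen", 2), (2002, "Janet Napolitano", 4),
    (2002, "Jennifer Granholm", 4), (2002, "Kathleen Sebelius", 4),
    (2002, "Linda Lingle", 4), (2003, "Kathleen Blanco", 4),
    (2004, "Christine Gregoire", 4), (2004, "Ruth Ann Minner", 4),
    (2006, "M. Jodi Rell", 4), (2006, "Kathleen Sebelius", 4),
    (2006, "Janet Napolitano", 4), (2006, "Jennifer Granholm", 4),
    (2006, "Linda Lingle", 4), (2006, "Sarah Palin", 4),
    (2008, "Christine Gregoire", 4), (2008, "Beverly Perdue", 4),
    (2010, "Nikki Haley", 4), (2010, "Susana Martinez", 4),
    (2010, "Mary Fallin", 4), (2010, "Jan Brewer", 4),
    (2012, "Maggie Hassan", 2), (2014, "Gina Raimondo", 4),
    (2014, "Nikki Haley", 4), (2014, "Susana Martinez", 4),
    (2014, "Mary Fallin", 4), (2014, "Maggie Hassan", 2),
]


def weight(term_years):
    """Weight of one win, normalized to a full 4-year term."""
    return term_years / FULL_TERM_YEARS


weighted_total = sum(weight(term_years) for _, _, term_years in wins)
print(len(wins), "wins listed, weighted total:", weighted_total)
# prints: 28 wins listed, weighted total: 26.5
```

The same rule would have to be applied to every race in the denominator too, otherwise the two-year states get counted twice as heavily there as in the numerator.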
You mean more than the 20 sources you see (at that link) if you click "Searched the web > "? Were there other, better ones it used behind the scenes and didn’t include in the link?
It’s not that it can’t read and process the sources it does find (it certainly can), it’s more that how would it know it found all the races? i.e., how could it (or you, or any researcher human or machine) tabulate “all the regular and special gubernatorial races in the 50 states since the year 2000”? Is there a public database of this sort? (I don’t see it in the list of sources cited)
And how do you know it’s doing its due diligence on those sources? (From what I can see, it’s not; many of them are Wikipedia copies or other AI-generated spam but cited with the same authority as the Pew article, for example).
You can’t make that determination. It may be very accurate or it may be wrong, but there is no way for you to tell… all it did was do a few web searches (about 20, it looks like, with varying levels of trustworthiness) and summarize them for you.
Exactly. The danger is that it looks like people are assuming AIs are doing that when they’re not at all. They’re just summarizing a couple dozen or so random articles that happened to come up in a web search, with no consideration at all to their factualness or completeness, with an unstated bias towards special races that (for whatever reason) happened to get more news coverage. This isn’t a good data source.
Then do the detailed work and prove it wrong and unreliable. Easy.
I think for a casual public forum this is more than good enough. If I were publishing a research paper for peer review I would definitely need and expect to do more.
ETA: It also has a “research mode” where it really goes to town on digging out data but it takes 10-20 minutes or more and absolutely clobbers my token usage and I am still at work and need those.
Why? The default assumption shouldn’t be “My AI is right until you prove otherwise”. Research mode finds more sources but doesn’t fundamentally change the issue.
Using that dataset from FiveThirtyEight (who I would trust more than random web articles, and which has an explicit source for every race) and running an AI analysis only on that file (based on the likely gender of the name and/or the LLM’s prior training), the result was still roughly 16%, which is the same as what @Whack-a-Mole’s AI estimated at first.
What more would you demand from the average SDMB poster answering these questions?
Before AI, were this question asked, the most someone would be likely to do would be to check the Wiki page, maybe Ballotpedia, do some math and post an answer. It is unreasonable to expect a peer-reviewed level of precision on a forum like the SDMB. No one will do many hours of work to answer a question like the OP asked (pore through 50 states’ election records since 2000 and meticulously add it up). If someone did not like the posted answer it was up to them to prove that person made some errors.
Now it is AI and it does the same thing and it is entirely unacceptable?
I disagree, I think for the SDMB the AI is doing exactly what has been done here all along.
So yeah, if you think the answer is in error it is up to you to show why. You need not even do all the research; you could say a cited source is in error and point to that error. I once posted an AI answer and it was easy for several posters to point out that it got it wrong without them doing a ton of work (jumped off the page at them). I took my lumps for that one. Same as I would have 10 years ago without using an AI.
It’s not the same thing, though (even if, in this case, its initial estimate was spot-on!)
It’s not that I expect anyone to sit there and manually make a spreadsheet for all the races and all the states. But there is a difference between saying “Here is an article I found” and “Here is the answer”.
It’s not necessarily even an AI vs not-AI thing.
If Elmer_J.Fudd had said “8%” without ever mentioning AI, I’d still have asked, “Wait, how do you know that?” If you had said “16%” without any sources, I’d also have asked where you got the numbers from.
If the list of sources your AI (or a person) provided included some sort of good-enough database (like that FiveThirtyEight one), I’d have simply gone “Oh okay, thanks!”
In this case I was worried because none of the sources it found were particularly good, even though the final answer was still correct. That would apply to a person as much as it would apply to an AI, but I think there’s just more of a general tendency (not with you in particular) to give AIs a pass when it comes to finding & citing trustworthy sources — which I think degrades not just the SDMB but public discourse in general (especially when it comes to something as propaganda and fake-news prone as elections and politics). That’s all.
In this case it was a totally moot point since the outcomes were so similar, but that might not always be the case!
Thanks @Whack-a-Mole. This aligned pretty well with my manual calculations. I got 50.5, but only after going through your list and finding some errors in mine (I missed a couple of 3-termers). The only discrepancy I have is for Kate Brown who won a 2-year special election. (For reference, my data is at the end of this post).
I didn’t calculate total elections held, but GPT told me 316.75 and 50.5 / 316.75 ~= 16% – same as your answer.
Thanks @Reply. That site looks like a goldmine! I found a lot of datasets that were all behind university logins. The most referenced was “Dave Leip Governor County Election Data”.
Here is my manual count from the Wiki link in the OP:
Not normalizing to 4-year terms also results in ~16%:
53 / 333 = ~16%
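As a quick check on the arithmetic, here is a small Python sketch that recomputes both ratios from the totals quoted in this thread (50.5 of 316.75 normalized, 53 of 333 raw). The `share` helper is just an illustration. The last line is a rough sensitivity check on the earlier "what if it missed one?" question: it assumes a single missed full-term win that adds to the numerator only.

```python
def share(women_wins, total_races):
    """Fraction of gubernatorial races won by women."""
    return women_wins / total_races


normalized = share(50.5, 316.75)  # wins and races weighted to 4-year terms
raw = share(53, 333)              # plain counts, no term weighting

# Hypothetical: one full-term win by a woman was missed in the tally.
one_missed = share(50.5 + 1.0, 316.75)

print(f"normalized: {normalized:.1%}")  # normalized: 15.9%
print(f"raw:        {raw:.1%}")         # raw:        15.9%
print(f"one missed: {one_missed:.1%}")  # one missed: 16.3%
```

So a single missed race moves the answer by about a third of a percentage point. That is small next to the gap between the two AI answers upthread (roughly 8% vs 16%), which points to a difference in method or coverage rather than one or two overlooked races.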
Thanks everyone for your help and input. I did run my question through ChatGPT, but it kept hedging and estimating even after I pointed it to the Wikipedia page.