Well, I’ve never studied evolution, so I’m not really qualified to argue the latter case. The former, however, is colloquially called the Proto-World hypothesis. Some good linguists have worked on it, but in general historical linguists tend to be dismissive of Proto-World as a family. There are a few reasons for this, some more convincing than others.
One big problem is that comparative reconstruction, the main tool we use to figure out which languages are related and what their common ancestors (“proto-languages”) looked like, can only go back about 16,000 years at best, and around 9,000 years at worst.
Essentially (and you can find lots of great info in Lyle Campbell’s Historical Linguistics: An Introduction), the comparative method involves assembling lists of cognates between two or more languages that we suspect are related. Once we have a large amount of data (you can get a good theory with about 50 items, but to be convincing you need a lot more), you begin looking for correspondences.
A correspondence is a phoneme (sound) that lines up in the same position across more than one language; in the simplest case, it’s the same sound in each. For example, let’s say we have four imaginary languages A, B, C, and D, and their words for “dog” are “Kma,” “Gma,” “Lma,” and “Desaba.” I would say that we have an “m” correspondence between languages A, B, and C, since all three have the same phoneme in the same spot. We would use this to propose that if these languages are related, their proto-language had a *m sound. (We use an asterisk to indicate a proto-form, in this case.)
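If it helps to see that tallying step spelled out, here is a rough Python sketch using the imaginary “dog” words. It makes two big simplifying assumptions that a real analysis would not: each letter stands for one phoneme, and the words are lined up naively by position. The function names are made up for illustration.

```python
from collections import Counter

# The imaginary "dog" words from the example, lowercased.
words = {"A": "kma", "B": "gma", "C": "lma", "D": "desaba"}

def sounds_at(position, words):
    """Map each language to the sound at `position` (None if the word is too short)."""
    return {lang: (w[position] if position < len(w) else None)
            for lang, w in words.items()}

def majority_sound(position, words):
    """Return the most common sound at `position` and how many languages share it."""
    counts = Counter(s for s in sounds_at(position, words).values() if s is not None)
    return counts.most_common(1)[0]

print(sounds_at(1, words))       # {'A': 'm', 'B': 'm', 'C': 'm', 'D': 'e'}
print(majority_sound(1, words))  # ('m', 3)
```

Real correspondence sets are built over many word pairs with careful alignment, so treat this as a picture of the bookkeeping, not of the method itself.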
Now we look at the initial sound. Language A has “kma” and language B has “gma”. The “g” sound is actually articulated in almost exactly the same way as the “k” sound: they’re both examples of velar stops. (They’re called stops because the sound is made by forming a closure in the mouth to stop outward airflow, waiting for the pressure to build up, then releasing the pressure to create an audible noise. The popping sound of a champagne cork works on the same principle. They’re called velar because the closure is made at the velum, which is back behind the palate on the roof of the mouth.) The big difference between them is in voicing, that is, whether or not your vocal folds are vibrating when you release the stop closure. (You can play with voicing on your own: gently place two fingers on your throat, and alternate between saying “ssssss” and “zzzzz”. Notice the vibration in “zzz”? That’s your vocal folds vibrating.)
So far, we’ve assembled three “features” of the initial sounds in kma and gma: both sounds are velar, both sounds are stops, but one sound is voiceless and one sound is voiced. Where do we go from here? Well, if we look at the next sound, the “m”, we’re going to notice that it’s voiced. (Don’t believe me? You have a working three-dimensional model on top of your shoulders: try making it yourself and decide.) A very, very common type of sound change is for voiceless sounds to become voiced when they occur adjacent to a voiced sound, and so I’m going to propose that the first sound was originally “k” in both languages, and that language B developed a “sounds adjacent to voiced sounds assimilate the voicing” rule. (Please note that there are a LOT of alternative things that could have happened, and it’s impossible to say for sure unless we have ten or twenty examples of k + m/g + m cognates in both languages.)
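To make the proposed rule concrete, here is a small Python sketch of it as a rewrite over a word. The sound inventory and the function name are assumptions for illustration, and the rule is narrowed to “a voiceless stop becomes voiced when the next sound is voiced,” which is only one way to read “adjacent.”

```python
# Hypothetical, simplified inventory: voiceless stops and their voiced partners.
VOICED_COUNTERPART = {"p": "b", "t": "d", "k": "g"}
# Sounds treated as voiced for this toy example (an assumption, not a real phonology).
VOICED_SOUNDS = set("bdgmnlrwjzv") | set("aeiou")

def assimilate_voicing(word):
    """Voice a voiceless stop when the sound right after it is voiced."""
    out = []
    for i, sound in enumerate(word):
        following = word[i + 1] if i + 1 < len(word) else None
        if sound in VOICED_COUNTERPART and following in VOICED_SOUNDS:
            out.append(VOICED_COUNTERPART[sound])
        else:
            out.append(sound)
    return "".join(out)

proto_form = "kma"                     # the proposed original form
print(assimilate_voicing(proto_form))  # gma  (language B, after the change)
print(proto_form)                      # kma  (language A keeps the original)
```

Under these assumptions, one proto-form plus one rule accounts for both A and B, which is exactly the kind of economy the reconstruction is aiming for.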
As for the vowel, this one is actually easy: there’s an /a/ in languages A, B, and C, so I’m going to go out on a limb and propose that *a was the vowel in the proto-language.
So now we’ve reconstructed three potential phonemes of our proposed proto-ABCD language: *k, *a, and *m. (Remember that asterisks indicate a proto-form!) The m and a correspondences have been seen in three of the four languages, and the k/g correspondence has been seen in two of them. Based on this evidence, I would argue that A, B, and C all had a common progenitor. Language D, on the other hand, is totally different… but we don’t want to discard it out of hand. Rather, we need to look for more cognates to make sure that it doesn’t share any significant number of sound changes with the others. The reason is that a lot of things could explain the lack of a cognate here: maybe D borrowed the word for “dog” from an unrelated language that was spoken in the same area, or maybe A, B, and C borrowed a word for “dog”, and D actually has the original proto-word. It’s tough to say from this data, so off we go to find more stuff!
That’s the comparative method in a nutshell. But please don’t ever, ever do it like I just did: you need a huge number of cognates before you can start saying ANYTHING significant about observable relationships.
And that, as a matter of fact, is why most historical linguists refuse to entertain any current theories about a Proto-World language. Language has existed for much longer than 9,000 years, so even if every language did descend from a single original language, it’s impossible to tell. Languages have been travelling, changing, and mixing for so long that it’s generally considered impossible for a researcher to find a statistically significant number of correspondences, since by this point a lot of correspondences will have arisen by chance.
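To get a feel for the chance problem, here is a back-of-the-envelope Python sketch. It assumes each compared word pair has some fixed, independent probability `p` of looking like a match by pure accident, and asks how likely it is to see at least `k` such accidents in a list of `n` items. The specific numbers are made up for illustration and are not measurements from any real languages.

```python
from math import comb

def prob_at_least(k, n, p):
    """Binomial probability of at least k chance matches among n independent comparisons."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical figures: a 5% accidental-match rate per word pair.
for n in (50, 100, 200):
    print(n, prob_at_least(5, n, 0.05))
```

The longer the word list, the more likely it is that a handful of lookalikes turn up by accident, which is why a few matching words on their own prove very little.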
Did that do a fair job at answering your question?