(Native Mandarin speaker here)
Anecdotally, I don’t think the tones are that essential to holding everyday conversations, compared to just using the correct characters and grammar. As long as the sentence is long enough, the context usually provides more than enough clues — it’s similar to the way we naturally disambiguate homophones in English.
Many songs, after all, subtly change the character tones to better match the melody, even at the expense of meaning. For example, compare the tones in the Mandarin version of A Whole New World (from Aladdin) to the actual spoken pronunciation of its first phrase, 带你看这世界 (dài nǐ kàn zhè shìjiè). The tones in the sung version are quite different and imprecise, and if you stripped away the background melody, it would sound very strange, like a foreigner speaking it. But over the course of an entire phrase, it’s still easy to understand from context.
Context is key here… if you just said one or two random characters with no context and no tones, it would be very difficult to guess what they meant (unless it was something super common like “nǐ hǎo”). But as soon as the sentence gets to like 4-5+ characters, it’s quite easy to figure out what is meant.
I grew up attending a school that had a mix of native Mandarin speakers and foreign speakers of various levels of proficiency. The foreign speakers’ mastery of tones (or lack thereof) mainly marked them as foreigners (the way an accent would), but did not really hinder communication as much as you’d think.
Just anecdotal experience, but I saw a lot of this in the two decades I lived in a Mandarin-speaking country.
Edit: Here are some examples
Spoken (with the correct tones)
vs
Sung (with musical melody)
In the original, it’s like dài nǐ kàn zhè shìjiè, whereas in the music it’s like dái ní kān zhe shìjié. The “dai ni” in particular are rising, as is the final character (jié), but still the meaning is clear.