In general, there are no spoken languages that are intrinsically way more difficult than others for their native speakers. There are some aspects of certain languages that children may find harder to acquire than others; for example, Japanese and Korean honorifics, which are quite complex, are mastered somewhat later by native Japanese and Korean children than other parts of the language. But in general, all languages are equally easy for infant native speakers. (Written languages are totally different, and nobody will argue with you that it's way easier to learn to read and write Spanish or Finnish than Chinese or Japanese.)
For an adult second-language learner, of course, it's all about how different the new language is from your primary language, as wmfellows says above. A speaker of Thai will pick up Lao way faster than she will pick up Portuguese, and vice versa.
I'm skeptical about the claim that tonal languages are inherently more difficult (they are, of course, often very difficult for second-language learners who don't have a tonal language as their mother tongue). I am a native speaker of Mandarin and English, and while I totally agree that Mandarin lyrics are often difficult to understand (since tone is completely disregarded in song), singing is not a normal form of spoken communication; it's an art form*. Also, I have trouble understanding a large proportion of sung English lyrics, and I imagine that much of Italian opera is similarly challenging to understand. I know less about whispering, but although whispers cannot technically carry tone (there is no vocal-fold vibration, and so no pitch), studies have shown that speakers compensate for this in other ways; and again, I often have trouble understanding whispers in English, too.
I guess what I'm trying to say is that it's totally fair to say singing and whispering are difficult to understand, but these problems are common to all spoken languages, not just tonal ones.
<sidenote>
Even Bigger Nitpick: The Chinese writing system is neither wholly ideographic nor pictographic: it is a logographic system. A logogram is a written character that represents a spoken word or morpheme, while an ideograph represents ideas directly (and alphabets, in theory, represent sounds, although of course in languages like English and French the relationship between the written and spoken word is quite complex).
Strictly speaking, it isn't even purely logographic. The word for 'awkward' is 尷尬 gān'gà, but it can't be further divided into two separate morphemes gān and gà, so neither 尷 nor 尬 stands for a morpheme on its own. Contrast 民主 mínzhǔ 'democracy', which is made up of the morphemes mín 'people' and zhǔ 'master, control'.
</sidenote>
*I wrote my senior thesis on the correlation between tone and pitch in song in Chinese languages. In traditional Cantonese music, tone is often taken into account, either through minuscule changes in pitch at the beginning of a sung note that mimic the tonal contour of the syllable in speech, or by rewording the lyrics so that their tones match the pitch movement of the melody. In modern Mandopop and Cantopop this is often totally ignored, but I think this still underlines the point that these are art forms, not methods of normal communication, films like Les Parapluies de Cherbourg aside.