Let’s start by talking about one-particle superpositions, and maybe get to entanglement later.
Really, the particle isn’t in two states at once. It’s in one state, but that state is a combination of two other states. The key is what “basis” you use to express the state.
For instance, suppose you have one basis that consists of the states |0> and |1>. (I’m putting the | and > around them because of a notation called bra-ket notation, but that doesn’t matter for this purpose… just think of it as a way of writing a state.)
So a particle might be in the |0> state, or it might be in the |1> state. But it might also be in the |0> + |1> state, or the |0> - |1> state, or other states of the form a|0> + b|1> for some complex numbers a and b.
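(Technical note: I’m omitting the normalization factor for simplicity. Properly, |0> + |1> should be written (|0> + |1>)/√2, so that the probabilities of the possible measurement outcomes add up to 1.)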
Now let’s say we have a different basis, consisting of states |x> and |y>, where |x> = |0> + |1>, and |y> = |0> - |1>.
Is |x> a superposition? Not from the perspective of the {|x>, |y>} basis, but it is from the perspective of the {|0>, |1>} basis.
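To make the two bases concrete, here is a minimal sketch in plain Python. Representing a state a|0> + b|1> as the pair (a, b) is my own encoding for illustration, and the helper names `normalize` and `inner` are hypothetical. This version puts the normalization factor back in.

```python
import math

# A state a|0> + b|1> is stored as the pair (a, b).
# This encoding is an assumption made for illustration.
ZERO = (1.0, 0.0)
ONE = (0.0, 1.0)

def normalize(state):
    """Rescale a state so its squared amplitudes sum to 1."""
    norm = math.sqrt(sum(abs(c) ** 2 for c in state))
    return tuple(c / norm for c in state)

def inner(u, v):
    """Inner product <u|v>, conjugating the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

X = normalize((1.0, 1.0))   # |x> = (|0> + |1>) / sqrt(2)
Y = normalize((1.0, -1.0))  # |y> = (|0> - |1>) / sqrt(2)

print(abs(inner(X, X)))  # close to 1: |x> has unit length
print(abs(inner(X, Y)))  # close to 0: |x> and |y> are orthogonal
```

Since |x> and |y> are orthogonal and each has length 1, they form a basis that is just as good as {|0>, |1>}.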
Here’s the thing: a given measurement will distinguish the states in a given basis (not the same basis for every kind of measurement). So if you measure the state with respect to the {|0>,|1>} basis, you’re going to collapse it to either state |0> or state |1>. If the state (pre-measurement) was |0>, the measurement will produce |0>. But if it was |x> = |0> + |1>, then it has a 50% chance of ending up |0>, and a 50% chance of ending up |1>.
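The 50% figures come from squaring the magnitudes of the coefficients (this is usually called the Born rule). A small sketch, again assuming my own encoding of a|0> + b|1> as the pair (a, b), with `probabilities` being a hypothetical helper name:

```python
def probabilities(state):
    """Chance of each outcome when measuring a|0> + b|1> in the {|0>, |1>} basis."""
    a, b = state
    total = abs(a) ** 2 + abs(b) ** 2  # restores the omitted normalization
    return abs(a) ** 2 / total, abs(b) ** 2 / total

print(probabilities((1, 0)))   # the state |0>        -> (1.0, 0.0)
print(probabilities((1, 1)))   # |x> = |0> + |1>      -> (0.5, 0.5)
print(probabilities((1, -1)))  # |y> = |0> - |1>      -> (0.5, 0.5)
```

Note that |x> and |y> give the same odds in this basis. The minus sign only shows up if you measure in a different basis.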
Now keep in mind that |x> is a single state just as much as |0> and |1>. In fact, we could have written |0> as a sum of |x> and |y> if we wanted. But nevertheless, |x> is a superposition in the {|0>,|1>} basis, just as |0> is a superposition in the {|x>,|y>} basis.
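If you want to check the claim that |0> is a sum of |x> and |y>, here is a quick sketch. As before, storing a state as a pair of coefficients is my own assumption, and `components` is a hypothetical helper that only handles real, orthonormal bases.

```python
import math

S = 1 / math.sqrt(2)
X = (S, S)    # |x> written in the {|0>, |1>} basis
Y = (S, -S)   # |y> written in the {|0>, |1>} basis

def components(state, basis):
    """Coefficients of `state` along each (real, orthonormal) basis vector."""
    return tuple(sum(b_i * s_i for b_i, s_i in zip(b, state)) for b in basis)

zero = (1.0, 0.0)  # the state |0>
cx, cy = components(zero, (X, Y))
print(cx, cy)  # both about 0.707, i.e. |0> = (|x> + |y>) / sqrt(2)

# Rebuild |0> from its {|x>, |y>} components as a check.
rebuilt = tuple(cx * x_i + cy * y_i for x_i, y_i in zip(X, Y))
print(rebuilt)  # approximately (1, 0)
```

So from the point of view of the {|x>, |y>} basis, |0> is an equal mix of |x> and |y>, exactly mirroring how |x> looks in the {|0>, |1>} basis.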
Now how do we know that a state is a superposition in the {|0>,|1>} basis, if we can only perform measurements in that basis? You point out that we’ll either get |0> or |1>, just as we’d get if it wasn’t a superposition. But if you prepare a bunch of identical copies of the original state(*) and perform the measurement on each of them, you’ll see that you get |0> part of the time and |1> part of the time, so the state must be a superposition of |0> and |1>.
(*) There’s a little complexity here, since there’s a theorem called the “no cloning” theorem which says you can’t duplicate an arbitrary unknown state, but in this case I’m assuming we know how the state was prepared in the first place.
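Here is a rough simulation of that many-copies experiment. It is only a sketch: the pair (a, b) standing for a|0> + b|1> is my own encoding, `measure` and `tally` are hypothetical names, and an ordinary pseudo-random generator stands in for the real quantum randomness.

```python
import random
from collections import Counter

def measure(state, rng):
    """Simulate one {|0>, |1>} measurement of a|0> + b|1>."""
    a, b = state
    p0 = abs(a) ** 2 / (abs(a) ** 2 + abs(b) ** 2)
    return "|0>" if rng.random() < p0 else "|1>"

def tally(state, copies, seed=0):
    """Measure `copies` identically prepared states and count the outcomes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return Counter(measure(state, rng) for _ in range(copies))

print(tally((1, 0), 1000))  # every copy yields |0>
print(tally((1, 1), 1000))  # roughly half |0>, half |1>
```

A single run tells you nothing, since each individual measurement just returns |0> or |1>. It’s the tally over many copies that separates the two cases.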