Java Regular Expression Help

Although I’m rather new to the Java regex scene, I think I have a pretty good grasp on the concepts. That being said, I have a problem. What I’m trying to do is search through an .htm file and find a string that looks like either of the following:

VALUE="this is the string i want"
NAME="or this is what i want"

Basically the goal is to extract the string within the quotes if a VALUE attribute exists OR a NAME attribute if the VALUE does not exist. So I can easily do the following:

Pattern style = Pattern.compile("VALUE=\"([^\"]*)\"");

Then I check whether I found a match and, if not, recompile with the NAME attribute instead and process the file all over again. I’m relatively sure I can combine these into something that looks like this:

Pattern style = Pattern.compile("VALUE=\"([^\"]*)\" | NAME=\"([^\"]*)\"");

This way I can check either OR and group the string of whichever one it finds. The problem is it doesn’t seem to work and won’t find a match.

My question is, is this correct syntax or am I just not able to match and group something like this? If you need any other information just let me know. Thanks in advance!
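
For reference, the two-pass fallback described in the question (try VALUE first, then NAME) might be sketched as follows. The class and method names are hypothetical and chosen only for illustration; both patterns are compiled once up front rather than recompiled on each miss.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
final class TwoPassAttribute {
    private static final Pattern VALUE = Pattern.compile("VALUE=\"([^\"]*)\"");
    private static final Pattern NAME = Pattern.compile("NAME=\"([^\"]*)\"");

    // Returns the quoted VALUE text if present, otherwise the quoted NAME text,
    // otherwise null.
    static String extract(String line) {
        Matcher m = VALUE.matcher(line);
        if (m.find()) {
            return m.group(1);
        }
        m = NAME.matcher(line);
        return m.find() ? m.group(1) : null;
    }
}
```

This scans the input up to twice, which is the cost the question is trying to avoid with a single combined pattern.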

That’s not exactly right, because the " character ends a Java string literal, so you need to escape it (Java regular expressions are quite annoying for things like this). You want:

String regexp = "(VALUE|NAME)=\\"([^"]*)\\"";
Pattern.compile(regexp);
// and so on

Wait, I messed that up. It should be:

String regexp = "(VALUE|NAME)=\"([^\"]*)\"";
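
As a minimal sketch of how that combined pattern behaves: group 1 holds the attribute name that matched and group 2 holds the text between the quotes. The class and method names here are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regular expression comes from the thread.
final class CombinedPattern {
    static final Pattern ATTR = Pattern.compile("(VALUE|NAME)=\"([^\"]*)\"");

    // Returns {attribute name, quoted text} for the first match, or null if
    // the input contains neither attribute.
    static String[] firstMatch(String input) {
        Matcher m = ATTR.matcher(input);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }
}
```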

Thanks Rysto!

I was looking at that earlier and figured I was trying to OR too much. This ended up working for me:

Pattern.compile("VALUE|NAME=\"([^\"]*)\"");

Kind of…
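
One likely reason this only "kind of" works: without parentheses, | splits the whole expression, so the pattern means either the bare word VALUE or NAME="...". The sketch below (hypothetical class and method names) returns the full match and group 1 so the difference is visible.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo of alternation scope; names are illustrative only.
final class AlternationScope {
    static final Pattern UNGROUPED = Pattern.compile("VALUE|NAME=\"([^\"]*)\"");

    // Returns {whole matched text, group 1} for the first match, or null.
    // Group 1 is null when the bare "VALUE" alternative is the one that matched.
    static String[] firstMatch(String input) {
        Matcher m = UNGROUPED.matcher(input);
        return m.find() ? new String[] { m.group(), m.group(1) } : null;
    }
}
```

With a VALUE attribute the match is just the word VALUE and the capture group is empty, whereas a NAME attribute is captured as intended. Wrapping the alternatives as (VALUE|NAME) avoids this.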

This is what I’m looking at in my html file:

<PARAM NAME="string-name" VALUE="string-value">

If a VALUE exists I’d prefer that over the NAME, but apparently the regex doesn’t check the alternatives in the order I wrote them. Is there another way for me to do this, or am I stuck compiling and checking for each one separately in order to check in the right order?

Now that I think about it that makes perfect sense. I have no idea why I thought it wouldn’t check it like it is. :smack:

You want to use the repeating search capability of Matcher.find().

Use the pattern (VALUE|NAME)="([^"]*)" and call Matcher.find(). If the search succeeds and Matcher.group(1) is VALUE, then you’re done: just get Matcher.group(2). If Matcher.group(1) is NAME, save Matcher.group(2) in a variable and call Matcher.find() again. If you get another match and Matcher.group(1) is VALUE, use that Matcher.group(2) instead of the value you saved.
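
The steps above can be sketched roughly as follows. The class and method names are hypothetical, and as a small variation this version keeps calling find() in a loop rather than exactly twice, so a VALUE that appears after several NAME attributes is still preferred.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; only the pattern and the find()/group() steps come
// from the thread.
final class PreferValue {
    static final Pattern ATTR = Pattern.compile("(VALUE|NAME)=\"([^\"]*)\"");

    // Returns the first VALUE text if any VALUE attribute exists, otherwise
    // the first NAME text, otherwise null.
    static String extract(String input) {
        Matcher m = ATTR.matcher(input);
        String fallback = null;
        while (m.find()) {
            if (m.group(1).equals("VALUE")) {
                return m.group(2);       // VALUE wins as soon as it is seen
            }
            if (fallback == null) {
                fallback = m.group(2);   // remember the first NAME
            }
        }
        return fallback;
    }
}
```

The pattern is compiled once and the input is scanned in a single left-to-right pass, which avoids recompiling and re-searching for each attribute.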

Make sense?