Optimal regular expression generation

Is there an algorithm which takes a set of strings and returns the shortest possible regular expression which generates those strings and only those strings? More generally, is there an algorithm which takes a regular expression and returns the shortest possible equivalent regular expression? If so, what’s its complexity?

For example, in the first case, we might input the strings “Händel”, “Haendel”, and “Handel”, and the algorithm would return the regular expression “H(ä|ae?)ndel” or “H(ae?|ä)ndel”. In the second case, we might input the regular expression “H(a|ae|ä)ndel” and get the same result. (The first example is equivalent to passing “Händel|Haendel|Handel” in the second case.)

A quick search on “optimal regular expression” seems to indicate that the problem is NP-complete. The only cite I can find is a paper I can’t access, though.

Edit: That’s for the general case, though, and only for the formal language definition of regular expressions, which only give you the concatenation, alternation(|) and Kleene Star(*) operations.

Generating an optimal regular expression for a finite set of strings might be an easier problem.

Maybe, maybe not. If you have a large set of strings where it’s not immediately obvious how to group them based on common subsequences, you might have to try all the groupings.