On factoring really big numbers

Turing_Complete · January 26, 2005, 8:52am

I just had some crazy moment of inspiration while driving back to school the other day on how to factor numbers really quickly; I thought about implementing evolutionary computation to converge on a factor, I’ll go into the details if anyone really wants to hear about it.

Does anyone know if there has been any algorithms already invented that use genetic algorithms to factor numbers or anything similar? I’ve been searching papers online on both subjects, but I doubt that Google would be a terribly accurate indicator of the research done on this subject.

Anyway, I wrote a little program in Java that tests this out. Terribly slow, and I think it still may be NP, and way behind GNFS; I haven’t done much work on it because classes started up again, but I have some random trials I did:

Time to factor n-bit number:
n=24: 0.802s
n=28: 4.667s
n=32: 16.734s
n=36: 57.693s

Should I keep working on this or just scrap it?

puggyfish · January 26, 2005, 11:35am

IANA computer scientist, mathematician or geneticist, but I think that to implement a genetic algorithm, you need a non-Boolean definition of the fitness for whatever is the analog of genes in your program.

If your gene analog is a trial factor here, surely either it is a true factor, or it isn’t? This reduces your genetic algorithm to simple computational tests.

The only alternative I can think is if you are taking sets of trial factors as your genes, and have found some way to quickly assess the likelihood that your set contains an actual factor without testing all its elements. If you have, then the idea could have potential…

snailboy · January 26, 2005, 11:37am

Well to answer your last question, I don’t see it as very practical with today’s computing speeds. I’m able to factor a 47 bit number on a program I made a while back in 1.4 seconds. And that’s on an Athlon 600 Mhz. There really aren’t many purposes to factor larger numbers.

joazito · January 26, 2005, 11:43am

Er… actually… http://www.rsasecurity.com/rsalabs/node.asp?id=2189:

joazito · January 26, 2005, 11:45am

That link above should be http://www.rsasecurity.com/rsalabs/node.asp?id=2189

Turing_Complete · January 26, 2005, 12:48pm

I used an alternative representation of integers by expressing them as ab+c where a, b, and c are n/4-bit numbers (which is why all my trial runs are with integers of 4n bits). Genes were a, b, and c successively in base-2. Then wrote the fitness furiction as the fractional part when the number to be factored was divided by ab+c. Array of genes was sorted by fitness and mating selection was done by quadratic selection.

Once the fractional part reaches 0, you have a factor.

I actually don’t know completely why this works. I realized that after writing this, that the genetic algorithm probably really didn’t do anything, because crossovers really wouldn’t have an effect on the fitness at all. However, compared to random search, it does vastly better, probably because there are exponentially more representations of each integer under this. Evolutionary computation probably isn’t the best way to go, and the code is far from optimized; I was just wondering if anyone had done anything like it before.

In addition, I better expand my question: does anyone know how to determine complexity when there is a randomness involved?

carterba · January 26, 2005, 4:37pm

Here’s a paper out of Stanford: Using Genetic Programming to Evolve an Algorithm for Factoring Numbers.

There are complexity classes for decision problems that can be solved with probabilistic Turing machines. PP is probabilistic polynomial time. BPP is bounded probabilistic polynomial time. PP’s not terribly useful (NP is contained in it), but BPP is likely equal to P (contingent on the existence of strong pseudorandom number generators). Papadimitriou has a whole chapter on randomized computation.

Peter Shor discovered a polynomial-time algorithm for factoring on a quantum computer. That puts the factoring problem in the quantum computation class BQP, which is contained in PP and contains BPP. Shor’s algorithm at Wikipedia

Hope that helps…

ultrafilter · January 26, 2005, 7:16pm

You can express the running time/space per generation as a function of the input size. That’s probably as close as you can come to a meaningful notation of complexity (but I haven’t read Papadimitriou, so you might want to check that out).

Here’s another question: how do you show correctness?

pulykamell · January 26, 2005, 7:54pm

Maybe these fabulous prizes will inspire you:

http://www.rsasecurity.com/rsalabs/node.asp?id=2093

Basically, if you can find a relatively quick way to factor the products of two extremely large primes, you’ve undermined the basis of most modern cryptography.

ultrafilter · January 26, 2005, 8:04pm

A laptop running quicksort can beat a Cray running bubblesort when both are sorting a sufficiently long list of numbers. That’s a standard example, too, and one that any reasonably competent programmer should be familiar with.

More processing power isn’t the answer. Better algorithms are.

ultrafilter · January 26, 2005, 8:06pm

Mergesort, not quicksort. I really did know that, but my fingers didn’t.

[sub]Given data that’s not nearly sorted, quicksort is significantly faster than bubblesort, but we’re interested in absolute certainty here.[/sub]

MaxTheVool · January 26, 2005, 8:19pm

Absolute certainty? What if the list is already 100% sorted? Won’t the cray-with-bubblesort then be faster than the laptop-with-mergesort?

(Maybe I’m forgetting how bubblesort works…)

ultrafilter · January 26, 2005, 8:45pm

Both of those algorithms aren’t affected by how close to sorted the original list is. The standard implementation of quicksort (pick a pivot element, any pivot element) is, and runs very slowly on a nearly-sorted list. There are other schemes for picking a pivot element, but most just decrease the probability of a bad one.

I’ve never seen data on quicksort modified to use a linear-time median selection for picking its pivot. It would be interesting to see how that compares to some of the other algorithms out there.

ftg · January 26, 2005, 10:20pm

(Regarding Mergesort and Bubblesort on nearly sorted data. Standard CS textbook versions.) Mergesort will sort in about half its worst-case time on nearly sorted data. Bubble sort approaches (and can reach) linear.
The first linear time median algorithms had constants above 7 IIRC. The most recent algorithm I’ve seen I believe has a constant near 2. All are bears to program and can cause horrible page thrashing on large sets. The random choice is still better on average. But, you don’t have to worry about that quadratic worst-case if you’re the fretful type.

As to the data posted in the OP: The pattern is clearly too small to prove anything, but it definitely seems to be doubling for every bit added. I.e., hardly a breakthrough algorithm in performance terms. But I think such ideas are worthy of further research.

ultrafilter · January 26, 2005, 10:31pm

What textbooks are you using? Every implementation of bubblesort I’ve ever seen will examine O(n[sup]2[/sup]) array entries. Take this typical implementation:


void bubble_sort(long thearray[], int sizeofarray)
{
int i,j;
long temp;

for(i=sizeofarray; i>=0; --i)              // For each spot in the array
    {
                                     //   (starting at the top)
    for(j=0; j<i; j++)                     // Walk through the array to that
        {
                                 //   spot
        if(thearray[j] > thearray[j+1])    // At each location, compare the 
            {
                             //   two adjacent elements
            temp = thearray[j];
           // Swapp elements as needed
            thearray[j] = thearray[j+1];
            thearray[j+1] = temp;
            }
        }
    }
return;
}

That was more a curiosity. I’ve never seen anything on the performance of quicksort with a guaranteed good pivot. Doesn’t sound like it’s all that great, which doesn’t surprise me.

Can’t be worse to code than a linear time in-place merge, though.

Omphaloskeptic · January 26, 2005, 11:32pm

You can add a termination condition to exit the outer loop if you make it through the inner loop without swapping any entries. Here is one simple example (the inner loop needlessly runs through all n entries each time, though; that URL also has the strangest choice of less-than symbol I’ve ever seen).

ultrafilter · January 26, 2005, 11:37pm

Yeah, I know there are modifications–the page I linked to earlier has them. My point is, those aren’t the standards.

btw, that’s not a less-than symbol: it’s an arbitrary relation that’s passed in.

Omphaloskeptic · January 26, 2005, 11:47pm

Ah, I see it now; I didn’t scroll down far enough. I’d still call it a bubblesort, though, and it’s O(n) on a sorted list.

Oh, I know. But it’s an arbitrary total ordering, so it should be antisymmetric. \otimes is far too symmetric a symbol to be used for an antisymmetric relation, to my sense of aesthetics.

ultrafilter · January 27, 2005, 12:34am

I don’t disagree. On the other hand, it might be interesting to think about what happens to various sorting algorithms if < is replaced by an arbitrary and not necessarily antisymmetric relation.

ftg · January 27, 2005, 1:07am

Well, every textbook I’ve taught sorting from has had the early termination condition. Otherwise, what’s the point?
Definitely right there. In-place merge, blyeah.

Topic		Replies	Views
Is prime factorization by simple algorithm impossible? Factual Questions	55	2715	January 23, 2019
Public-Key Cryptography and Large Prime Numbers Factual Questions	32	9376	June 27, 2010
Cryptography question: PGP Factual Questions	35	2066	January 6, 2006
Can you describe my (new, original) sorting algorithm mathematically? Factual Questions	77	8833	September 28, 2009
What does anyone do with huge prime numbers? Factual Questions	24	2705	October 1, 2008

On factoring really big numbers

Related topics