Java memory handling-- arrays of arrays of...

Well, getting into programming Java and I need some information which I thought would be located in my nifty Java source book. But it isn’t.

I took it upon myself, as a test of speed and memory, to find all the prime numbers between 0 and some number one passes along as a command line argument.

I was getting OutOfMemoryExceptions, so I then wrote a quick test program to find out the largest array I could make.

Now, the array I chose was a boolean array because I figured it would be the one which used the least amount of memory, though I suspect it is the same as byte. I would LOVE bit! :stuck_out_tongue:

At any rate, my program is now bound by a ceiling of 50 million (which it find all primes in in 54 seconds, so long as it doesn’t display progress, which I’ve commented out now that the thing works). Though 50 million is a fine ceiling for finding primes, something around 650 thousand of them, I’m not satisfied. (and the ceiling is 50 Million exactly, 50000001 is the magic error number)

So, the plan is going to be to be able to pass a long integer to it as a ceiling. Not going to bother with double until I get long to work. At any rate, I am about to make an array of arrays (or just multiple arrays) by dividing the ceiling by 50 million to see just how many arrays I need (ceil/50mil + 1 arrays). Already I was wondering… how many arrays of arrays can I have? Or better yet, how many arrays can I have period (whichever is greater)? I can’t seem to find any limits on this, and setting programs up to create huge arrays, destroy them, then create huge arrays +1 takes a looooong time. If I have to I’ll do it, but if anyone can help here I’d appreciate it.

Anyone interested in the algorithm, which is pretty fast as it does no division on any numbers, is welcome to ask. I could email the code or post it.

Oh my, second question to this. In “for” loops I couldn’t find limits on the conditional statement, italicized here:
for (int i = 0; i < ceiling; i++) {
//code
}
Could I put a double in there? I’ve put “ceiling” numbers in the loop which are larger than type int and received no compiler errors, but I’ve never actually tried using anything but an int for these loops. I know the “i++” can’t be any other statement… at least, I tried “i + 2” and it wouldn’t compile…

Wow, long GQ. Maybe I should summarazie:
[li]Anyone know how many arrays I can make in one method? Note: not a main method, if that matters.[/li][li]Do for loops need int types? (this one is just laziness on my part, so feel free to not answer)[/li]
In closing, I suppose with more intensive code I could get by with only making one array at a time, but I am pretty shakey on extensively nested loops and would happily avoid it if I could.

Thanks for help.

forgot to check “email notify” box… sorry for the bump.

What are you doing - running a sieve with a very large array I suppose? I’m going to assume that anyway.

On the specific for loop question - if you haven’t coded in java or c/c++, you should get used to the idea that a for loop is really just a shorthand that isn’t intrinsically tied to any particular type of iteration or testing at all.


for(s1; expr; s2) {
	stuff;
}

is functionally equivalent to


s1;
while (expr) {
	stuff;
	s2;
}

with minor proviso’s like a “continue” statement in the “for” version “stuff” will execute s2, while it will not in the while loop shown. You can make a slightly more complex while construct to preserve it, but it would obscure the point, which is that there isn’t any restriction on the nature of the expressions or statements involved, and that the “for” in java or c is simply a shorthand.

As for how large an array you can construct, it will probably be environmentally determined, and you would be better served by rethinking how you are approaching the problem.

If you want to write an arbitrarily large sieve (or at least to the native integer size), off the top of my head, I would:

1 - make it an array of byte rather than boolean, and use an inner loop to mask to the appropriate bit of the byte, thereby reducing your size by eight if a boolean really is internally coded as a byte.

2 - serialize the current results out to a tempfile, and treat the sieve array in large chunks, adding to the current results list at the end of each chunk. You would read the tempfile in large chunks at the beginning of each sieve array chunk as well, since you probably could not hold the entire result list in memory either.

Actually, using a “test” function to actively test for primes is terribly slow at finding them, as my first few tries at writing proved, so I don’t do that. Instead I just eliminate non-primes… so yeah, maybe you could call it a seive. I can’t resist, here’s the meat-and-potatoes of the method:


reference = 3;
        while (reference < Math.sqrt(ceiling)) {
            for (int i = 3; i <= ceiling; i++) {
                if (primes* != false) {
                    numPrimes++;
                    if ((numPrimes % progressCheck) == 0) {
                        lineDisplay("Found the " + numPrimes + "th prime!");
                        }
                    reference = i;
                    int stopper = (int)(ceiling/reference);
                    for (int t = 2; t <= stopper; t++) {
                        primes[(t * reference)] = false;
                        }
                    }
                }
            }

Sorry about any scrolling…

What is incredible is watching it in action… the first “few” primes are slow to catch, and it speeds up very quickly (I’d like to say exponentially but I’d need to do some number theory to show that) the longer it runs.

The array can be anything at all, it only needs to be able to store a binary value which is why I chose boolean for true/false for prime/not prime. Then the index value of the array actually corresponds to the number.

Haven’t done any bitmasking, but I suppose that would be possible to do for the way the algorithm works. Hmmm, that would effectively multiply the ceiling to 400 million if I could get that going…

I have also considered utilizing a temporary file for the operation, but I would like that to be a last resort as well.

As far as the loop goes: (smacks forehead). Thanks. :slight_smile:


reference = 3;
        while (reference < Math.sqrt(ceiling)) {
            for (int i = 3; i <= ceiling; i++) {
                if (primes[****i] != false) {
                    numPrimes++;
                    if ((numPrimes % progressCheck) == 0) {
                        lineDisplay("Found the " + numPrimes + "th prime!");
                        }
                    reference = i;
                    int stopper = (int)(ceiling/reference);
                    for (int t = 2; t <= stopper; t++) {
                        primes[(t * reference)] = false;
                        }
                    }
                }
            }

I think you’ll find that “i += 2” will compile okay, as will “i = i+2”.

What you are doing is called the “Sieve of Erasthones”, which may be the oldest numerical method in existance.

If you thought of it independently, good for you.

Wow, I’ve actually never heard of it. :pats self on back: hehe. Time to go hunt this thing down.

This is probably, however, a foreshadowing of my programming life to come. Reinventing the wheel :slight_smile:

You can tighten it up a bit, too, though. Hint - you’ve got a multiplication and a variable in there that you don’t need. Think about the increment you’re using on your inner loop.

I can help with this. What you’re trying to do there is a linear search, but you can do much better with a binary search. You start with some size that you know won’t work-- Let’s say that you know that a million-element array is too big. Then you try a 500 000 element array. If it works, then you know the limit is somewhere between 500 000 and 1 000 000 , so you try 750 000 . If 500 000 doesn’t work, then you try 250 000 , etc.

It’s like looking up a name in a phone book. You wouldn’t check the first page, then the second page, then the third page, and so on. You’d check the middle page, and see what half of the book it’s in, then check the middle page of that half, and so on.

One other note to add to the efficiency, by the way:


while (reference < Math.sqrt(ceiling)) {

Never use a square root in a comparison-- It’s much quicker to square the other side, so you’d have


while (reference*reference < ceiling) {

(I don’t know if there’s a compact notation for exponentiation in Java)

The outer while is something that he really doesn’t need at all, and will execute once - I thought I’d let it ride (actually, it creates a problem when ceiling < 16, since the sieve then won’t execute, but I passed on that, too). He should also mention that he has presumably initialized his primes array with true values before starting this code snippet.

Turning the inner


for (t=2; t < stopper; ++t) {
	primes[t*reference] = false;

into something like


for (t=2*reference; t < ceiling; t += reference) {
	primes[t] = false;

will save a lot of operations.

Since ceiling is a constant over the loop, you should just calculate the square root outside the loop, which would be even more efficient than Chronos’s fix.

And some people would write “!= false” in a slightly more direct fashion.

Unfortunately I’ve passed on modifying the primes program this evening and went to practicing promting for keyboard input.

The loop executes fine when ceiling < 16. It wouldn’t work for anything less than 10, which I just tried. sigh

There is a compact method for exponentials, but doesn’t really save keystrokes for powers of two.

Now, yabob, I made the change you mentioned but no time was saved :frowning: I then made Chronos’s changes and no time was saved.

I think I’ve hit the ceiling speed on this sucker as it stands.

smackfu, yeah, i could just put while (primes*) but no big deal. The program didn’t start out with a boolean array, and I edited code as I changed the data types without thinking too hard. :sheepish grin:

Forgot to mention, precalculating “Math.sqrt(ceiling)” didn’t save any time either. :frowning: :frowning:

Ain’t going to be breaking any land speed records here. If there is no way to trim down the algorithm any more, we’ve got one marker for how much slower Java is compared to machine code: this program would run in under 8 seconds.

Duh. I read <=, not <. Sorry. Point was, there was a boundary condition to worry about. Anyway, as I noted to Chronos, you don’t need the outside while loop in the first place - it’s going to execute once, for sufficiently large ceiling values - reference will exit the for loop being the last prime < ceiling and be > sqrt(ceiling)[sup]1[/sup]. In fact, if the while loop executed twice you would be counting the same primes over again.

Our various “speed ups” are really in the noise level, which isn’t that surprising.

BTW, I also assume you initialize the count to 1, since you need to count 2 as a prime. If you are starting with it as 0, start from 2, not 3.

[sup]1[/sup] - no, I’m not going to prove that there is always a prime > sqrt(n) and < n for sufficiently large n. Somebody else may do that, if so inclined.

Oh, something else I just noticed - your outer for loop says <= ceiling, not < ceiling. Your array length had better be at least ceiling + 1 then, or you will get an out of bounds (java/c is zero based indexing). In java/c, it’s more customary to do loops like that with the actual array length and use <. And since it’s <=, amend my last comment to say you will exit the for loop with the largest prime <= ceiling.

The array length is indeed ceiling+1, as it has to be to impliment it the way I do where the array’s index value directly corresponds to the number in question.

Tonight I will remove the out while loop and see if that changes anything.

Are you suggesting that I am missing primes this way? I’m not prepared to argue that I’m not except by a heuristic argument… heh.

I don’t doubt that there is a prime in the boundries you mention, most likely for all n > 1. However, any numbers in its neighborhood which are not prime will have divisors less than or equal their square root, and so there is no need to ever check above sqrt(ceiling) to find primes less than ceiling. Because I’m not sure how the casting operation works (does it round normally or is it a greatest integer function) one could, just to be safe, check sqrt(ceiling) + 1. :smiley:

You know, I can’t believe my Number Theory book has no details whatsoever on this intriguing “seive.” I’ve been reading up on it and a comparable method on the web since you put a name to it. Interesting stuff.

erislover wrote:

Which is, actually, what I see as a big waste of space. More than half of those booleans will never get changed from their initial settings, since your inner loop is ‘smart’ enough to skip the even multiples of the primes. Changing the code so that (index2)+3 = “real value” would allow you to find primes up to ceiling2 and a few. Thusly:


int limit = sqrt(ceiling);
for (i=0;i<=limit;i++)
{
   if (primes[ i ])
   {
      int primeValue = (i*2)+3;
      System.out.println("Found prime "+primeValue);
      for (int j = (i+primeValue) ; j<ceiling ; j+=primeValue)
         primes[j] = false;
   }
}

Of course, the above leaves out setting all the values in primes to true to begin with. If I remember correctly, Java ensures that arrays, when created, are initialized to all-zeroes. So change the name of the array to notPrime and reverse all the explicit and implicit trues and falses to save time on initialization.

Speaking of time, you wrote:

It’s that inner loop that’s doing this. When you get the first prime, 3, that inner loop goes and sets 8,666,666 members of the primes array to false. On the second prime, 5, it sets 5,000,000. Etc. Each time the seive finds a prime, that inner loop sets ceiling/(2*i) of the array members. When the prime gets to just 101, you only set about a quarter-million of 'em.

Is that ‘exponential’? I dunno, but I do know that if you implement the above suggestion about not keeping track of even numbers, that inner loop will be executed twice as often.

And as for yabob’s comment:

forget about it. I’m suggesting that you cheat, and specifically print out ‘2’ and set your counter of primes to the apporpiate value before ever entering the main loop. 2 is a special case, everyone knows it’s a special case, don’t make a concerted effort to go ‘find’ it, fercryingoutloud.

Something else that occured to me: since you know what the ‘real values’ of the array are (no matter whether you drop the even numbers or not), and since it’s a fact that keeping track in that boolean array of numbers you’ve already decided are primes is a waste of memory (you never need to go backwards in the array), you can continue the seive fairly easily. Once the seive is complete for ceiling numbers (or ceiling2 in my example, above), run through the array, and store each real prime in a list-of-primes array (ints, perhaps). Reset the primes array, but then run through the list-of-primes array, and run the “inner loop” on them, keeping in mind that the starting index of the primes array represents the number at (ceiling2)+3. This shouldn’t take much code.

You can then run the seive again, but you’ll need to work the arithmetic with your “base number” when figuring out what primeValue is. You can repeat this process almost indefinitely (until the primes get farther apart than your array holds - not likely to be very soon, given the most-common distance between two primes under 32 bits in length [4 billion or so] is 6 [and yes, I once wrote a program to calculate all primes 32 bits and under, and compare the distances between neighboring primes - I’ve got a nice histogram to show for it]).

Ugh:


int limit = sqrt(ceiling);

should be


int limit = (sqrt((ceiling*2)+3)-3)/2;

of course (or something close, it’s getting late).

I have eliminated the outer while loop; no new speed gained.

Dave that is really a great suggestion to double the number of primes I can find. I already do cheat and set 2 to be the first prime (actually, it is a boolean array that “assumes” all numbers are prime unless seived out, so I really set 0 and 1 to be non-prime).

Damnit, I WILL speed this program up. Additional note, creating and initializing the array with true values takes almost no time at all… maybe 3 seconds total. No, the algorithm itself is the time-eater, and from the display portion of the program that I’ve edited out here and commented out in the program shows that it is really the first 10000 or 100000 primes that take the most time.

For interested parties in my progress I’ve modified the program to prompt for a ceiling instead of passing it in the command line. Getting better! It does it by a crappy System.in.read(byte) method, though, since I haven’t gotten the hang of other reader methods like StreamTokenizer, but having a character array does help me do whatever I want with the prompt’s input…