C++ Question: Arrays of Arbitrary Dimension?

A program I’m currently working on would be greatly improved if I had access to a “multidimensional array” data structure. That is, a structure holding data in d dimensions (i.e. d = 1 is a vector, d = 2 a matrix, etc.) where d is specified at run-time. The Boost collection of libraries contains a similar data structure, except that d is a template parameter, and therefore must be specified at compile-time. This doesn’t work for me since I want to write generic functions that can take an array of any dimensionality, so I need some sort of parent class. Is this structure available somewhere, and if not, can someone suggest a good way to write it? I have some ideas, but I welcome others’ opinions, and most importantly I want to know if someone’s already done the work and it’s available freely.

This is NOT a C++ specific answer:

Since a multi-dimensional array can be viewed as an array-of-arrays you could define an abstract class Subscriptable with two concrete sub-classes, one of which holds a simple 1-d array and the other an array of Subscriptable. You would then need a helper function to recursively construct your Subscriptable network to arbitrary dimensions.

Can you provide additional detail about the problem you are trying to solve? My initial inclination is that there should be a less complex (and more efficient) way to solve the problem.

Well, when I saw the thread title and read the first sentence or so, I immediately thought about Boost, but I see you’ve already looked at that and need something a little different.

I suppose if switching to a flexible language more suited to that sort of operation (say, Python) isn’t an option, writing your own would be a valid course of action. (Though if you’re fortunate enough to have the freedom to do so, I’d certainly recommend weighing the likely cost of writing, testing, and maintaining a dynamic multidimensional array class vs. the cost of switching to a higher-level language which makes the task trivial…)

I don’t know offhand of anything like you describe, but I did a quick search and found a list of scientific-type mathematics libraries at The Object-Oriented Numerics Page; maybe one of them has something like what you’re looking for?

If you wind up needing to build your own, the best approach will likely depend on exactly the way in which you need to use it. Do you foresee needing to resize or redimension arrays a lot, will they be very large, will performance be a critical factor, etc. For simple purposes, though, maybe something along the lines of a composition of “dimension” objects, each dimension containing either an array of further dimensions or a piece of data?

It’s 2:00AM and I was feeling bored, so I went ahead and slapped together a quick and dirty outline. I’ve grievously abused (;)) operator=, operator, and a conversion operator to allow the array to have the traditional subscripting syntax, e.g., myarray[4][2][16] = 42.

I’ve used boost::shared_ptr to handle cleaning up all the dimensions.

To create a multidimensional array, you construct an initial “dimension”, passing to the constructor an std::vector of integers specifying the dimensions. When the dimension is constructed, it pulls the first integer out of the vector, creates that many subdimensions, and constructs each one with the vector less the first element.


#include <boost/shared_ptr.hpp>
#include <vector>
#include <exception>
// Handy typedef for smart pointers to dimensions
class dimension;
typedef boost::shared_ptr<dimension> dim_ptr;
// Exceptions used by the class

class incomplete_subscript : public std::exception
{
    virtual const char* what() const {
        return "attempted to access incomplete multi-array subscript";
    }
};

class out_of_bounds : public std::exception
{
    virtual const char* what() const {
        return "multi-array access out of bounds";
    }
};
// Multidimensional array dimension class

class dimension {
private:
    std::vector< dim_ptr > subdimensions;
    int data;

public:
    dimension(std::vector<int> dims) : data(0)
    {
        if(dims.size() > 0) {
            int size = dims.front();
            dims.erase(dims.begin());
			
            for(int i = 0; i < size; ++i)
                subdimensions.push_back( dim_ptr( new dimension(dims) ) );
        }
    }

    int length() const {
        return subdimensions.size();
    }

    operator int () const {
        // Only convert to int if this is a "leaf" dimension
        if(subdimensions.size() != 0)
            throw incomplete_subscript();

        return data;
    }

    void operator=(int val) {
        // Likewise, only allow assignment to the leaves
        if(subdimensions.size() != 0)
            throw incomplete_subscript();

        data = val;
    }

    dimension& operator[](unsigned int index) const {
        if(index >= subdimensions.size())
            throw out_of_bounds();

        return *subdimensions[index];
    }
};


And a quick example of its use:



#include <iostream>
#include <vector>
#include <exception>
using namespace std;

int main()
{
    try
    {
        // Create a 4x4 array
        vector<int> dims;
        dims.push_back(4);
        dims.push_back(4);
        dimension array(dims);

        // Change a few of its values
        array[0][3] = 5;
        array[3][0] = 9;
        array[1][2] = 42;

        // Print it (note: array[0].length works to get the size of the second
        // dimension since all sibling subdimensions have the same length)
        for(int i = 0; i < array.length(); ++i) {
            for(int j = 0; j < array[0].length(); ++j)
                cout << array*[j] << " ";
            cout << endl;
        }
    }
    catch(exception& e)
    {
        cout << "Exception: " << e.what() << endl;
    }
    return 0;
}


To complete the picture, you’d probably want to wrap all this inside another class, so you can add some more useful methods and do things like prevent the user from creating a zero-dimensional array. Oh, and you might also want to make it a template so you can use it for more than just integers.

Do note that this is a very rough, simplistic approach that’s probably acceptable for simple purposes, but not for anything serious. Just one problem off the top of my head is that this is quite possibly the least cache-friendly array imaginable, since it consists of a separate heap allocation for every single element; any spatial locality you get would be purely accidental. :o

So it would be much better right off the bat to refactor the “data” up a level so that you’re storing an std::vector of your data type instead of single pieces. That will get you much better performance as long as you tend to traverse your arrays along the contiguous rows. Oh well, I slapped this together in twenty minutes or so, so you can’t expect too much. :wink:

On preview, this is more or less what ticker suggested. :stuck_out_tongue:

ETA: Oh sure, Algorithm, try to find a more effective way to solve the underlying problem instead of immediately constructing a bulky solution to the superficial problem, why don’t you. :smiley:

If you care about speed, nothing is going to work. C++ arrays have to have all of their details specified at compile time because they’re designed so that accessing an element by index is two CPU instructions. Any more flexible solution will be considerably slower.

Stealth Potato: gcc refuses to accept your code. In specific, your two exceptions should be thus:



class incomplete_subscript : public std::exception
{
  virtual const char *what() const throw () {
    return "attempted to access incomplete multi-array subscript";
  }
};

class out_of_bounds : public std::exception
{
  virtual const char *what() const throw () {
    return "multi-array access out of bounds";
  }
};


Other than that it works fine.

Meet malloc(). He is your friend.

The below code (untested, so probably containing some errors and math that’s wrong) currently is based on a char buffer, but you’ll probably want to change it to a template so it can hold any type.




class MArray {
private:
    int dim;
    int* len;
    char* buffer;

public:
    MArray(int dimensions, int* lengths) {
        dim = dimensions;
        len = malloc(dimensions * sizeof(int));
        memcpy(len, lengths, dimensions * sizeof(int));

        int size = 1;
        for (int i = 0; i < dimensions; i++) {
            size *= len*;
        }
        buffer = malloc(size);
    }

    void set(int* indexes, char value) {
        int index = indexes[dim - 1];
        for (int i = 0; i < (dim - 1); i++) {
            index += indexes* * len*;
        }
        buffer[index] = value;
    }

    char get(int* indexes) {
        int index = indexes[dim - 1];
        for (int i = 0; i < (dim - 1); i++) {
            index += indexes* * len*;
        }
        return buffer[index];
    }
}



And of course you’ll want to add a destructor.

Actually I’m pretty sure that my indexing math is wrong. But, that’s just a matter of sitting down with some paper and pencil for a bit to figure out. It should be approximately the same lines of code, whatever it is.

Ah, whoops, I forgot the exception-specifiers. That’s what I get for testing this with Microsoft’s C++ compiler, which even in the 2008 version is still remarkably lax in some areas on complying with the standard… :smack:

Sage Rat: That’s a good approach too, but why use malloc() for that? Surely it would work just as well to allocate just one array with new (or better still, an std::vector) if you’re going to map indexes onto a flat array, and it would avoid the C-isms and (in the case of the vector) the use of naked pointers.

I’m a C guy. I like objects, but so I pretty much use C++ as C with objects. It leads to the same code in the end, but with fewer magical symbols and keywords like :: and namespace and whatnot, which (to me) improves readability. Using new and delete on an array seems perfectly fine, it just seems odd to me to use them on a built in data type or to create a data buffer. (And of course I never have problems with moving between compilers. :wink: )

I wouldn’t go for a vector though. You can’t arbitrarily resize a buffer containing multiple arrays. Similar to resizing a hash array, you have to create a second buffer and move everything over item-by-item. So you may as well just go with a plain old array.

Yes, but the crazy sort of friend who will borrow your car and drive it off a bridge accidentally, and hit on your girlfriend when he thinks you’re not looking. Use of malloc() not only greatly multiplies the potential for a bad bug in your program, but it also greatly multiplies the potential damage such a bug can do. Which is not to say not to use it, but just to be very careful about it when you do.

Unless you’re programming in something with a garbage collector, it’s just the same as new (except that it doesn’t call a constructor–for which there isn’t one in the case of an array.) You still have to do the math to determine the number of elements, you still have to make sure you access within the bounds of the array, you still have to delete or free() manually, etc. I suppose there’s a minor issue with it returning a void*, but I wouldn’t call this a particular onus.

A vector would indeed be safer. But given that the buffer is being managed internally to the class, it’s not like anyone else is going to be accessing it. I’ve not had any pointer introduced errors in any size of code I’ve worked on in the last ten years, so personally I feel fine to hand someone an object I’ve made that uses pointers internally. They don’t have to touch it.

I didn’t mean to disparage your programming skills: If you tell me that you’re proficient in the proper use of malloc(), then I’ll (provisionally) believe you. But “malloc() is your friend” sounds like an endorsement for others who aren’t as conversant to use it also, in which case those less-experienced folks should be warned that you can get in deep trouble with malloc() if you’re not careful about it.

And I can’t comment on the use of new… I’m guessing that’s a C++ thing? I only know non-incremented C. But if it does a similar job to malloc(), then I’m not at all surprised to learn that it has similar hazards associated with it.

There is if it’s an array of objects.

When you say you want to write generic functions that can take an array of arbitrary dimensionality, is there a reason that you don’t want to write templated functions?

Yes, new is C++'s heap allocation operator; its corresponding deallocation operator is delete. It’s quite different from C’s malloc() in that it takes into account the type of the object being allocated, calling constructors as appropriate – if you allocate an array of objects, it calls the default constructor for each object; otherwise, it can call a specified constructor of a single object. Similarly, delete calls the destructor for the object deallocated; delete [] must be used to deallocate an array, in which case it also calls the destructor for each object in the array. Both operators can also be overloaded in the global scope, or for specific classes, to provide custom allocation behavior.

All things considered, I’d say it’s much better to do things the C++ way consistently and use new, though if you’re really comfortable with malloc() and prefer to use that to allocate arrays of non-class types, that will work as long as you’re very careful and know what you’re doing. I suppose it’s what makes sense if you really want to treat C++ as “C with objects,” though I still can’t recommend that approach. :stuck_out_tongue:

Naturally, new and delete aren’t free from hazards, of course. A common one is mixing up array allocation with the scalar delete operator – this and the need to remember to use delete in the first place make it very advantageous to embrace the C++ way of doing things completely and use appropriate RAII containers instead of juggling and deleting naked pointers.

The templates are still resolved at compile time, so if he needs a single function that can take an arbitrarily-dimensioned array at runtime, templates won’t really help.

Sorry for being unclear. I want to just check that the reason he says he wants arbitrary dimensionality is because he wants to write generic functions, or if there’s some deeper reason he feels he needs to delay the binding until runtime.

Well I really wasn’t aware that it wasn’t used. I thought C++ people still used malloc() and realloc() for working directly with the heap.

But yeah, “new char[10]” is just fine as far as functionality goes.

No, definitely not. :stuck_out_tongue: C++ people even avoid directly using new and especially delete as much as possible by using RAII and related patterns. Note that there’s only one use of the “new” operator in the code I posted, and precisely zero use of “delete”; the deallocation is handled automatically by boost::shared_ptr.

The resulting code is often more semantically clear as well, since the choice of appropriate containers (scoped_ptr, shared_ptr, weak_ptr, scoped_array, etc.) signals (and often enforces) the programmer’s intent about the use of the underlying pointers.

Wimps!

humor, obviously