Hi everyone. I dithered between GQ and IMHO for this. I’m hoping there’s a somewhat definitive answer, or at least a consensus.
I’m slowly teaching myself to program, using a mix of books, online courses and the like. My language of choice is Python 3, and I’m at the point where I’m starting to create useful programs rather than ‘Hello World’ examples. I’ve no intention of programming as a career - this is just a hobby for me. The goal is to be able to create dynamic websites in Django, and I’m getting there slowly.
The online course I’m working through has got to Object Oriented Programming. I’ve worked through some of it, and read up elsewhere, and I kind of get it. I understand the basic mechanisms to create classes and sub-classes, and I understand what’s going on when I do. What I’m struggling with is why (or if) this is actually better for my personal use-case.
I can see that if I were part of a team, all working on the same codebase, then having each class as a ‘black box’ which can be worked on in isolation would be a big advantage. In my case, though, it’s just me. One of the benefits of OOP I’ve read about is that it reduces duplication, but so far I have achieved that by putting my code in functions and calling them when I need them. To be fair, though, I haven’t yet created anything larger than a single module (i.e. one file).
All of this isn’t to say I’m not going to learn OOP - Python is very object-oriented, and the libraries and sample code I come across are written in OOP, so understanding it would be a big advantage. My question is more whether or not I should be using it in my own programs, and if so why.
You shouldn’t go out of your way to use it, no. That said, Python lets you declare classes with very little effort, and then you can use that class anywhere in your code.
Even if you aren’t reusing code, certain tasks end up looking like this:
Initialize some variables
Do some calculations using those variables + input data
Return output data on demand
So for example, do you want to stuff data into a buffer? In some cases, you’ll want to declare a buffer class. On init, it gets the buffer ready. You then have a member function (one tab over, under the class declaration in Python - very minimal effort) called “push” that sticks something into the buffer, and a member function called “pop” that gets the first element back out of the buffer.
Even if you’re only going to use one buffer in your program (a class that only ever has one instance is called a singleton), organizing it this way is much easier and neater to keep track of in your mind.
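For illustration, here’s a minimal sketch of that buffer class in Python (the names are just placeholders, and a real program might use collections.deque, which pops from the left more efficiently than a list):

class Buffer:
    def __init__(self):
        # On init, get the buffer ready: here, just an empty list.
        self._items = []

    def push(self, item):
        # Stick something into the buffer.
        self._items.append(item)

    def pop(self):
        # Get the first element back out (first in, first out).
        return self._items.pop(0)

buf = Buffer()
buf.push("a")
buf.push("b")
print(buf.pop())  # prints "a"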
But yes, there are a lot of techniques that do not make sense if it’s just you, working on a small program. Very basic and straightforward use of classes in Python is not one of those.
You can think of much coding as being either data centric or algorithm centric. If you have lots of versions of the same data thingy it becomes natural to make your code data centric, and wrap the code structure around these data thingies - and you form abstractions that allow you to manipulate these thingies without having to worry about the intricacies of what those thingies are. Most programming languages allow you to use those abstractions to form the basis of new, more complex, thingies without needing to go back to square one. You can wrap a few different kinds of thingy together to make something much more useful, and not need to care about how the parts work. This ability to abstract over the representations and implementations makes life much easier.
As you may guess, thingies become “objects”, and a thingy-centric view of programming becomes an object-oriented one. The point being that the data object is the notion that forms the centre of your view of how programs are structured. You create objects as needed to represent the concrete things your program needs to operate on, and those objects encapsulate the manner in which they work.
There are times when this isn’t really the best way. If you have a paradigm where the basic data types are very well developed and the core thing you are doing is all about the algorithms that manipulate that data, you have a procedure-centric view of programming. Much mathematical programming is of that ilk. Arrays of floating-point numbers are pretty basic, and you manipulate them with well-understood mathematical operations. You don’t tend to have a dynamic set of these arrays, just the few needed to capture the mathematics, and you manipulate them with functions.
Using Python and Django I can’t see how you will survive for long without going down the OO route. Indeed you can’t really get past “Hello World” in Django without understanding how it encapsulates database queries and how it exploits the OO paradigm to do so.
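For instance, even a trivial Django model is a class, and queries come back through it as objects - a minimal sketch of what goes in an app’s models.py, where the Order model and its fields are made up for illustration:

from django.db import models

class Order(models.Model):
    # Each class attribute becomes a database column.
    customer = models.CharField(max_length=100)
    total = models.DecimalField(max_digits=8, decimal_places=2)

# Elsewhere, e.g. in a view, the ORM hides the SQL behind methods:
# big_orders = Order.objects.filter(total__gt=100)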
Even when coding something at home for a trivial purpose I’ll start with a basic OO structure. Inevitably things grow, and the encapsulation of the initial data representations becomes a time saver. Reuse becomes more important as you write more code. An OO-based bit of code can easily be picked up and used in a new bit of code you are hacking together; a loose bunch of functions won’t cut it.
I’ll invariably start a new Python class by defining the constructor, a __str__() method, and a basic start to a set of unit tests. Minimally I can print my object in a readable form suited to the task, and I can start adding sanity tests.
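A sketch of that starting skeleton (the Account class and its fields are just an example I made up):

import unittest

class Account:
    def __init__(self, owner, balance=0):
        self.owner = owner
        self.balance = balance

    def __str__(self):
        # Readable form suited to the task, so print(my_account) is useful.
        return f"Account({self.owner}, balance={self.balance})"

class TestAccount(unittest.TestCase):
    # A basic start to a set of sanity tests.
    def test_new_account_starts_empty(self):
        self.assertEqual(Account("alice").balance, 0)

if __name__ == "__main__":
    unittest.main()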
Perhaps the main comment is that starting with a disciplined approach to coding means you will be able to construct powerful and complicated systems much faster than otherwise. The break point where not having encapsulated your data abstractions early starts to bite and slows you down is much earlier than most people realise. BUT - don’t go overboard. The worst code is where people build individual stupid trivial abstractions over things that don’t deserve them, and the code becomes a morass of locally defined trivial types.
A central maxim is DRY - don’t repeat yourself. But it takes experience to know where the boundaries go.
I find that it is most useful in testing. You can isolate problems to one area of code, and even create objects dedicated to testing, so you can test a piece without running the rest of the code. In the example below (written in C#) you could test the AddAllColumns object even if your database isn’t created yet, or you don’t have access to it. That may be more of a professional problem than an amateur one, but it is still useful to be able to make sure that your black box works even if you own all the code. In my professional life I have done exactly that: I wrote an app to extract rows from a database and dump them into a file, and was given only an example output file and no database at first. Once I had the database I just wrote the database access object, and I knew my extractor methods would work.
// Simple data holder for one row's columns.
class RowFromADatabase
{
    public int RowA;
    public int RowB;
    public int RowC;
}

// Both the real database access and the mock database access will
// be implementing this, meaning they both have to have this method.
// Other methods can take ANY IDatabaseAccess as a parameter and then
// be assured that it will have GrabARowFromADatabase in it.
// In languages that don't have interfaces you could probably do this
// with inheritance.
interface IDatabaseAccess
{
    RowFromADatabase GrabARowFromADatabase();
}

class RealDatabaseAccess : IDatabaseAccess
{
    // This class would contain whatever actual database methods you'll
    // be calling; "dynamic" here stands in for your real driver type.
    private dynamic ActualDatabaseAccessObject;

    public RowFromADatabase GrabARowFromADatabase()
    {
        // Your real access method might not be called GetRow, obviously.
        RowFromADatabase nextRowFromTheDatabase = ActualDatabaseAccessObject.GetRow();
        return nextRowFromTheDatabase;
    }
}

// This also returns a row, but it is a hardcoded one which
// doesn't depend on having database connectivity and will
// also not change if the underlying data has changed.
// Useful for testing purposes.
class MockDatabaseAccess : IDatabaseAccess
{
    public RowFromADatabase GrabARowFromADatabase()
    {
        RowFromADatabase rowToReturn = new RowFromADatabase();
        rowToReturn.RowA = 5;
        rowToReturn.RowB = 4;
        rowToReturn.RowC = 3;
        return rowToReturn;
    }
}

class AddAllColumns
{
    public int AddAllColumnsFromDatabaseRow(IDatabaseAccess rowGrabber)
    {
        RowFromADatabase rowToAdd = rowGrabber.GrabARowFromADatabase();
        return rowToAdd.RowA + rowToAdd.RowB + rowToAdd.RowC;
    }
}

// Tests the AddAllColumns object, independently of how you get the data.
class TestColumnAdder
{
    public bool CanItAddCorrectly()
    {
        MockDatabaseAccess testerAdder = new MockDatabaseAccess();
        AddAllColumns columnAdder = new AddAllColumns();
        int addResults = columnAdder.AddAllColumnsFromDatabaseRow(testerAdder);
        // This should always add up to 12 because the mock columns sum to 12.
        // If it is not 12 then either the mock object or the underlying
        // code of AddAllColumns has changed.
        return addResults == 12;
    }
}
Object-oriented programming makes collaboration much easier. You might think that this isn’t relevant to you, because you’re not collaborating with anyone. But you are: Present-you is collaborating with past-you, and future-you is collaborating with present-you. If you program for long enough, then you will eventually have an experience where you pick up a program that you haven’t worked on for a few years, and want to improve it in some way… and you won’t be able to remember what the heck you were doing in it, or how. If your code is in independent, self-contained modules like objects, this will still happen, but it won’t be nearly as much of an obstacle.
You’ve asked a question that covers a very complex landscape, and there is absolutely no consensus; there are pros and cons all over the place, with constant evolution, learning, and missteps.
Very short summary:
OO encapsulation frequently helps even a lone coder, at least for SOME types of problems (definitely not all)
OO class hierarchies help much less often and can cause significant problems, although there are some situations where they are the perfect solution
The idea of object orientation (the encapsulation part) can be applied at many different levels of a system and doesn’t even need to be based on an OO language. The principle is to have a system or object that can be interacted with through an API, a predefined set of actions and responses, while that other system’s internals are hidden so you don’t have to worry about them and you can’t mess them up.
The idea of class hierarchies, heck even the idea of having a class (or type) was a diversion from the original intent of OO and ended up being applied way too broadly, although it does map very well to specific types of problems. This one requires you to have a lot of experience and truly understand your domain to see if it applies based on current state and future changes.
You absolutely do want to understand OO because it will help you sometimes and you will encounter those principles with any other software or utilities you might use.
I’ve programmed professionally in COBOL, FORTRAN and PL1 for thirty years.
I’m a procedural programmer. I also write SQL code to pull data from an Oracle database and use report generators with ODBC, like Crystal Reports. I use Microsoft Access to create small databases.
I briefly looked at object orientation, and it’s not worth spending a year of my life to completely retrain myself.
Yeah, it took me the better part of a year to get the hang of it, but at this point it has become addictive. I see the object as an operational context that is very comfortable to work in, where old-fashioned procedural programming gives me a touch of agoraphobia.
And often, if you want to do a certain thing, you will discover that there is already a class that does most of what you want and you can subclass it to add just what you need to get the desired result. It can be immensely easier than tying a bunch of library functions together because the code and data are already prepackaged for you to work with.
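A concrete Python example of that: the standard library’s json module can’t serialize dates on its own, but you can subclass its encoder and add just that one behaviour (a small sketch):

import json
from datetime import date

class DateAwareEncoder(json.JSONEncoder):
    # Add just what we need: date support.
    def default(self, obj):
        if isinstance(obj, date):
            return obj.isoformat()
        # Everything else is handled by the parent class as before.
        return super().default(obj)

print(json.dumps({"shipped": date(2020, 1, 15)}, cls=DateAwareEncoder))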
When I work on enterprise business systems, some of the key attributes in a system are:
1 - Ability to access data in an ad hoc manner by preferably non-technical people (so they can do real work without IT as a bottleneck).
2 - Ability to adapt the system for new requirements rapidly using least bottlenecked resources, which are prioritized in the following order:
2.1 - End user with self service capabilities
2.2 - Super user with more advanced self service capabilities
2.3 - Systems person that is not a programmer
2.4 - and finally programmer
Object orientation doesn’t really help solve those critical problems, and in some cases creates resistance to solving them quickly (if you tried to update the model for every change), but it can be used at a lower enabling layer where appropriate. Points 1 and 2 are better addressed by building tools that allow for what ultimately ends up being an abstract model sitting on top of configurable tools and base capabilities that can be combined.
For example, the business is branching into new territory (again) that has some additional requirements related to order processing just for this subset of orders (note the word “set” in that description, relational handles this stuff nicely). Someone configures some new fields on the order header, someone else configures some routing and processing changes so it goes through the correct steps, someone configures some reports for group X and some alerts for group Y.
This approach (configurable tools) allows for much faster evolution of systems, quicker reaction to changing business, reduced need for programmers to tweak the system for each change and then program in the newly required capabilities.
What I would suggest for you, newbie programmer learning on his own, is to not tackle OOP right now. It does make most programming easier and more logical, but it’s not necessary. If you were in a class it would be different, but on your own, there’s no need to add the complexity of learning OOP if you don’t want to.
Back in the dark ages, we wrote plenty of large, complex programs without OOP. But that experience writing complex programs procedurally is what gave us the insight as to why and when OOP is better. If you haven’t experienced the pain of trying to keep a bunch of variables, structures, and other data elements in sync, you may not see why OOP is better.
Eventually, you’ll likely think of operating on a set of data rather than applying functions to a set of data. So right now you’re thinking like this:
render(myDocument)
But eventually it will make more sense for you to write your programs like this:
myDocument.render()
Part of the reason is that a function like “render()” doesn’t tell you what it renders. And someone could pass in some other kind of data “render(myBacon)”. So then you start writing longer function names to make it clearer, like “render_document(myDocument)” and “render_meat(myBacon)”. Eventually you’ll get frustrated trying to keep everything straight about what data goes with which functions and you’ll want to write:
myDocument.render()
myBacon.render()
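In Python that shift looks something like this (Document and Bacon are made-up toy classes):

class Document:
    def render(self):
        return "<p>some text</p>"

class Bacon:
    def render(self):
        return "sizzling bacon"

# Each object knows how to render itself, so there's no need for
# render_document() and render_meat() to keep the pairings straight.
for thing in (Document(), Bacon()):
    print(thing.render())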
That point will come naturally. If you’re more comfortable writing procedurally, then keep doing that. By the time you start getting entangled in the difficulties of that method, it will be just a small step to learn OOP.
People get hung up on all sorts of intricacies in OO programming that are not really relevant. Now, maybe it is just me, but I find the core ideas trivial to express, and any OO language easy to grok so long as you have a basic clue about what is actually going on. Where it gets stupid is when you get a religion of “correct” OO coding that takes precedence over the core idea. It isn’t as if these ideas are new.
Most of the encapsulation concepts go back to Simula-67. There is a clue in the name. Most of the basics of inheritance and the rules for binding and name resolution were codified in Smalltalk-80. Another clue there.
In Python all you need to do is remember that every class defines a dictionary that is used to resolve names, and there is a natural set of rules that tells you how the dictionaries are used at run-time to perform the resolution. Keep that in mind and there are no surprises. (For C++ just understand the virtual function table and you won’t have any surprises either, so long as you always keep in mind the hidden magic of the “this” pointer - Python is honest about that and makes the “self” pointer explicit avoiding any more surprises.)
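You can poke at those dictionaries directly to watch the resolution happen (a quick sketch):

class Base:
    def greet(self):
        return "hello from Base"

class Child(Base):
    pass

c = Child()
# Attribute lookup walks the instance dict, then the class dicts in MRO order.
print("greet" in Child.__dict__)  # False: Child doesn't define it
print("greet" in Base.__dict__)   # True: lookup finds it here at run-time
print(c.greet())                  # hello from Base
print([k.__name__ for k in Child.__mro__])  # ['Child', 'Base', 'object']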
After that it is just a matter of thinking in terms of a data directed abstraction of your tasks - encapsulating your data along with the mechanisms to make use of that data in appropriate lumps. The only trick is the experience in working out what those lumps are - but for simple tasks the divisions are usually pretty natural and obvious.
OOP will result naturally from well-organized code. Encapsulation will emerge eventually either way, but you’ll end up with better code if you think it through at the start.
Even in a non-OOP language, without the syntactic constructs, you will find yourself grouping together data and the functions that operate on that data. You’ll end up with thing_create(), thing_dosomething(), thing_dosomethingelse(), etc.
Native-language OOP support just makes that sort of thing easier and cleaner, as in the sketch below.
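A sketch of the contrast, with hypothetical names:

# Procedural grouping: data and functions kept in sync only by convention.
def thing_create(name):
    return {"name": name, "count": 0}

def thing_dosomething(thing):
    thing["count"] += 1

# The class version ties the same data and functions together by syntax.
class Thing:
    def __init__(self, name):
        self.name = name
        self.count = 0

    def dosomething(self):
        self.count += 1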
Eventually you’ll be writing another program and need that “thing” functionality. Being able to just pull it in as an “Object” is much easier than trying to separate the spaghetti strands from the other program.
For every case of “Hey, the problem started out simple and stayed that way, forever” there are thousands that didn’t. For the vast majority of those, OOP was the better choice from the get-go.
Which online course are you using? Do you like it?
I’ve done most of my Python programming on my Apple tablet using Pythonista (it apparently runs Python in the Apple environment). It has several packages included, but you can’t freely download just any package out there, which has limited my ability to follow along with some books and online tutorials. I’m pretty good at basic stuff, but I keep hitting a wall, which I think I could get through with a well-designed course. And a teacher - I may have to break down and find a local evening course!