[QUOTE=Digital Stimulus]
So, I’ll do a little terminology refinement here, just to make sure we’re on the same page. Both “server” and “client” refer to single, independent processes (likely one per “host”, although the hosts in your case will be virtual machines, IIRC). What’s being asked for is a “job”; I’ll assume one job per client.
No matter what the system structure, you start up the server, which has a set of jobs that need doing. From my last post, I was thinking that the server sits there waiting for clients to connect (much like a web server), but does no client allocation of its own accord. Instead, you start up a bunch of clients, each of which initiates a connection to the server, at which point a job is allocated (preview: this is where the mutex comes in). The server does some minor bookkeeping (e.g., marks the “isSent” flag) and then sits and waits.
Each client does its thing; once its job is completed, the client then returns its result to the server. The server then does some more minor bookkeeping (e.g., marks the “isComplete” flag) and likely stores the result someplace (possibly in a log, possibly in memory for later organization, such as putting the jobs back in order).
Note that there are various options for connection management here: sockets can be maintained over the life of the application (you improve efficiency by not having to construct and destroy them, although then you need to handle broken sockets), they can be opened/closed at the start/end of a single job, they can be opened/closed at each point of contact, etc. Your choice.
Alternatively, you could set it up to work the way I did mine, which is that the server is in control, allocating jobs to clients proactively. That means that the server has to do a lot more internal bookkeeping, including already knowing what clients are available, when they’re active, and what job they’re doing. It makes sense for me to do it this way, given my requirements; it seems to me you can avoid some of the work by offloading control to the clients.
The mutex is simply so that any single job only gets allocated once. Since you’ll have many clients making requests, likely overlapping, you need some way to do that (unless you don’t mind having individual jobs potentially done by multiple clients). A similar thing might apply to receiving the results; for instance, you might have to lock a file for output so that results don’t get mish-mashed together.
As far as multi-threading goes, I don’t see a need for it in the clients; but I think you’d suffer a substantial performance hit if you didn’t multi-thread the server. Finally, from what you’ve described, I don’t really think there’s a need for simultaneous addition/removal to a queue; rather, I think you’ll have your set of jobs to allocate at startup (so it’ll always be removal, could be a queue, list, stack, whatever), and the results will be gathered as they’re available (always addition). I’m not seeing the need for a producer/consumer model here.
Was that understandable?
[/QUOTE]
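To make sure I have the allocation scheme in the quote straight, here is a rough sketch of it. It is written for Python 3, and everything in it (the `JobTable` class, its method names, the record layout) is a name I made up for illustration; only the “isSent” and “isComplete” flags come from the quote. The lock is the mutex: it is what keeps two overlapping requests from being handed the same job.

```python
import threading


class JobTable:
    """Hypothetical server-side bookkeeping: each job is handed out exactly once."""

    def __init__(self, jobs):
        self._lock = threading.Lock()
        # One record per job, using the two flags mentioned in the quote.
        self._jobs = [
            {"job": job, "isSent": False, "isComplete": False, "result": None}
            for job in jobs
        ]

    def allocate(self):
        """Return (job_id, job) for the next unsent job, or None if none remain."""
        with self._lock:  # only one caller at a time may pick a job
            for job_id, record in enumerate(self._jobs):
                if not record["isSent"]:
                    record["isSent"] = True
                    return job_id, record["job"]
        return None

    def complete(self, job_id, result):
        """Store a client's result and mark the job as done."""
        with self._lock:
            record = self._jobs[job_id]
            record["isComplete"] = True
            record["result"] = result

    def results(self):
        """Results in the original job order (None for unfinished jobs)."""
        with self._lock:
            return [record["result"] for record in self._jobs]
```

If the server ends up single-threaded, the lock is not strictly needed, since only one request is ever being handled at a time; it matters as soon as several threads can call `allocate()` concurrently.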
Thanks for that, DS. I’ll have to dig into this a bit deeper. I actually just spoke to my boss, and here are the key points he gave me. First, he told me not to focus too hard on the object that gets sent. Well, that’s actually done already, at least as far as my testing purposes go. I’ve got an object now that, once properly initialized, will pull a file from the server, do something to it, and put it in the results folder.
So the test object is ostensibly done. Now when I asked if I need to use multiple threads, he said no. What he did say to look into was the “select” command for sockets.
Quick question about sockets, since I’ve got your attention. The following is some sample socket code I’ve been using, and I’d like to ask a few specific questions, if you wouldn’t mind answering. I’m going to interrupt the code at the relevant lines to ask them. Note that the indentation in the original sample was kind of messed up; it didn’t run until I cleaned it up.
from socket import *
HOST = 'localhost'
PORT = 21567
BUFSIZ = 1024
ADDR = (HOST, PORT)
serversock = socket(AF_INET, SOCK_STREAM)
serversock.bind(ADDR)
Binding to ‘localhost’? That makes me feel like it could only communicate internally. Where do I get the address of the other machine?
serversock.listen(2)
Okay, this is the biggest mystery here for me. I read that this is something called a backlog. Apparently this can be a value between 1 and 5 (on most systems). Hypothetically, in a single-threaded server, what would happen if, say, 4 people decided to connect at once? Would this simply have a queue of maximum five connections?
I’m just guessing here, but does this continue to function while I’m in the loop below? So if I fill up my queue here, and one then disconnects, can I add another one to the list? This means to me that it’s setting the state of the socket, and isn’t something that needs to be called again to enable more connections. Is that right? Does this simply define the way the socket behaves in the loop below?
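One way to check that guess is a small experiment. This is only a sketch, written for Python 3 rather than the Python 2 of the listing, and it assumes everything runs on one machine over the loopback address; `queue_then_accept` is a hypothetical helper name. It calls `listen()` once, connects several clients before `accept()` is ever called, and only then accepts them one at a time.

```python
import socket


def queue_then_accept(n_clients, backlog=5):
    """Connect n_clients before accept() is called, then accept them all.

    Returns the number of connections accepted. Keep n_clients <= backlog;
    what happens beyond that limit is OS-dependent.
    """
    serversock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    serversock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    serversock.listen(backlog)         # called once; it stays in effect
    addr = serversock.getsockname()

    # These connects complete although nobody has called accept() yet:
    # the OS finishes each handshake and parks the connection in the backlog.
    clients = [socket.create_connection(addr) for _ in range(n_clients)]

    accepted = []
    for _ in range(n_clients):
        conn, peer = serversock.accept()  # takes one queued connection
        accepted.append(conn)

    count = len(accepted)
    for sock in clients + accepted:
        sock.close()
    serversock.close()
    return count
```

Calling `queue_then_accept(3)` should accept all three waiting connections, which fits the idea that `listen()` configures the socket once and each `accept()` frees a slot in the queue.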
while 1:
    print 'waiting for connection...'
    clientsock, addr = serversock.accept()
I was a bit surprised to see that this doesn’t behave like a busy infinite loop. For some reason the loop stops and waits at serversock.accept().
    print '...connected from:', addr
    while 1:
        data = clientsock.recv(BUFSIZ)
Could you explain why there needs to be a BUFSIZ variable there? I realize it was set to 1024. I’m just curious as to why it needs to be done.
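As far as I can tell from the socket documentation, the argument to `recv()` is the maximum number of bytes to return in one call, not a promise of how many will arrive. A minimal sketch to see that, in Python 3 (so the data are `bytes`), using a connected pair of local sockets instead of a real server; `read_in_chunks` is a hypothetical helper name:

```python
import socket


def read_in_chunks(payload, bufsiz):
    """Send payload over a local socket pair and read it back bufsiz bytes at a time."""
    sender, receiver = socket.socketpair()
    sender.sendall(payload)
    sender.close()  # closing makes recv() return b"" once the data is drained

    chunks = []
    while True:
        data = receiver.recv(bufsiz)  # returns at most bufsiz bytes
        if not data:                  # empty result: the other end has closed
            break
        chunks.append(data)
    receiver.close()
    return chunks
```

With a 10-byte payload and a `bufsiz` of 4, no chunk comes back longer than 4 bytes, and joining the chunks gives the payload again. That is also why the listing calls `recv()` in a loop and treats an empty result as the end of the connection.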
        if not data: break
        clientsock.send('echoed ' + data)
    clientsock.close()
serversock.close()
If I am right about the socket.listen() command then couldn’t one open several sockets? If that were the case then I can understand why he’d want to use the select command.
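For what it is worth, here is my rough understanding of how `select` would fit in, as a sketch only. It is Python 3, the function name `echo_server` and the `stop_after` parameter are made up (the latter exists just so the sketch terminates; a real server would loop forever), and the echo behaviour is borrowed from the listing above. One thread watches the listening socket and every client socket at once, and only touches the ones that `select()` reports as readable.

```python
import select
import socket


def echo_server(serversock, stop_after):
    """Serve several clients in one thread until stop_after of them have disconnected."""
    clients = []
    finished = 0
    while finished < stop_after:
        # Block until the listening socket or any client socket is readable.
        readable, _, _ = select.select([serversock] + clients, [], [])
        for sock in readable:
            if sock is serversock:
                # Readable listening socket: accept() will not block.
                clientsock, addr = serversock.accept()
                clients.append(clientsock)
            else:
                data = sock.recv(1024)
                if data:
                    sock.sendall(b"echoed " + data)
                else:
                    # Empty read: this client has closed its end.
                    clients.remove(sock)
                    sock.close()
                    finished += 1
```

The point of the design is that `accept()` and `recv()` are only called when they are known not to block, so one slow client does not hold up the others and no threads are needed.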
Anyway, thanks again for your help. Python’s weird as I’m used to C++, but I do really like it. I feel bad about coming in here with programming-related questions, but the truth is that this is one of the best places! Anyway, I figure it’s better to have this conversation where someone might be able to find it later. Who knows? Maybe someone will find it useful.