Dr.Strangelove, your use case is interesting, but I think it’s also an edge case. How well does Perforce scale down to, say, someone dabbling at home?
I’m also not sure what the point is of the distinction between Git and a centralized system with a Git front-end. There are no purity tests in development; we use bits from anything that works.
Works just fine. I run my own Perforce server at home, and have it available publicly so I can access it anywhere, or give other users access. I had a little 2-person project going at one point, and we both used my home server. It’s free for low user counts.
That said, for small-scale development, Git is just fine. Like I mentioned earlier, it’s an unequivocal win for anyone used to the file.cpp.2.working style of “source control.”
Sure, I’m not at all saying that Git-like systems can’t be used at large scale. Just two points on the matter:
As soon as you add these systems, some of the initial advantages of Git (i.e., the reasons for using it over some other version control system in the first place) disappear. For instance, baseline Git works just fine with no internet connection at all, say on an airplane with no WiFi. I think this was one of Linus’ motivations in the first place. Well, you can’t do that as soon as you need GitHub or some other server for binaries or your VFS source. Nothing wrong with that, just take note of what you lost and ask if there are really any advantages remaining.
Ad-hoc collections of third-party tools can be hard to manage. For a large organization, it’s a big help if there’s just one phone number you can call if something goes wrong. If instead, each group in a company uses some random collection of tools, it makes IT’s job that much harder. Hell, we’re having a hard enough time just getting everyone to use the same instant messaging tool.
And just to emphasize, if some generic newbie asked me which source control system to start with, I’d say Git. It’s extremely popular and reasonably easy to get started with. In that sense it’s a lot like Python, which I also have plenty of criticisms of, but ultimately the popularity and ease of getting support are a huge win.
In some particular cases, old-style source control is superior (IMO). And I’m really not a fan of “you’re doing it wrong” style argumentation, particularly when the claims are demonstrably untrue (“it doesn’t scale”). I use Perforce at home partly because I’m used to it, partly because it does some things that I like, but ultimately these things aren’t a big deal at a small scale. It’s different at large scale.
I did a quick experiment earlier: wipe out one of my clients completely and sync (clone) it from scratch. Result: 41.7 GB, 183,974 files, 6 minutes 4 seconds. That comes to 916 megabits/s; close to the wire limit (though I think it probably uses compression). This wasn’t at the busiest time of day but it wasn’t the middle of the night, either.
Yeah, I’m sure it requires some beefy hardware. But my company isn’t dumb, and realizes that hardware is cheap compared to wasted dev time. So we get beefy server hardware and desktop machines. Mine is a bit long in the tooth these days with only 24 cores.
It’s well recognized that storing binaries in SCM is a sub-optimal legacy practice, and Git LFS was added to accommodate it. But most orgs don’t rely on it, because the industry has standardized on dependency resolvers for this purpose instead of checking binaries into SCM. Just because you can coerce a system into doing a thing doesn’t mean it’s the best use of resources.
And this is a philosophical point here, but at its heart an SCM system is, as the name suggests, for managing source code: work objects where changes are recorded as lines, which may be selectively merged, as lines, into a different branch. This isn’t just argument by definition; it has consequences, because:
I can see why this is terrifying if you’re using your SCM to store binaries. They can’t be stored as diffs… it’s all or nothing, so every version is going to have an entire copy of the binary. If you avoid doing that, then the version history is highly compact and easily transferred.
This is an odd argument. Any SCM system needs to transfer data at some point, so it’s silly to argue as if Git ever advertised that it doesn’t. What it does advertise is that you can be as autonomous as possible while you’re offline. You can add to the commit history, you can create branches and tags, without the mediation of a central server or admin, and then share those later.
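For example, all of the following works with zero connectivity (the branch and tag names here are purely illustrative, as is the origin/main setup); only the final push needs a network:

    git commit -am "fix parser"      # record work in the local history
    git branch experiment            # create a branch, no server involved
    git tag v0.3-rc1                 # tag a commit locally
    # later, once you're back online:
    git push origin main experiment --tags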
It sounds like you’re in a large-organization environment that has decided to trade off developer autonomy for having one large system that does everything. That’s one way to do it, and there are trade-offs with that. In my world, having IT as an intermediary for dev tools is a no-no. I would rather have an unruly zoo of different tools than risk IT being anywhere near my workflow.
Look, there are different contexts and I’m glad you’ve found a tool that makes your legacy practices work. In a large org where very little developer autonomy is desired or expected, it’s justifiable. But if you look out at the larger industry, the developer experience has been trending away from one that is managed, mediated, and dependent, to one that’s more autonomous and independent. In a context like that, big central SCM systems just aren’t appropriate, and that’s why we have Git and friends to grant that autonomy and power.
There’s nothing to “coerce”. It genuinely just works without any effort on anyone’s part, aside from IT making sure the server is stuffed to the gills with RAM and storage. Perforce is designed to work well with binaries.
Untrue. To be fair, I actually don’t know whether Perforce does any kind of delta compression, but it is most certainly possible. I wrote a tool to bundle binaries for long-term storage (outside of Perforce) and it achieved 90x compression. That’s 30x on top of the standard compression we were getting. They were runs of quite-similar-but-not-identical binaries.
I am so thankful you did this, because I was just waiting for it. I knew you would get there. See, we can’t check binaries into an SCM because the S stands for source! Brilliant!
Sadly, the argument fails because Perforce simply calls themselves version control, not an SCM. There’s no S to be found.
Maybe your IT department sucks.
Ours gives us lots of autonomy. For one thing, no one is obligated to use Perforce. They also provide plenty of tools for Git/P4 integration, which some groups use for various reasons. For those of us that do use Perforce, IT has always ensured that it’s super fast and reliable. I think some groups are Git-only and they get their own servers.
If I want a giant blob of storage, I file a request and pretty much all they ask is how much I need and what the share name should be. If I need to run a special dedicated server, I ask them for a VM and they ask how many cores and gigs of RAM it should have. If I don’t want to manage my own VM, I can ask for a MySQL or Apache instance or whatever and they provide it. It rarely takes more than a day or two.
Frankly, our IT used to be much worse and we really did run our own janky self-managed servers. But they got better and it’s all much easier now.
That’s my philosophical opinion, and I’m glad to have… uh… excited you in some fashion?
If you want to drop down to the name-based argument and say Perforce is a “version control system”, fine: Git is not a “version control system”. It’s a source code versioning system, oriented primarily around managing code as diffs. This makes branching and merging trivial and decentralized. If that’s a distinction you want to draw, then it’s silly to imply Git is somehow lacking relative to Perforce. It’s just a different philosophy of versioning. It should also be mentioned that it overwhelmingly dominates the market, so a little humility is in order before dismissing it out of hand.
If you want to say it’s a different philosophy from your favorite tool, that’s fine. But if you want to claim it’s somehow inferior or lacking, then I’m going to talk about how it’s incredibly naive and wasteful to commit binaries into your versioning system, and that this is a legacy practice that’s mostly endemic to antiquated legacy corporate environments that are declining in the industry overall.
Is that true, though? Offhand, I can’t think of any scenario in Git where a case typo would cause data loss.
Don’t get me wrong, if you’re clueless, careless, flailing, you can get yourself into a very confused state. But it would take some persistent and determined fuckaroundery to get yourself into more trouble than I could get you out of.
git branch -d <name> : Delete a branch, but double-check there are no unmerged commits
git branch -D <name> : Delete a branch regardless of whether it has unmerged commits, which can lose data
Of course in both cases we’re explicitly deleting something, which should always make any developer pause and double pause. But I would still be nervous about screwing this up one day, especially if the branch name begins with “D”.
Sure, I would personally use --force, but that doesn’t protect me from one day typing “-D DeltaBranch”.
That’s why I personally will work in a GUI, or if no GUI is available, at least create aliases for git commands. As soon as I heard I could shoot myself in the foot with one press of the shift key, I was out.
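For example, a pair of aliases like these (the names are just what I’d pick, nothing standard) keeps the destructive version behind something you can’t fat-finger:

    git config --global alias.bd 'branch -d'                          # safe delete: refuses if unmerged
    git config --global alias.branch-nuke 'branch --delete --force'   # force delete, has to be typed deliberately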
I’m agreeing with you that having lower-case mean “only if conditions are met” and upper-case mean “force” is a bad idea. I think it traces back to the general *nix fetish for saving keystrokes. Just because you can write a C/Python/Perl/??? function using 6 characters doesn’t mean you should.
Yeah, those semantics are not ideal. But there’s a finite number of letters in the alphabet, and this is a totally recoverable situation.
Remember, branches in git aren’t special; they’re just pointers to commits. If you haven’t yet run git prune (which people seldom do), the commit at the tip of that branch still exists in your repo. Use git reflog to find its commit ID, then recreate the branch at that ID with git branch <name> <commit> (or git checkout -b).
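For example, recovering the DeltaBranch from a few posts back might look like this (the commit ID here is made up; yours comes from the reflog output):

    git reflog                       # find the commit that was at the tip of the deleted branch
    git branch DeltaBranch abc1234   # recreate the branch pointing at that commit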
It sounds like a hassle, but once you’ve done it once or twice, it’s very intuitive, especially if you take a little time to get a good grasp of the commit model. Here’s one good article (there are many):
I never dismissed Git. You’ll find that I repeatedly said Git is a fine choice for small groups and new users with no special needs, and with enough extras can be made to work just about anywhere. In Microsoft’s case, it required changes to their operating system, but hey, it seems to have all worked out and they have a Git-compatible system with (apparently) a 300 GB codebase.
It’s you who’s been dismissive, trying to define away completely legitimate workflows as “legacy”, despite their still being in widespread use and there being zero (or negative) reason to switch to a different workflow. Perforce is still very popular with game studios in particular, for perhaps obvious reasons. Are game studios all “antiquated”?
Everything you’ve suggested would make our developers’ lives worse. Having a completely separate system for binaries doubles the points of failure and means many of our tools have to be written twice. Having many small repos inhibits discoverability and causes silos to form. Not checking in generated code means adding 10+ minutes to devs’ build times (or the generated code gets stored elsewhere, and we’re back to the binary file problem).
And I don’t see why you don’t consider that dismissive. Saying something is good for small groups and noobs is saying it’s bad for everyone else.
I would say that, from the start of this thread, your posts have been about how Git is inferior: only really better than file.txt.1, file.txt.2 style versioning, and worse than “modern” versioning systems. You were presenting everything Git doesn’t have as flaws.
So it doesn’t surprise me that someone else came in and said “well, Perforce is the one that is really flawed.” From the outside, to someone who only really knows GitHub, it seems both of you are being dismissive.
HMS_Irruncible asked me specifically why I thought Git isn’t so great compared to “industrial strength” systems, so I elaborated. I’ve tried to keep some balance in my posts, noting that these are not all fatal flaws. But obviously I’m no Git evangelist and I’m trying to present the other side here.
The comparison to rename-style versioning perhaps came out dismissive, but that wasn’t the intent. Git really is a significant, non-trivial improvement over that style. Individual users get almost all the benefits of industrial-strength systems, and not having to install a server gives it a low barrier to entry. It’s a remarkable piece of design that it manages to accomplish this.
Look dude, Facebook has 114 public GitHub repositories. NASA has 362. Google itself has over 2,000. That’s two thousand, and those are just the ones visible to me. If you’re trying to characterize Google et al. as “small groups and new users”, I can only say thanks for the rare amusement of witnessing someone be so simultaneously confident and utterly wrong.
I wish you well in your career and I’m glad to hear the state of the art in enterprise-class systems has progressed since I’ve been away from them. May the implosion of your monster codebase never come to pass until after you’re retired and positioned to rake in some consulting bucks cleaning up the mess. But now, being as this is a Git thread, I’m going to abandon the Perforce threadshit/derail and invite anyone to hit me up with Git-related questions.
Nice way to mischaracterize what I said. But in any case, the fact that there are zillions of tiny repos for these large companies doesn’t exactly disprove my point. How many of those have 2,000 active, contributing developers?
At any rate, I do have a genuine Git-related question. I use Google Drive for backups and cross-computer file syncing. Is there a recommended approach for integrating the two? I once tried putting the repo directly on the synced drive, but it corrupted itself almost immediately, probably because slow syncing meant that the two machines sometimes cross-updated the files (not a great shock there). I could keep a local development repo and a synced repo that I push updates to, but it feels like that would run into the same problem. I want to switch between my desktop and laptop(s) and have everything sync as seamlessly as possible. Currently, I’m depending on the Drive revision history, which isn’t great.
This should work in theory, for a single user, if Google Drive is updating promptly and frequently. I don’t know what promises GD makes on that front, so I can’t speak to it, but I’d expect that to work fine.
Having said that, I don’t know anybody who does that. GitHub’s whole reason for existence is mediating changes between Git repos, so it’s the obvious choice (or GitLab or any other free host, doesn’t matter). Yes, I literally use GitHub to share repos between two computers sitting on my desk. It’s going to seem weird coming from your world, but just rip off the bandaid and it will come quickly.
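At its simplest it looks something like this, assuming a hypothetical private repo called project under your account and a main branch:

    # one-time setup on each machine
    git clone git@github.com:you/project.git
    # on the desktop, when you stop for the day
    git add -A
    git commit -m "WIP"
    git push
    # on the laptop, before you pick up again
    git pull

No shared filesystem is involved, so there’s nothing for a background sync to corrupt; each machine has its own full repo and GitHub just mediates between them.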