Update 2: Talked with the tech guys. Easytrieve starts now (it wouldn’t let me last week), and I did not get the license error window when the program I ran completed.
There are several modules on CPAN (for Perl) and in Python for reading and writing XLS files directly.
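For instance, here’s a minimal sketch in Python using the xlrd module (the filename is made up; any worksheet will do):

    # Read rows out of an .xls file with the xlrd module.
    # "input.xls" is a hypothetical filename.
    import xlrd

    book = xlrd.open_workbook("input.xls")
    sheet = book.sheet_by_index(0)        # first worksheet
    for row_idx in range(sheet.nrows):
        print(sheet.row_values(row_idx))  # list of cell values for this row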
Your points about keeping everything in one application (MS Access) are on point, except my experience with doing text processing in VBA is that it is extremely clumsy, whereas Perl is designed to do exactly this. Plus, I suspect that if the o.p. learns Perl (or Python, or another general purpose scripting language) he’ll find a lot more ways to automate his workflow, and thus, spend more time doing more interesting things. And after all, isn’t that the dream?
Stranger
The nut of this problem will be learning regular expressions. Whether you go with Perl or Python or something else, regex will be the technical and mental skill that you’ll need to master.
Having said that, for the love of Christ, don’t bother with Perl. I say this as someone whose first real language was Perl. Perl was a breath of fresh air in 1996 when the basis of comparison for “easy” was C, but it’s still riddled with warts and arcana that make it anything but beginner-friendly. In 2018, for all intents and purposes, it’s a dead language.
Instead I recommend going with Python. It’s a solid all-purpose language and a reusable skill, and you won’t get bogged down in 20 different ways to do the same thing. But you will have to familiarize yourself with regex (regular expressions) whichever approach you take.
Python’s re module gives you Perl-style regular expressions, so IMO there is no momentous either/or choice to make here.
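To give a flavor of the kind of pattern matching being discussed, here’s a made-up example (the report line and pattern are invented, not from the o.p.'s actual files):

    # Pull a date and a dollar amount out of a free-form report line.
    import re

    line = "ACCT 00412  2018-03-07   $1,234.56  POSTED"
    m = re.search(r"(\d{4}-\d{2}-\d{2})\s+\$([\d,]+\.\d{2})", line)
    if m:
        date, amount = m.group(1), m.group(2)
        print(date, amount.replace(",", ""))  # -> 2018-03-07 1234.56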
ETA I’m confident all the people in this thread who have mentioned writing Perl scripts have done work using Python as well.
Perl served me well back in the day; I wrote thousands of lines in it and was happy I had it. All I’m saying is I wouldn’t steer a newbie toward it nowadays when Python is a lot more approachable and ubiquitous. I have no skin in that game; actually, I spend more time in Ruby and Elixir nowadays (also great languages). But when a newbie asks for a Swiss Army knife for text crunching, I steer them toward Python.
Python would be my preference as a general-purpose scripting language, but the o.p. will be able to get up and running faster with Perl, just because there is less overhead and Perl has regex built into the language (although using the re module in Python is not especially difficult). If the o.p. wants to learn programming, Python is the obvious choice. If he just wants to fix his immediate problem, and Easytrieve isn’t an option, the fastest route is probably some Perl scripting.
FWIW, I use Python on almost a daily basis and haven’t touched Perl in several years since the last time I needed to do a lot of text parsing. But for what the o.p. is doing, Perl is tailor-made.
Stranger
I don’t know whether that’s Johnny L.A.'s dream or not. But I also know that it’s often better to stick with a known entity than to cobble up some ad-hoc code outside of a formal software engineering environment. Without version control and decent code management, little hacks that people have cooked up to keep business processes going can lead to a lot of trouble down the road. There are a lot of businesses out there that have tried digitizing/automating and been thwarted by hacks that long-gone employees cooked up, didn’t document, and which exist in multiple versions as different files in file systems, each one of which may have its own bugs or quirks.
“Yeah, we have a process to gather data and move it into SAS. It’s done with a program that old Bob cooked up. No, we don’t know exactly what it does, and sometimes it’s flaky, and we don’t have the source code. But it’s absolutely essential to our process. Bob is gone, so we searched our archives and found a directory with some old source code, but we don’t know which version it’s for or if it’s complete. We also found three executables, but we don’t know what they do or whether they are the right versions.”
That kind of thing is extremely common, especially in companies whose processes have been around for a few decades and that never had their own in-house programming shop.
If you don’t have a facility for testing, if you aren’t versioning code and archiving it properly, and if you don’t have other software engineering best practices in place, these kinds of one-off scripts and hacks can become unwieldy or, in the worst case, put the entire process out of control, requiring a completely new development effort to figure out the requirements and re-code a new module properly.
At least if you do it in something like Access, you can hire an Access guy to figure it out if the original programmer is gone. And the risk of obscure bugs is a lot lower, because Access was designed for non-programmers and has a lot of consistency checking and stuff built in.
BTW, you shouldn’t have to do any fancy VBA coding to import those files. Access has some pretty powerful import-specification features that allow you to define some fairly complex input files and parse them. Once the data is in the database, you can do manipulations in SQL. And if you do have to resort to VBA, there is a ton of code out there for Access programmers, and a lot of examples.
If Johnny wanted to use this situation as a reason to learn to program, that’s fine. Then Python away. But he should recognize the importance of properly handling the code after it’s written. Documentation, validation, versioning, archiving… All of that is critically important to any long-term business process.
I already program. It’s just an ancient language. (Come to think of it, it was ancient when I started with it 20 years ago.)
I dislike Access. If I knew how to use it, that might be different. But it’s really pissing me off lately. The ‘consistency checking and stuff’ assumes I don’t know what I want. With Access 2003 I didn’t care if my fields going in were longer than their output length; Access 2003 truncated them. With 2010 I tell Access the field width, and when it sees a bit of data that’s longer than that it says, ‘Humans are so stupid! I’ll fix it for him.’ So my data is shifted. With 2010 I have to ensure that the data I’m importing is no longer than the width of the output field. The ‘new and improved’ version of Access basically doubles my work. :mad: I only use it because Easytrieve requires text files, and Access is the only way I have to turn an Excel file into one.
But yeah, this Easytrieve disaster is a good reason to learn something newer.
Well, except that they already seem to have a cobbled-together quasi-enterprise system using an expensive and fairly obscure proprietary tool. Pretty much anything that works will be an improvement. And for something that just does this data scrub and translation, version control and configuration management of the code, while always a good idea, can be done manually like we used to do before CVS, Subversion, or GitHub. Or the o.p. could learn to use a revision control system; I like Mercurial personally, and TortoiseHg is a good GUI front end, but for a one-off script that does a simple task that can be easily verified it’s more overhead than he needs.
For a larger enterprise-wide data management scheme, then you are absolutely on point, but then that’s a scenario in which the company really needs to purchase a turnkey system with support or hire an expert to build them a system from open source components and documented glue code.
I’m not a relational database expert, but I worked as an RDB manager at a few jobs in college and handled it as an ad hoc responsibility after, and I have never had a good experience with MS Access. I have gripes with all of the MS Office suite of tools insofar as they seem to be poorly integrated technologies, developed by other companies for purely desktop applications, that Microsoft has repurposed into enterprise productivity solutions and expanded to do things they are not well built to do, such as using Word as a structured document editor or Excel as a large data processing tool. Watching engineers try to do complex statistical analysis or Fourier transforms in Excel is so fucking painful, especially when they have purpose-designed tools like Matlab and Mathematica available to them. And Access seems like one of those tools that is marginally workable for simple databases that reside on one person’s desktop but comes apart when you start sharing with multiple users and integrating into a larger system.
As stated above, both Perl and Python have modules to read from Excel format files directly. Even if you end up using Easytrieve, you can use one of those modules to convert format and do some basic data cleaning and checking. And you’ll learn a few things in the process.
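To make that concrete, here’s a rough Python sketch (filenames and column widths are invented) that reads a spreadsheet and writes fixed-width text, truncating each field to a declared width rather than shifting the data the way Access 2010 does:

    # Convert a spreadsheet to fixed-width text, truncating (not shifting)
    # any value longer than its declared width. Widths and filenames are
    # placeholders for illustration.
    import xlrd

    WIDTHS = [10, 30, 8]  # hypothetical output widths for three columns

    book = xlrd.open_workbook("input.xls")
    sheet = book.sheet_by_index(0)
    with open("output.txt", "w") as out:
        for r in range(sheet.nrows):
            fields = [str(v)[:w].ljust(w)  # truncate, then pad
                      for v, w in zip(sheet.row_values(r), WIDTHS)]
            out.write("".join(fields) + "\n")

The point isn’t this particular script; it’s that once the conversion lives in a dozen lines of code, the truncation rule is yours to change instead of Access’s.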
Stranger
Quoted for truth, my dude.
Though, on the version control front, I would point out that Git is actually ideal version control for small, non-proprietary projects, because it can be used without an upstream server such as GitHub. I even use it to version my Linux home directory and environment files! It’s a great improvement over source_code.c, source_code_1.c, source_code_1_this_really_works.c, source_code_1_fix_pointer_issue.c, ad nauseam.
In fact, both Git and Mercurial are ‘distributed’ revision control systems. I have encountered both, and for a non-power user they both provide the same type of basic functionality (including, naturally, the ability to work with your own local copy of a repository).
You might consider the old GNU tool “awk” for this. It sounds like a closer match to what Easytrieve was doing than Perl or Python are, and you could code each step as a separate awk script and run them in any sequence you like. It’s available on every OS and should continue to be available for the foreseeable future.
Oops, I should have read the thread before replying. Sorry. It looks like you’ve settled on Perl, not my favorite but competent enough.
If I’m reading the replies correctly, Python is preferred by most, but Perl is ‘tailor-made’ for what I want to do and is easier to learn. I’d like to hear about Awk though.
I did order Perl For Dummies today.
AWK (not Awk!) is a scripting language from 1977, so it actually predates both GNU and Perl. It is included in Linux and Unix operating systems as a core utility, so it, like the “stream editor” sed, still gets use, especially for simple transformations of, or data extraction from, formatted text files.
Interestingly enough, Larry Wall wrote “[Perl] combines (in the author’s opinion, anyway) some of the best features of C, sed, awk, and sh, so people familiar with those languages should have little difficulty with it… If you have a problem that would ordinarily use sed or awk or sh, but it exceeds their capabilities or must run a little faster, and you don’t want to write the silly thing in C, then perl may be for you. There are also translators to turn your sed and awk scripts into perl scripts.”
ETA as for what Perl was made for, immediately before this he explains, “Perl is an interpreted language optimized for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information. It’s also a good language for many system management tasks. The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal).”
Perl subsumes the functionality of awk and sed, and builds grep-style searching and regular expressions directly into the language. Since nobody but Unix sysadmins and power users really used any of these tools, and they’re not available on Windows by default, there is little reason to learn them versus just using Perl, which has far more functionality and can be made into an actual structured program rather than a series of scripts pipelined together. As mentioned before, Perl also has extensions like the Perl Data Language and various tools for data visualization, database interfaces, et cetera, so it can be used for a wide array of different applications, although I don’t think it is really the right tool for larger applications.
Python is a fully featured language, built from the top down to support multiple programming paradigms, including object-oriented, functional, procedural, and imperative programming, with a wide array of supported datatypes and data structures through standard modules. I think the original intent was to use it as a prototyping language, so you could build a test application in Python and then convert it to something like C++, but it has evolved into a powerful language of its own, with C-based extensions that allow it to run multithreaded, use optimized functions and libraries, and achieve performance within an order of magnitude of native C or FORTRAN. As it has done so, it has implemented some novel features and the general concept of “pythonic” coding: e.g., the “right way” to make readable, high-performance code using the language’s inherent optimizations and iterables (lists and other sequences over which you can apply a function without explicitly declaring a loop with a separate iterator).
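A trivial illustration of what “pythonic” means in practice (made-up data):

    # The same computation two ways: an explicit index-driven loop
    # versus the idiomatic list comprehension over an iterable.
    amounts = ["1,234.56", "99.00", "7.25"]

    # Explicit loop with its own iterator variable:
    totals = []
    for i in range(len(amounts)):
        totals.append(float(amounts[i].replace(",", "")))

    # The idiomatic version says the same thing in one expression:
    totals = [float(a.replace(",", "")) for a in amounts]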
For someone wanting to learn a ‘real’ programming language, I’d recommend starting with Python (or maybe something like Ruby, although I have no experience with it), then moving up to Scala, C, or Objective-C, then on to C++ or Java if you want the pain of learning how we programmed in the days when the priesthood of computer scientists were trying to make programming as difficult and esoteric as possible. Save FORTRAN or COBOL for a rainy day when you want to experience the novelties of languages that limit the length of variable names or whose layout is based on the fact that a line originally had to fit on a physical punch card, and then learn some version of Lisp or Scheme just for the joy of fucking around with something that no one will ever ask you to use outside of academia. (That’s not strictly true; I used to program automation routines in AutoCAD using their proprietary version of Lisp, but most tools have gone to using Python or some similar language as their default external command interface.)
But again, for what the o.p. is doing, Perl is functionally adequate, pretty easy to learn, and doesn’t threaten to bog the programmer down in some abstract programming paradigm that will serve him in no way whatsoever. Perl may not be the future, though I am dubious at claims of its imminent demise; I’ve been waiting for Java to die off for twenty years, and the fucking thing will just not go away despite being a poorly implemented piece of shit with plenty of newer and better languages implemented on the Java Virtual Machine. And Perl is a decent gateway to more sophisticated programming languages, which can aid you in understanding why languages implement specific features or formats. The one problem with becoming a Python programmer first is that you don’t realize just how thoughtlessly many other languages are implemented and expanded in an ad hoc fashion (Java, VisualBasic), or else were designed with some theoretical paradigm in mind with little consideration for ease of use (C++, C#).
Stranger
That ‘expensive and fairly obscure’ environment was specifically designed for exactly this kind of task, and it was widely used and pretty well known in the mainframe world when this kind of task was more common.
The specific advantage of Easytrieve was that it was even easier to read and reconfigure than anything else. And this in a situation where writing scripts or programs to do reports was the major responsibility of the IT department, with many competing tools. The OP suggested that his task included frequently modifying the code. There is very little chance that /any/ other environment is going to improve on that aspect of the task.
Easytrieve was slow enough that people used to convert their scripts to COBOL, but I thought that the later version of Easytrieve was compiled? In any case, computers are faster now, and it’s certain that a beginner can write very slow Perl scripts.
IBM seems to offer an Easytrieve-to-COBOL translator… All other things being equal (which they seldom are), free COBOL compilers exist, Perl is free software, and I’ve always had a soft spot for open/free software; no funny license mishaps. And does Easytrieve have an archive of modules rivaling CPAN for Perl? No need to fuck up a perfectly working system, of course, but I see no reason to maintain brand loyalty in any new scripts either. (Disclaimer: I’ve admitted having absolutely zero experience with Easytrieve, so if it’s well and truly the best thing since Grace Hopper, by all means keep using the best tool for the job.)
I worked in perl from the late 90s until last year. Then I got a quick ramp-up on Python.
Maybe I just have the wrong libraries, but getting data in and out of a relational database (MySQL, in my case) takes far less code in perl; I have spent hours in Python building data structures that match the SQL data structures and loading the records field by field.
As a general-purpose language, Python for the win. For manipulating strings, especially when reading/writing them from relational databases, perl/regex takes a tiny fraction of the code and the coding time.
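For reference, here’s roughly the minimum that round trip looks like in Python; this uses sqlite3 from the standard library rather than MySQL so the sketch is self-contained, and the table and column names are invented:

    # Minimal database round trip: create, insert, select.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO records VALUES (?, ?)",
                     [("alpha", 1.5), ("beta", 2.5)])
    for name, amount in conn.execute("SELECT name, amount FROM records"):
        print(name, amount)
    conn.close()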
(Yeah, I know. I just crossed over from GQ to IMHO with that response)
Wait, let’s back up for a moment here. Apologies if this is a really stupid comment, but some of the previous discussion left the scope of the problem unclear. If we have simple string data in an Excel spreadsheet that needs some very simple manipulation and reformatting, then we should first check whether the whole thing can’t be done easily in Excel before we consider more involved solutions. You could at least produce the CSV files directly, no?
Even if ‘real’ programmers hate Excel even more than they hate Access, it remains by far the world’s most popular programming environment, and most companies of any size have considerable in-house expertise.