Programming alternatives, Part 2

Voyager · September 30, 2018, 9:58pm

Johnny_L.A:

Easytrieve can only match files when both files are sorted by the key (a/r#). The first step (after creating a fixed-position text file) is to sort the Current and Previous files. Most of my programs only require sorting on the a/r#, but there are a few that require sorting on the a/r#, then another field, and possibly more than one field.

Once the Current and Previous files are sorted, another Easytrieve matches them:

IF MATCHED
  OUT-NAME = PREV-NAME
  OUT-NAME2 = PREV-NAME2
  OUT-ADDR = PREV-ADDR
  OUT-ADDR2 = PREV-ADDR2
  (and so on)
  PUT FILEC
ELSE
  NOT-REC = IN-REC
  PUT NONMATCH
END-IF
IF NOT MATCHED FILEA
  NOT2-REC = PREV-REC
  PUT NONMATCH2
END-IF

There’s more going on depending on if the records are matched or not, but that’s the gist of it. The output file names I use are [member number]_CURRENT_OUTPUT.txt, [member number]_NONMATCH.txt, and [member number]_NONMATCH2.txt. I combine these three files and import them into Excel where I colour-code the records that are not matched in the previous file, and the records that are not matched in the current file. Then I ‘clean up’ the new records, sort by a/r# (we need to send the data sorted by a/r#), turn the .csv file into a text file, and then run that text file through the reformatting Easytrieve, an example of which I posted earlier.

I know how that sounds to people familiar with relational databases and newer programming languages, but there used to be five people cleaning data manually. Now there’s just me and my boss. One file might take a week to clean up manually (and the one I’m thinking of doesn’t need to be reformatted), and even with monkeying about with importing a .csv file into Access, using Access to write a fixed-position text file, putting the output together and fixing the new records, and then doing the text conversion again, it takes under an hour and it doesn’t wind up in the recipient’s ‘suspense’ file because the data’s consistent from month to month.

I’ve written code like that, before I used Perl and its associative arrays.
All you’d need to do is read either Previous or Current in, and store it in an associative array indexed by the key. (No sorting required, Perl does the hashing.) Then read the other file in a record at a time, look for a match (which takes one line) and do what you need to do if there is or isn’t a match.
I’ve done this with much larger data sets than you are likely to have, and it is fast and easy.
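For what it's worth, the associative-array approach would look roughly like the sketch below. It is only a sketch: the comma-separated layout with the a/r# as the first field, and the names match_records and %prev_by_key, are made up for illustration and not taken from anyone's real files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed layout (hypothetical): each record is "a/r#,rest of record".
sub match_records {
    my ($previous, $current) = @_;    # two array references of record lines

    # Store every Previous record in a hash, indexed by its key.
    my %prev_by_key;
    for my $line (@$previous) {
        my ($key) = split /,/, $line, 2;
        $prev_by_key{$key} = $line;
    }

    # Walk Current one record at a time and look each key up in the hash.
    my (@matched, @current_only);
    for my $line (@$current) {
        my ($key) = split /,/, $line, 2;
        if (exists $prev_by_key{$key}) {
            # Matched: keep the Previous version and take it out of the hash.
            push @matched, delete $prev_by_key{$key};
        }
        else {
            push @current_only, $line;
        }
    }

    # Whatever is still in the hash never appeared in Current.
    my @previous_only = map { $prev_by_key{$_} } sort keys %prev_by_key;

    return (\@matched, \@current_only, \@previous_only);
}

my ($matched, $current_only, $previous_only) = match_records(
    [ '1001,SMITH J,12 OAK ST', '1002,JONES A,4 ELM DR' ],
    [ '1002,Jones A.,4 Elm Drive', '1003,Brown K,9 Pine Ave' ],
);
print "matched:       @$matched\n";
print "current only:  @$current_only\n";
print "previous only: @$previous_only\n";
```

In a real script the two lists would come from reading Previous and Current line by line. The three results roughly play the roles of FILEC, NONMATCH and NONMATCH2 in the Easytrieve version, and neither input has to be sorted first.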
My team built a relational database for my system (I worked for a database company, so it was free) but that was for keeping historical records and letting lots of users have access to them. That is way overkill for what you need.
There are lots of companies out there who make a lot of money selling simple functionality to non-programmers.
I’d write you the code but I’m a few thousand miles from my Perl book, and since I haven’t done much programming since I retired I’d probably get something wrong.

Voyager · September 30, 2018, 10:08pm

GreenWyvern:

IMO you should hire a programmer to write what you need, using a fast, compiled language (because you are processing up to about 100,000 records), and then learn to modify the code. It will be easier to modify existing code than to start writing something new in a new language from scratch.

It looks like you don’t need a database system at all. You simply need to process flat files, and many languages will do this.

I’ve done this kind of thing often using Lazarus (Object Pascal). It’s lightning fast, and has a complete range of database tools if necessary. It can do simple things in a simple and straightforward way, as well as handling any level of complexity. It’s a very mature language and IDE, constantly maintained and updated, free and open source, and cross-platform. There is any amount of help information and code samples available for anything you may want to do.

I taught Pascal in grad school 45 years ago. I hacked the Zurich Pascal compiler from Jensen and Wirth into my own Pascal-based language. I love Pascal. But in all the resumes I’ve seen I’ve never seen Delphi or Lazarus mentioned, or even Pascal these days (which I would have found quaint.) Almost all have Python and Perl, though Perl was going down.
Much as I like Pascal, I’d never recommend anyone learning it these days. Except for fun as a fourth language perhaps. (Not Forth either.)

Johnny_L.A · October 1, 2018, 12:24am

Any and all assistance in that regard would be greatly appreciated. But first I’ll need to learn about Perl, how to run it, how to store the files, and so on. (Actually, the first thing is to see if our IT people can reinstall the license key so that I have some breathing room to learn a new language.)

Stranger_On_A_Train · October 1, 2018, 12:42am

Running Perl scripts
Using Perl to read and write to text files

Stranger

Johnny_L.A · October 1, 2018, 1:12am

This is helpful, and I liked the video.
#!/usr/bin/perl
use strict;
use warnings;

print "hi NAME\n";
The ‘print’ command is obvious. What are the other parts?

Lots of code to look through. The first three lines are the same as in the ‘hello’ example. I’m guessing these are standard records to run all Perl programs?

I see ‘use::tiny;’ I clicked on the link and there is a Wikipedia-type page on it. I’ve only skimmed it, and I’ll read it all later. Is ‘tiny’ a way to access a file without typing in the whole address? FWIW, I like to see the file name like this: ‘H:\EZ_Text_Files\833118_Current.txt’ so that I’m sure where things are coming from or going to. The ‘autodie’ explanation at the bottom of the page says what it does, so I think I have that.

As for the rest, is there somewhere that defines what I’m seeing? For example, ‘#’ seems to be the comment indicator, much as ‘*’ is for comments in Easytrieve. ‘use’ seems to be a command to use a module that (I assume) exists in Perl. ‘foreach’ seems to be like DO-WHILE, or else it’s for telling the program to read every line. (It seems strange that you would have to tell it to read every line. In Easytrieve, you could put PUT FILEC FROM FILEA and it would write every record.) In any case, is there a place that defines these things, and others like ‘my’, the brackets, the dollar signs, the semicolons, and so on? Is there a Complete Idiot’s Guide To Perl?
ETA: Actually, I did find Perl For Dummies. Also, Learning Perl: Making Easy Things Easy and Hard Things Possible.


DPRK · October 1, 2018, 1:37am

use strict and use warnings enable various error and mistake checks for things like using variables without declaring them, so you basically always want to use them.
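As a small illustration of the kind of mistake use strict catches, here is a sketch. The helper name strict_complaint and the misspelled variable $totl are invented for the example; the string eval is only there so the error can be printed instead of stopping the script.

```perl
use strict;
use warnings;

# Compile a snippet of Perl at run time and return the error, if any.
# A string eval inherits "use strict" from this file, so an undeclared
# variable in the snippet is reported in $@ rather than killing the script.
sub strict_complaint {
    my ($snippet) = @_;
    my $ok = eval "$snippet; 1";
    return $ok ? '' : $@;
}

# Misspelled, never-declared variable: strict refuses to compile it.
print "undeclared: ", strict_complaint('$totl = 5');

# Properly declared with "my": no complaint at all.
print "declared:   ", (strict_complaint('my $total = 5') || "no complaint\n");
```

The first line printed should be a "Global symbol ... requires explicit package name" message naming $totl; the exact wording varies between Perl versions.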

The first line contains an explicit path for the shell to find the Perl interpreter; it may not necessarily be the correct path for your Windows installation or be relevant; try running the script under Strawberry Perl or look at the examples to see how they recommend invoking the interpreter.

Yes (see above).

Stranger_On_A_Train · October 1, 2018, 1:37am

The first line, which starts with the hash-bang (#!), tells the operating system that the file is a Perl script and which Perl interpreter to run it with. /usr/bin/perl is just the default location on Unix-type systems, although it is often an alias for the interpreter stored elsewhere. In general, the hash mark indicates a line that should not be interpreted as a command, and so is used for comments.

use strict makes Perl insist that variables are declared (with my, for example) before they are used, so a misspelled variable name is caught as an error instead of silently becoming a new variable. For what you are doing, you don’t need to worry about the details.

use warnings controls the scope and verbosity of warnings (issues with the code that don’t prevent it from executing but aren’t what the interpreter expects). It doesn’t affect how the code executes.

foreach is just a for control loop that accepts and iterates over a list without having to define an iterator.
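To make the difference concrete, here is a minimal sketch of foreach next to the while loop normally used for reading a file. The function names shout_all and count_lines are made up, and the "file" is just a string opened as an in-memory filehandle so the example runs on its own.

```perl
use strict;
use warnings;

# foreach walks a list you already have in memory, one element at a time.
sub shout_all {
    my @words = @_;
    my @out;
    foreach my $word (@words) {
        push @out, uc $word;
    }
    return @out;
}

# Reading a file is normally a while loop: each pass reads one more line,
# and the loop stops at end of file.
sub count_lines {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++;
    }
    close $fh;
    return $count;
}

print join(' ', shout_all('north', 'main', 'st')), "\n";    # NORTH MAIN ST
print count_lines("first record\nsecond record\n"), "\n";   # 2
```

So foreach is not what reads the file; it only loops over whatever list it is handed.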

I’d recommend just going through a few tutorials that cover the input-output and control loop functions, as well as reading up on regular expressions. It may seem different and overwhelming at first, but Perl is actually a pretty compact command set and you should be able to get up and running after a few sessions at a basic level, and then learn as you go. The best way to master a programming language is just to use and experiment with it; listening to someone else lecture about it doesn’t really instill the experience and familiarity you need to think in programming terms.

Stranger

allyn · October 1, 2018, 1:37am

There’s an interactive Perl teaching site you might find useful:

Johnny_L.A · October 1, 2018, 1:57am

That it does! But then, any foreign language does.

A couple of questions: I have to use fixed-position text files for Easytrieve. Does Perl require this, or can it read a .csv file? Suppose I’d rather not ‘clean up’ new records. Can Perl parse a record and do things like get rid of periods where they don’t belong (and keep them where they do), abbreviate directionals (except where the directional is part of the name – e.g., North Main Street would become NORTH MAIN ST, but NORTH FORK DR would still be NORTH FORK DR), change ste/unit/trailer/trlr/etc. to # and that sort of thing?

DPRK · October 1, 2018, 2:15am

Perl can trivially read .csv files, and do all of the stuff you described using simple regular expression substitution. Aren’t there any such examples on whichever site you are browsing?
A (minimal) Perl script can be as trivial as

while (<>)
{
    # some regex substitution
    print;
}

Stranger_On_A_Train · October 1, 2018, 2:23am

You can use chomp and split to clean up and separate fields. You can do the other transformations using conditional regexes.
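A minimal sketch of the chomp-and-split step, assuming a plain comma-separated line with no quoted fields (the function name parse_line and the sample record are invented). A naive split breaks on commas inside quotes; for files like that the Text::CSV module from CPAN is the usual choice.

```perl
use strict;
use warnings;

sub parse_line {
    my ($line) = @_;
    chomp $line;                          # drop the trailing newline
    my @fields = split /,/, $line, -1;    # -1 keeps empty fields at the end
    s/^\s+|\s+$//g for @fields;           # trim stray spaces around each field
    return @fields;
}

my @fields = parse_line("833118, SMITH J ,12 OAK ST,\n");
print scalar(@fields), " fields\n";       # 4 fields
print "name is [$fields[1]]\n";           # name is [SMITH J]
```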

Stranger

AnalogSignal · October 1, 2018, 3:05am

Instead of learning everything yourself, I think it would be quicker to get someone on fiverr.com to develop a Perl script to your specifications for anywhere from $5-$50. The interwebs has made outsourcing this kind of stuff easy.

Voyager · October 1, 2018, 6:27pm

Look at string substitution.

$name =~ s/north/NORTH/ for instance.

Be careful to have enough context so you don’t change “ste” in the middle of a word.
Fields with fixed positions can be handled easily too, using substr or unpack. Though split is nicer.
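Roughly what that looks like, as a sketch only. The list of unit words, the choice of '#' with no space after it, the record layout (a/r# in columns 1-6, name in columns 7-16), and the function names are all assumptions made up for the example.

```perl
use strict;
use warnings;

# \b matches only at a word boundary, so the STE inside STEVENS is left alone.
sub unit_to_hash {
    my ($addr) = @_;
    $addr =~ s/\b(?:STE|SUITE|UNIT|TRAILER|TRLR)\b\.?\s*/#/gi;
    return $addr;
}

# Fixed-position record: unpack cuts the string at the given widths, and the
# 'A' format also strips trailing spaces from each field.
sub parse_fixed {
    my ($record) = @_;
    return unpack 'A6 A10', $record;
}

print unit_to_hash('123 STEVENS RD STE 4'), "\n";    # 123 STEVENS RD #4
print unit_to_hash('9 PINE AVE Unit 12B'), "\n";     # 9 PINE AVE #12B
print join('|', parse_fixed('833118JOHN SMITH')), "\n";   # 833118|JOHN SMITH
```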

And my main input was csv files - so no worries about them.

filmore · October 1, 2018, 6:52pm

What you had with Easytrieve is a specially designed, high-level machine that can take a special input (those txt files) and create a specific output. It’s like how a DVD player can play DVDs. An Easytrieve program can play Easytrieve scripts. Easytrieve is great at that, but it can’t do much of anything else.

Perl, and most programming languages, are like a set of basic tools you use to build machines. Just like a soldering iron, solder, breadboards, etc. can be used to build many things, so can Perl.

The things you are asking above cannot auto-magically be done by Perl, but you can write your Perl program to do those things. Perl does not have a function to do something like convert_geocode(“North Main Street”). Rather you have to write the convert_geocode function yourself. You have to take an arbitrary string like “North Main Street” and understand what part is the directional, what part is the street name, what part is type of street, etc. Perl provides tools that allow you to easily break up the string into individual parts like “North”, “Main”, and “Street”, but it’s up to you to figure out what to do with those parts. It’s up to you to figure out how to interpret locations like:

North Main Street
Oak Park Circle
Interstate Highway 10
IH 10
etc

Perl can break the strings up into their space-separated components. Perl can compare strings. Perl can change the case of strings. But you have to come up with the logic that converts the raw text into the proper form.
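As a rough illustration of that kind of hand-written logic, here is a small sketch. The function name standardize_street and the abbreviation table are invented for this example, and it only handles the easy case where the street type is the last word. Deciding what to do with something like Interstate Highway 10 is exactly the part you would still have to work out yourself.

```perl
use strict;
use warnings;

# Hypothetical abbreviation table; a real one would be much longer.
my %STREET_TYPE = (
    STREET => 'ST',
    DRIVE  => 'DR',
    CIRCLE => 'CIR',
    AVENUE => 'AVE',
);

sub standardize_street {
    my ($raw) = @_;

    # Change the case, then break the string into space-separated words.
    my @parts = split ' ', uc($raw);
    return '' unless @parts;

    # Compare the last word against the table and abbreviate it if known.
    my $last = $parts[-1];
    $parts[-1] = $STREET_TYPE{$last} if exists $STREET_TYPE{$last};

    return join ' ', @parts;
}

print standardize_street('North Main Street'), "\n";   # NORTH MAIN ST
print standardize_street('Oak Park Circle'), "\n";     # OAK PARK CIR
print standardize_street('IH 10'), "\n";               # IH 10
```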

What you had with Easytrieve was a machine that could turn a log into a chair. What you have with Perl is a set of handtools–saw, screwdriver, nails, etc–which you use to convert the log into a chair.

(Note: the preceding analogies are just for entertainment purposes and should not be taken as correct.)

DPRK · October 1, 2018, 7:02pm

filmore:

The things you are asking above cannot auto-magically be done by Perl, but you can write your Perl program to do those things. Perl does not have a function to do something like convert_geocode(“North Main Street”). Rather you have to write the convert_geocode function yourself. You have to take an arbitrary string like “North Main Street” and understand what part is the directional, what part is the street name, what part is type of street, etc. Perl provides tools that allow you to easily break up the string into individual parts like “North”, “Main”, and “Street”, but it’s up to you to figure out what to do with those parts. It’s up to you to figure out how to interpret locations like:

North Main Street

Oak Park Circle

Interstate Highway 10

IH 10

etc

Well, you could do all that. Except Perl does have such a function; just because it’s not part of the core language does not mean someone has not provided one already.

filmore · October 1, 2018, 7:10pm

Yes, that is an excellent point. Many languages have extensive functional libraries which provide a wide range of advanced capabilities. Sometimes the library is part of the core language, other times it may be available from an open source community of developers who create the functions and make their work generally available.

Since geocoding functionality is commonly used by many types of applications, it’s not surprising that there is a pre-written function available. You might want to look at the linked function to see what that code looks like. Since you know how it’s supposed to work (standardize addresses), you can look at the code to see how it accomplishes that task. If that function did not exist, you would have to write code similar to that yourself.

Voyager · October 1, 2018, 8:17pm

I’m not sure what the requirements are here, but I recommend searching for a function that might be closer to what is wanted than this one.
I would not recommend reading the code. Johnny L.A. might want to stick to a subset of Perl that he needs. Reading code might require learning lots of unnecessary stuff.
Also, Perl coding styles vary wildly. Mine shows that I used to be a Pascal programmer. Younger people in my group wrote more aggressive (and less readable) code. More efficient also, no doubt, but that isn’t important here.

At least there is a limited data set here, so even if the mapping isn’t 100% for all cases it could be 100% for his cases - so a simple function which does the mapping he needs would be easier than adapting an almost right chunk of code.

Johnny_L.A · October 1, 2018, 10:53pm

Update: We’re having our IT contractor come in tomorrow to attempt to install Easytrieve from my flash drive. I hope everything he needs is on there. Fingers crossed. I saved my old computer (the one before the one that crashed) specifically because it had Easytrieve on it and it worked. I brought it back to the office specifically so they could transfer the software from it if necessary. Apparently they took the hard drive back to their office and wiped it. :smack:

My boss is waiting for a call from Computer Associates to see how much a maintenance contract will cost, and what it will take to get the latest version of Easytrieve.

If we can get that back, then I can do my job. I still want to learn Perl though. Any recommendations on the two books I mentioned? I have both of them in my shopping cart.

Sam_Stone · October 1, 2018, 10:56pm

An experienced programmer could code that up for you fairly quickly in any number of languages. However, if you are not an experienced programmer, if I were you I’d use something like Microsoft Access - the almost universally hated-by-programmers DB/application development engine.

Or, you could use MySQL, or SQL Server, or some other database that has the ability to hook into code and/or do front end development.

Access has full CSV compatibility, and you can set up a template for how it should read the CSV file. Then you can load the data into records, and include the file date or version as its own field. Now it’s a trivial matter to do SQL queries to find matching records and write the one you want to a new CSV. You can even build a front end reporting system for it if you need to.

I have built this kind of code in probably half a dozen languages and several DBMS systems over the decades. Parsing CSV files and writing out new ones is an extremely common thing to have to do. And anyway, I already noticed that you are using Access to convert Excel files to the CSV for EasyTrieve. Therefore, just use Access. It sounds like you already have it and have at least a modicum of understanding of how to use it. The other reason I recommend Access is that you can build your front-end GUI in it, and there are a zillion resources on the internet for learning how to do just about anything in it.

Programmers are going to be biased towards telling you to just learn to write programs. If you aren’t a programmer and don’t have a working familiarity with all the concepts, this is going to be a really hard path for you to take. Maybe worthwhile, and it might be good to learn to program anyway, but it might be overkill for this job. So write a program to do it if you think the act of learning to program will have value for you going forward, but if not, just build it in a higher level tool made for people like you. That’s exactly what Access is for.

Sam_Stone · October 1, 2018, 11:04pm

Don’t forget that the Easytrieve/Python way still requires Access in the mix to convert Excel files to CSV. (Yes, I know it’s possible to read Excel in other languages). So coding up a solution to replace Easytrieve still leaves a process that’s a bit of a hack.

If you are already reading the files in Access, the simplest thing to do is just do the processing in Access as well. It’s trivial to create a table and write those excel records to it rather than to CSV, and once all his data is in tables, you can do all kinds of processing on it with simple SQL commands. And learning at least simple SQL is a lot easier than learning how to program properly in Python, including parsing CSV, sorting, writing files, etc. For an absolute newbie, that’s a pretty tall hurdle to jump, and would still leave him with a two-stage process.

Also, I think you might find that after you have all that data in an easily searchable and sortable database format, other ways of looking at it/processing it might start to look interesting. And the data is then in a standard format that can be exported to other databases, written to CSV or XML for transport, etc.
