I’m sorry to ask such a trivial question, but it’s been a long time since I’ve done any file-input programming, and a longer time since my programming classes.
I have a program I’m writing in C, which is reading input from a file. I want to read in everything in the file (across multiple lines, if it matters) until I get to a particular string, and discard all of that. Then, only after I’ve read in that particular string, I’ll start actually reading in the real data that I want. I thought I could do something like
fscanf(infile,"STARTHERE");
but that only advances the stream as far as the beginning of the file happens to match the string, and stops at the first character that doesn't.
Then I tried
fscanf(infile,"%sSTARTHERE",junk);
(where junk is, of course, a previously-defined string variable), but that just reads everything up to the first whitespace into junk; the literal STARTHERE then fails to match, so nothing else gets consumed.
I’m sure there’s a simple way to do this, but I’m blanking on what it is.
(note: The string I’m actually searching for isn’t literally “STARTHERE”; I’m just using that in the question for simplicity)
That’s correct (you can’t do it with one call to fscanf). In fact, if STARTHERE could be embedded within a larger string (i.e. not delimited by whitespace), then it’s a bit more difficult than it would seem.
Is it correct that the text prior to STARTHERE could contain any combination of any-length strings plus whitespace (including newlines)?
If so, I think you will need to call fscanf(infile,"%s",junk) in a while loop and then do a strstr(junk,"STARTHERE") call to see if the string you read in contains your starting string. If yes, exit the while loop. Unfortunately, some of the data you want may now be embedded in your junk variable, so you’ll need to parse that part (strstr returns a pointer to the start of the match) before reading in the rest of the data.
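A minimal sketch of that loop, assuming tokens fit in a 256-byte buffer (the buffer size and the helper name skip_to_marker are made up for illustration). The %255s width keeps fscanf from writing past the end of junk, and the function hands back whatever followed the marker inside the same token so it can be parsed before reading on.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads whitespace-delimited tokens until one
 * contains `marker`. On success returns 1 and copies the text that
 * followed the marker within that token into `rest` (may be empty).
 * Returns 0 if EOF is reached first. `rest` must hold 256 bytes. */
int skip_to_marker(FILE *in, const char *marker, char rest[256])
{
    char junk[256];

    /* %255s stops at whitespace and never writes past junk[255] */
    while (fscanf(in, "%255s", junk) == 1) {
        char *hit = strstr(junk, marker);
        if (hit != NULL) {
            /* leftover data glued to the marker, e.g. "STARTHERE1.5" */
            strcpy(rest, hit + strlen(marker));
            return 1;
        }
    }
    rest[0] = '\0';
    return 0;
}
```

One caveat: a token longer than 255 characters gets split across two reads, so a marker straddling that split would be missed.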
On preview: Marvin’s link has the code for what I’m trying to describe, but uses fgets() instead of fscanf(), which is possibly better. But as DPRK points out, if you can’t assume that your lines are shorter than 512 bytes, or that your token can’t straddle that boundary, then it won’t work reliably. The most reliable way is probably to either read the whole file or go one character at a time until you find the well-formed part of your file.
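For comparison, a line-based sketch of the fgets() approach (this is not the code from Marvin's link, which isn't reproduced here). It has exactly the limitation DPRK raised: with a fixed 512-byte buffer, a longer line gets split and a marker straddling the split is missed. The helper name and LINE_BUF_LEN are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define LINE_BUF_LEN 512  /* assumed maximum line length */

/* Hypothetical helper: reads lines with fgets() until one contains
 * `marker`. Returns a pointer into `line` just past the marker, or
 * NULL at EOF. The rest of the file is still unread in `in`. */
char *find_marker_line(FILE *in, const char *marker, char line[LINE_BUF_LEN])
{
    while (fgets(line, LINE_BUF_LEN, in) != NULL) {
        char *hit = strstr(line, marker);
        if (hit != NULL)
            return hit + strlen(marker);  /* remainder of this line */
    }
    return NULL;
}
```

The returned pointer can be fed to sscanf() for whatever is left on the marker's line, and fscanf() picks up from the next line.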
Yeah, that’s what I’m trying to do (except the actual data will mostly be %f conversion, not %s), but I’m not sure how to do the “scan until I get to my match” part.
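Once the marker has been consumed, the %f data can be read with a loop that checks fscanf's return value and stops at EOF or at the first token that isn't a number. A sketch, assuming the values go into a caller-supplied array (read_floats is a hypothetical name):

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: reads up to `max` floats from `in`, stopping at
 * EOF or at the first token fscanf can't convert. Returns the count. */
size_t read_floats(FILE *in, float *out, size_t max)
{
    size_t n = 0;
    /* fscanf returns the number of successful conversions: 1 means a
     * float was stored, 0 means a non-numeric token, EOF means the end */
    while (n < max && fscanf(in, "%f", &out[n]) == 1)
        n++;
    return n;
}
```

A failed conversion leaves the offending token in the stream, so it can still be inspected with another fscanf() if it turns out to be a section marker rather than garbage.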
As of right this moment, the full text of my input file up to the start point is probably consistent, but I don’t want to count on it remaining so indefinitely (there’s a lot of header-type data, which includes a couple of version numbers that will eventually change). On the other hand, my STARTHERE text will probably always have whitespace before and after it, and will always have whitespace within it (so I suppose I could just treat it as two strings, since the part up to the first whitespace should be distinctive enough).
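Since the marker is always whitespace-delimited and contains whitespace, one option is to compare whole tokens with strcmp() and require two consecutive tokens to match. A sketch, with a made-up two-word marker ("DATA BEGINS") standing in for the real one; skip_to_token_pair is a hypothetical name:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: consumes tokens until `first` is immediately
 * followed by `second` (e.g. "DATA" then "BEGINS"). Returns 1 with the
 * stream positioned just after `second`, or 0 at EOF. */
int skip_to_token_pair(FILE *in, const char *first, const char *second)
{
    char prev[256] = "";
    char cur[256];

    /* %255s skips any amount of whitespace, including newlines */
    while (fscanf(in, "%255s", cur) == 1) {
        if (strcmp(prev, first) == 0 && strcmp(cur, second) == 0)
            return 1;
        strcpy(prev, cur);  /* remember this token for the next pair test */
    }
    return 0;
}
```

Because only the exact pair matches, a lone "DATA" elsewhere in the header is ignored, and it doesn't matter how much whitespace sits between the two words.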
If the header data is variable, you probably should use a regex search.
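Regexes aren't part of standard C, but POSIX systems provide <regex.h>. A sketch that searches an in-memory, NUL-terminated buffer and returns the offset just past the first match; the patterns in any usage are made-up examples, not the real marker, and regex_end_offset is a hypothetical name.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns the offset in `text` just past the first
 * match of the POSIX extended regex `pattern`, or -1 if there is no
 * match or the pattern doesn't compile. `text` must be NUL-terminated. */
long regex_end_offset(const char *text, const char *pattern)
{
    regex_t re;
    regmatch_t m;
    long result = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    /* nmatch = 1: m.rm_so / m.rm_eo give the span of the whole match */
    if (regexec(&re, text, 1, &m, 0) == 0)
        result = (long)m.rm_eo;
    regfree(&re);
    return result;
}
```

A pattern such as "Data[[:space:]]+section:" tolerates any run of whitespace, and something like "v[0-9]+\\.[0-9]+" would match a version number whatever its value.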
Or, read your input into a buffer, then use strstr() to find the ‘STARTHERE’ string, then parse the rest of the string with sscanf().
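A sketch of that approach, assuming the file fits comfortably in memory and contains no NUL bytes. The %n conversion reports how many characters sscanf() consumed, so the loop can walk forward through the buffer. slurp and parse_after_marker are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads the rest of `in` into a malloc'd,
 * NUL-terminated buffer. Returns NULL on allocation failure. */
char *slurp(FILE *in)
{
    size_t cap = 4096, len = 0, got;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    while ((got = fread(buf + len, 1, cap - len - 1, in)) > 0) {
        len += got;
        if (cap - len - 1 == 0) {  /* full: grow, keeping room for '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) { free(buf); return NULL; }
            buf = bigger;
            cap *= 2;
        }
    }
    buf[len] = '\0';
    return buf;
}

/* Hypothetical helper: finds `marker` in `text` and parses up to `max`
 * floats that follow it. Returns the count, or -1 if no marker. */
int parse_after_marker(const char *text, const char *marker,
                       float *out, int max)
{
    const char *p = strstr(text, marker);
    int n = 0, used;
    if (p == NULL)
        return -1;
    p += strlen(marker);
    /* %n stores how many chars this call consumed, so p can advance */
    while (n < max && sscanf(p, "%f%n", &out[n], &used) == 1) {
        p += used;
        n++;
    }
    return n;
}
```

Reading everything first means a marker can't be split across reads, which sidesteps the 512-byte boundary problem mentioned earlier in the thread.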
This.
There’s no reason not to memory map the file unless it is a stream. You pay a silly amount of overhead for no good reason by not mapping it. It isn’t even as if this is a new thing. I was doing this in about 1985. And if you need a regex, the world is your oyster.
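For reference, a POSIX-only sketch of the memory-mapping approach (mmap isn't part of standard C). A mapped region isn't NUL-terminated, so strstr() can't be used on it directly; this does a naive search with memcmp() instead. find_marker_mapped is a hypothetical name, and it takes an already-open file descriptor.

```c
/* expose POSIX declarations (mmap, fstat, fileno) under strict ISO modes */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: maps the file open on `fd` read-only and returns
 * the byte offset just past the first occurrence of `marker`, or -1 if
 * it's absent or the mapping fails. */
long find_marker_mapped(int fd, const char *marker)
{
    struct stat st;
    size_t mlen = strlen(marker);
    long result = -1;

    if (fstat(fd, &st) != 0 || st.st_size == 0)
        return -1;  /* mmap of a zero-length file fails, so bail early */

    size_t size = (size_t)st.st_size;
    const char *data = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    /* naive search: the mapping has no terminating '\0' */
    for (size_t i = 0; mlen <= size && i <= size - mlen; i++) {
        if (memcmp(data + i, marker, mlen) == 0) {
            result = (long)(i + mlen);
            break;
        }
    }
    munmap((void *)data, size);
    return result;
}
```

The returned offset can then be used with fseek() on the same file, or the search loop can be extended to parse directly out of the mapping.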
Alternatively, couldn’t you just read the file one character at a time checking for a match?
I’m rusty on C so treat this as pseudo-code, but something like:
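Found = FALSE
C = 1                        /* position in Target$ that we're checking for a match */
Target$ = <what you're looking for>
Do While Not Found And Not EndOfFile
    B$ = <read next character from file>
    If B$ <> Target$[C] Then
        C = 1
        If B$ = Target$[1] Then    /* this char may start a new match */
            C = 2
        EndIf
        /* a target that repeats its own prefix (e.g. "aab") can still be missed */
    Else
        C = C + 1
    EndIf
    If C > Length(Target$) Then
        Found = TRUE
    EndIf
Loop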
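A C version of the character-at-a-time idea. Resetting the match position on a mismatch can miss a marker that starts inside a failed partial match (looking for "aab" in "aaab" is one example), so this sketch instead keeps the last strlen(marker) characters in a small window and compares the whole window after every read. skip_past_marker and the 64-character limit are assumptions.

```c
#include <stdio.h>
#include <string.h>

#define MARKER_MAX 64  /* assumed upper bound on marker length */

/* Hypothetical helper: reads `in` one character at a time until the
 * last strlen(marker) characters equal `marker`. Returns 1 with the
 * stream positioned just after the marker, or 0 on EOF (or if the
 * marker is empty or longer than MARKER_MAX). */
int skip_past_marker(FILE *in, const char *marker)
{
    size_t len = strlen(marker);
    char window[MARKER_MAX];
    size_t filled = 0;
    int ch;

    if (len == 0 || len > MARKER_MAX)
        return 0;

    while ((ch = getc(in)) != EOF) {
        if (filled < len) {
            window[filled++] = (char)ch;           /* still filling up */
        } else {
            memmove(window, window + 1, len - 1);  /* drop oldest char */
            window[len - 1] = (char)ch;
        }
        if (filled == len && memcmp(window, marker, len) == 0)
            return 1;
    }
    return 0;
}
```

This is O(file length × marker length), which is negligible for a short marker; a KMP-style failure table would avoid the rescans if it ever mattered.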
Variadic functions like the scanf family in the standard library are easy to misuse (a %s with no field width will write past the end of its buffer if a token is long enough) and shouldn’t be trusted with input you don’t control. That said, a while loop with sentinel values is the typical solution, even when using them.
Also, you can use ‘*’ to suppress assignment of a conversion, so:
scanf("%c %*c",&a);
will store the first char in a, then read and discard the next non-whitespace char (the space in the format skips any whitespace in between). Or you can do the matching in a while/for loop.
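A small illustration of assignment suppression: %*s reads and throws away a whitespace-delimited word, and %*[^\n] discards the text of a line. Suppressed conversions aren't counted in the return value, which is why the check below compares against 1. Both helper names and the "version 3.2" line format are hypothetical.

```c
#include <stdio.h>

/* Hypothetical example: pull the number out of a line shaped like
 * "version 3.2 build xyz". Returns 1 on success, storing it in *out. */
int parse_version(const char *line, float *out)
{
    /* %*s skips the first word; only %f is assigned and counted */
    return sscanf(line, "%*s %f", out) == 1;
}

/* Hypothetical helper: discards `n` newline-terminated lines from `in`.
 * Returns the number of lines actually skipped. */
int skip_lines(FILE *in, int n)
{
    int skipped = 0;
    while (skipped < n) {
        /* %*[^\n] eats the line's text (it fails harmlessly on an empty
         * line); %*c then eats the '\n' itself */
        fscanf(in, "%*[^\n]");
        if (fscanf(in, "%*c") == EOF)
            break;
        skipped++;
    }
    return skipped;
}
```

The two-step %*[^\n] then %*c pattern matters because a scanset conversion that matches nothing stops the fscanf() call, so a combined "%*[^\n]%*c" would leave the newline of an empty line in the stream.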
Well, there is another reason: I never learned to do it before (back in the mid 90s, while it might have been possible, stupid memory limits often made it difficult), and the extra computer time consumed by the silly amount of overhead, even over this program’s entire expected usage life, is less than the extra programmer time it’d take me to learn it that way. And it’s clearly not something that comes up in very many of my other programs, given how out of practice I was on file IO to begin with.
Now, there are some areas of this code (after the IO part I have to get through, before I even have a chance to play around with the fun parts) where I am in fact worried about efficiency, but those are parts that involve (admittedly low precision) trig functions, and which might end up getting looped over a million times, unless I can think of some more clever way than brute force for those algorithms.