
Processing a file of lines

Or lines of data from another source
A general overview

(Page's URL: 3linesfs-main.htm)

I hate to think how much time I have wasted over the years chasing down bugs in programs that do something I do quite regularly.

The task that so often proves good at hiding flaws in program code is going through some lines of data and either re-organizing them or extracting information from them.

In what I've done, these lines have typically come from a data-file, but the source is immaterial.

Usually, the amount of data won't change as the job progresses, but often the number of lines of data is not known in advance.

Let me reserve some terms? I would like, for instance, to use the term "line" very narrowly in the rest of this. To remind us both that I am doing that, I will add "RT" to the end of the word, to flag it as a Reserved Term.

So... what do I mean by a LineRT?

In this essay, a LineRT will be a string of characters from some source that is organized so that it is easy to collect such strings one at a time. They may or may not be of constant lengths. The characters may or may not be printable... but while all that follows could be used equally well on data that contained non-printable characters, I usually work with the simpler case, and everything in this essay will too.

Perhaps a bit tautologically, a SourceRT is anything that can supply LineRTs!

An ordinary text file is a good example of a collection of LineRTs on a SourceRT. (The physical SourceRT may be a "memory card" or hard drive or similar. "The file" could also be called "the SourceRT" of the lines.)

The Big Pain In The Neck

With some SourceRTs, you can know how many LineRTs there are from the outset. They are quite easy to process without getting all tangled up in code that may not work in every instance. Quite easy. It's still possible to write the code badly.

But with other SourceRTs, you don't know how many lines there are. If you read the first, the second, the next, the next, the next... eventually the subroutine that you use will return a message that says "The last time you tried to read a LineRT, you read the last LineRT in the SourceRT."

As much as that is a pain in the neck, there is a way around it.

You CAN'T, however, just use the easily understood...

Open SourceRT
Repeat
  Fetch a LineRT
  Process what's in it
  Until all lines done
Close SourceRT

If you don't see why that wouldn't work, think about it.

...

...

... got the answer? It would be best if you got it for yourself. Why would that plan not work if you didn't know how many lines there are, or at least know as soon as you had read the last line?

...

...answer below...

...

...

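It wouldn't work, because a "Repeat... Until" loop always does what is inside it at least once, and only checks whether to stop afterwards. So the plan calls for fetching a LineRT when there may be no LineRT to fetch, AND for processing whatever came back from that failed fetch.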
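To see the flaw in action, here is a minimal Free Pascal sketch. Everything in it is an assumption made for illustration: the SourceRT is faked with an array of strings, "fetching" is just stepping an index, and "processing" is just counting. NaiveCount is a made-up name.

```pascal
program NaiveLoopFlaw;
{$mode objfpc}{$H+}

// Hypothetical stand-in for the naive plan: the SourceRT is faked with
// an array of strings, and "processing" a LineRT just means counting it.
function NaiveCount(const Lines: array of string): Integer;
var
  Idx: Integer;
begin
  Result := 0;
  Idx := 0;
  repeat
    // "Fetch a LineRT"... we step forward whether or not one exists
    Inc(Idx);
    // "Process what's in it"... happens regardless
    Inc(Result);
  until Idx > High(Lines);   // "Until all lines done"
end;

var
  TwoLines, Empty: array of string;
begin
  SetLength(TwoLines, 2);
  TwoLines[0] := 'a';
  TwoLines[1] := 'b';
  SetLength(Empty, 0);
  WriteLn('Two lines, processed:    ', NaiveCount(TwoLines));
  WriteLn('Empty source, processed: ', NaiveCount(Empty));
end.
```

With two lines in the source, the count comes out as 2, as it should. With an empty source it comes out as 1: a line that never existed got "processed".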

You might, of course, build something based on...

Open SourceRT
Repeat
  See if you can Fetch a LineRT
  If you were able to, then Process what's in it
  Until the time when you failed to fetch a new LineRT when you tried.
Close SourceRT

Ick. And if it isn't "pretty" (simple, clear, elegant... think "Scandinavian" furniture style), it is likely to be a pain to get up and running for all variations of circumstance.

The answer!

The answer depends on a small bit of "cleverness".

At the start of the whole exercise, we will "pre-fetch" a line before we enter the main "process stuff" loop.

We will use the data in the "pre-fetched" line on the first pass through the loop.

Then, near the end of the loop, we will try to fetch another line. If successful, we'll go back, do the loop. If not, we can just "drop out" of the loop, pass on to the next part of the application.

So... in outline...

Open SourceRT
Prefetch a LineRT
Repeat
  Process most recently fetched LineRT
  Try to fetch another LineRT
  Until there are no more LineRTs to fetch
Close SourceRT

Simple enough? I hope so. But hardly something you could submit as program code.

What do I mean by "Process"? That will depend on what is in the lines, and why we are going through them. Perhaps each line holds the size, in square miles, of one state in the United States, and we want to know the size of all the USA states taken together? The "processing" would be "extract size, add it to the total you've been building."

Below, I've repeated the outline I presented above, with some additions...

"boMoreToDo" is just a variable, a "boolean variable", i.e. it can hold "true" or "false". (If you aren't comfortable with booleans, think of it as an ordinary variable-for-numbers, and put zero in it to stand for "false" or one to stand for "true".)

(":=" stands, as is usual in Pascal languages, for "becomes". It is the "assignment operator". The equals sign on its own asks "do the things to my left and right hold the same value?", i.e. "=" is the "comparison operator".)

Anything to the right of a "//" is just a comment, not something to be "done".

Open SourceRT
boMoreToDo:=true //Assume this for a moment.

//More on the next line in a moment...
sToProcess:=a LineRT from the SourceRT//... if you can (pre) fetch one.
     //Make boMoreToDo false if you can't

Repeat
  Process what's in sToProcess// (Most recently fetched LineRT)
  //Try to fetch another LineRT, put it in sToProcess, alter boMoreToDo if unable to fetch
  Until boMoreToDo is false //i.e. until there are no more LineRTs to fetch
Close SourceRT

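The "Repeat" version still has a weakness: it processes sToProcess once even if the pre-fetch failed. With just a substitution of a "While [condition] do [actions]" in place of the earlier "Repeat [actions] Until [condition is true]", the test happens BEFORE the first pass through the loop, and that becomes...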

Open SourceRT
boMoreToDo:=true //Assume this for a moment.

//More on the next line in a moment...
sToProcess:=a LineRT from the SourceRT//... if you can (pre) fetch one.
     //Make boMoreToDo false if you can't

While boMoreToDo true do begin...
  Process what's in sToProcess// (Most recently fetched LineRT)
  //Try to fetch another LineRT, put it in sToProcess, alter boMoreToDo if unable to fetch
  [[end of "do begin" material]]
Close SourceRT

Can it really be that simple?

Yes! It can, fundamentally, be that simple.
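For anyone who would like to see the "While" outline as something a compiler will accept, here is one possible sketch in Free Pascal. Please treat the details as assumptions: the SourceRT is faked with a TStringList held in memory, TryFetch and ProcessAll are made-up names, and the "processing" merely strings the LineRTs together with a "|" after each.

```pascal
program PrefetchWhile;
{$mode objfpc}{$H+}

uses
  Classes;

var
  Source: TStringList;      // stands in for the SourceRT (an assumption)
  NextIdx: Integer = 0;     // position of the next unread LineRT

// Hypothetical helper: puts the next LineRT in s and returns True,
// or returns False if the SourceRT has nothing more to give.
function TryFetch(out s: string): Boolean;
begin
  s := '';
  Result := NextIdx < Source.Count;
  if Result then
  begin
    s := Source[NextIdx];
    Inc(NextIdx);
  end;
end;

// The outline itself. "Processing" here is only joining the lines.
function ProcessAll: string;
var
  sToProcess: string;
  boMoreToDo: Boolean;
begin
  Result := '';
  NextIdx := 0;
  boMoreToDo := True;                   // assume this for a moment
  if not TryFetch(sToProcess) then      // the pre-fetch
    boMoreToDo := False;
  while boMoreToDo do
  begin
    Result := Result + sToProcess + '|';  // process the fetched LineRT
    if not TryFetch(sToProcess) then      // try to fetch another
      boMoreToDo := False;
  end;
end;

begin
  Source := TStringList.Create;         // "Open SourceRT"
  try
    Source.Add('first');
    Source.Add('second');
    WriteLn('Two lines:    [', ProcessAll, ']');
    Source.Clear;
    WriteLn('Empty source: [', ProcessAll, ']');
  finally
    Source.Free;                        // "Close SourceRT"
  end;
end.
```

Note that the empty source gives an empty result: because "While" tests before the first pass, nothing is processed if the pre-fetch fails.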

Some details...

The bit...

Try to fetch another LineRT, put it in sToProcess, alter boMoreToDo if unable to fetch

... is rather inelegant in the above.

That was to make it possible to cover two variations with just one "outline plan".

The variations arise because sometimes you know in advance when to stop trying to read another line. Other times you only know that you've tried to read PAST the end of the data in the SourceRT when you attempt a further read-from-source.

Here are the two versions you need to know about so you can use the one that fits the SourceRT (and/or way of accessing it) that applies in the task you are involved with...

//If you have a way to know whether there are more LineRTs in the SourceRT, without having to attempt a read, things are self-evident, I hope?....
If there are no more lines to fetch, set boMoreToDo to false
Otherwise, put the next LineRT in sToProcess
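Pascal's own Eof() function, used with an ordinary text file, is a case of this first kind: it says whether there is more to read without an attempt having to be made. A rough sketch follows. The name CountLines is an assumption, as is the little temporary file the program writes for itself so that the example is self-contained, and the flag is called boMoreToDo here.

```pascal
program EofKnownInAdvance;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Counts the LineRTs in a text file. Eof() answers "is there more?"
// before we attempt a read, so every ReadLn below is known to be safe.
function CountLines(const FileName: string): Integer;
var
  f: TextFile;
  sToProcess: string;
  boMoreToDo: Boolean;
begin
  Result := 0;
  AssignFile(f, FileName);
  Reset(f);                          // Open SourceRT
  try
    boMoreToDo := not Eof(f);
    if boMoreToDo then
      ReadLn(f, sToProcess);         // the pre-fetch
    while boMoreToDo do
    begin
      Inc(Result);                   // "process" the fetched LineRT
      if Eof(f) then
        boMoreToDo := False          // nothing left, so do not read
      else
        ReadLn(f, sToProcess);       // fetch the next LineRT
    end;
  finally
    CloseFile(f);                    // Close SourceRT
  end;
end;

var
  TmpName: string;
  f: TextFile;
begin
  // Build a small file to read back (for illustration only)
  TmpName := GetTempFileName;
  AssignFile(f, TmpName);
  Rewrite(f);
  WriteLn(f, 'one');
  WriteLn(f, 'two');
  WriteLn(f, 'three');
  CloseFile(f);

  WriteLn('Lines found: ', CountLines(TmpName));
  DeleteFile(TmpName);
end.
```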

Alternatively...

//If you don't have a way to know whether there are more LineRTs in the SourceRT, without having to attempt a read...
Attempt to put another LineRT in sToProcess
If you get an error message saying "You read the last line on your previous read from the SourceRT" then set boMoreToDo to false
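Here is a sketch of that second variation, again in Free Pascal. All of the names are hypothetical: the SourceRT is a pretend one (a constant array), FetchLine is an invented routine that raises an invented ENoMoreLines exception when asked for a line that isn't there, and TryFetch turns that complaint into a plain true/false. The flag is called boMoreToDo here.

```pascal
program FetchUntilRefused;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Hypothetical error raised by our pretend SourceRT
  ENoMoreLines = class(Exception);

const
  Source: array[0..2] of string = ('10', '20', '30');

var
  NextIdx: Integer = 0;

// Hypothetical fetch routine: hands back the next LineRT, or raises
// ENoMoreLines if the previous read already took the last one.
function FetchLine: string;
begin
  if NextIdx > High(Source) then
    raise ENoMoreLines.Create('No more LineRTs');
  Result := Source[NextIdx];
  Inc(NextIdx);
end;

// Wraps the attempt, so the main loop only sees "got one" or "didn't".
function TryFetch(out s: string): Boolean;
begin
  s := '';
  Result := False;
  try
    s := FetchLine;
    Result := True;
  except
    on ENoMoreLines do
      Result := False;
  end;
end;

// Adds up the number held in each LineRT.
function SumAll: Integer;
var
  sToProcess: string;
  boMoreToDo: Boolean;
begin
  Result := 0;
  NextIdx := 0;
  boMoreToDo := TryFetch(sToProcess);          // the pre-fetch
  while boMoreToDo do
  begin
    Result := Result + StrToInt(sToProcess);   // process
    boMoreToDo := TryFetch(sToProcess);        // attempt another fetch
  end;
end;

begin
  WriteLn('Total: ', SumAll);
end.
```

The only place that knows HOW the SourceRT complains is TryFetch. The loop itself is the same shape as before.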

The rest is details!

The rest of processing the LineRTs of a SourceRT, say, taking a text file like...

Alaska, 660,000
Texas, 270,000
California, 160,000
Montana, 140,000
New Mexico, 120,000

(Those numbers are approximate- areas in square miles) ... and getting 1,350,000...

... is mere detail! I've written that up in the next tutorial in this mini-series, my tutorial about adding up numbers from all the LineRTs in a SourceRT. Along the way, I've covered odds and ends like "what if there's an empty line in the SourceRT?" and "is there a way to allow 'comments' in the LineRTs in the SourceRT?"
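Just to give a flavor of that "mere detail", here is a minimal Free Pascal sketch, which is not necessarily how the next tutorial does it. It assumes every LineRT has the exact form shown above (name, a comma, then a number with commas in it), it does nothing about empty lines or comments, and the names AreaFromLine and SumAreas are my own inventions. The SourceRT is faked with an array.

```pascal
program SumStateAreas;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Extracts the number from a LineRT like 'Alaska, 660,000'.
// Assumes everything after the FIRST comma is the number.
function AreaFromLine(const s: string): Integer;
var
  sNum: string;
begin
  sNum := Copy(s, Pos(',', s) + 1, Length(s));           // ' 660,000'
  sNum := StringReplace(sNum, ',', '', [rfReplaceAll]);  // ' 660000'
  Result := StrToInt(Trim(sNum));
end;

// Pre-fetch loop over an in-memory stand-in for the SourceRT.
function SumAreas(const Lines: array of string): Integer;
var
  Idx: Integer;
  sToProcess: string;
  boMoreToDo: Boolean;
begin
  Result := 0;
  Idx := 0;
  boMoreToDo := Idx <= High(Lines);
  if boMoreToDo then
    sToProcess := Lines[Idx];                  // the pre-fetch
  while boMoreToDo do
  begin
    Result := Result + AreaFromLine(sToProcess);
    Inc(Idx);
    boMoreToDo := Idx <= High(Lines);
    if boMoreToDo then
      sToProcess := Lines[Idx];                // fetch the next LineRT
  end;
end;

const
  States: array[0..4] of string = (
    'Alaska, 660,000',
    'Texas, 270,000',
    'California, 160,000',
    'Montana, 140,000',
    'New Mexico, 120,000');

begin
  WriteLn('Total area: ', SumAreas(States));   // 1350000 for this data
end.
```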

A more interesting problem arises if you want to process... in one pass... something like...

a,5
a,3
a,7
b,100
b,600
c,20
c,15
c,35

... and get...

a:15
b:700
c:70

By the way... you only "get" one variable to hold the totals that are calculated along the way. That constraint will seem very artificial in this context, but, I assure you, situations arise where a similar constraint exists.

(I picked all the "a"s from 0-9, all the "b"s from 100-999 and all the "c"s from 10-99, just to make it easy to see where the different "categories" began and ended.)

I've written that up in the tutorial about adding numbers from data files, adding them up for groups... the tutorial in this mini-series which follows on from the one that produces a simple total.
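One way to meet the "only one variable for the totals" rule is to watch for the moment the category changes, report the total so far, and start again from zero. The Free Pascal sketch below shows that idea. It is a sketch under assumptions, not necessarily the approach the other tutorial takes: the data is faked with an array, all LineRTs of a category are assumed to sit together, and GroupTotals is a made-up name.

```pascal
program GroupTotalsDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

// One pass, one running Total. Returns one 'key:total' line per group.
// Assumes the LineRTs of each category are next to one another.
function GroupTotals(const Lines: array of string): string;
var
  Idx, Total, CommaAt: Integer;
  sToProcess, sKey, sCurrentKey: string;
  boMoreToDo: Boolean;
begin
  Result := '';
  Total := 0;
  sCurrentKey := '';
  Idx := 0;
  boMoreToDo := Idx <= High(Lines);
  if boMoreToDo then
  begin
    sToProcess := Lines[Idx];                       // the pre-fetch
    sCurrentKey := Copy(sToProcess, 1, Pos(',', sToProcess) - 1);
  end;
  while boMoreToDo do
  begin
    CommaAt := Pos(',', sToProcess);
    sKey := Copy(sToProcess, 1, CommaAt - 1);
    if sKey <> sCurrentKey then
    begin
      // Category changed: report the finished group, start afresh
      Result := Result + sCurrentKey + ':' + IntToStr(Total) + LineEnding;
      sCurrentKey := sKey;
      Total := 0;
    end;
    Total := Total
      + StrToInt(Copy(sToProcess, CommaAt + 1, Length(sToProcess)));
    Inc(Idx);
    boMoreToDo := Idx <= High(Lines);
    if boMoreToDo then
      sToProcess := Lines[Idx];                     // fetch the next LineRT
  end;
  // Report the last group, if there was any data at all
  if Length(Lines) > 0 then
    Result := Result + sCurrentKey + ':' + IntToStr(Total) + LineEnding;
end;

const
  Data: array[0..7] of string = (
    'a,5', 'a,3', 'a,7', 'b,100', 'b,600', 'c,20', 'c,15', 'c,35');

begin
  Write(GroupTotals(Data));
end.
```

The thing that is easy to forget is the final report after the loop: the last group never meets a "category changed" moment, so its total has to be written out separately.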


A few words from the sponsors...

Please get in touch if you discover flaws in this page. Please mention the page's URL. (wywtk.com/lut/3linesfrmsrc/3linesfs-main.htm).

If you found this of interest, please mention in forums, give it a Facebook "like", Google "Plus", or whatever. If you want more of this stuff, help!? There's not much point in me writing these things, if no one feels they are of any use.







How to email or write this page's editor, Tom Boyd. Please cite page's URL, 3linesfs-main.htm, if you write.

