The previous workshop verified that you could
successfully locate a web server log file on your machine and that you were
able to view it in the VisualWorks File Editor. This brought the file
in "all at once", but in order to count the "hits", we
need to be able to read in the file line by line.

In this lesson, you will learn how to
perform loops, a process where you repeat something until you determine it's
time to quit. Specifically, we will repeatedly read the next line of the file
and stop when we reach the end of the file.
Along the way, we will also extract the IP address from each line.

1. In your Workspace, modify
the text as follows, highlight all of it, <Operate-Click> and
then select Do it.
| myFile myStream |
myFile := 'ws000101.log' asFilename.
myStream := myFile readStream.
myStream inspect.
myStream close.
A new Inspector window will appear. The
caption on the window reads an ExternalReadStream, and the left-hand pane
lists self and eleven other properties. The important point
is that VisualWorks treats a sequential file as a stream of characters.
See Figure 4-1.

Figure 4-1. The log file as a Stream in the VisualWorks Inspector Window

Just a reminder: if you did not place this file in the VisualWorks
"default directory", you will have to supply its "fully qualified path".
From this point forward, whenever you are asked to reference this file or any
other file, the example code will simply use the "non-qualified" file name. It will
be up to you to make sure you are referencing the file correctly. The
benefit of placing this file (and all other log files) in the VisualWorks
"default directory" is that you will be able to copy and paste the
code snippets from this lesson straight into your Workspace.
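
If you are unsure whether VisualWorks can find the file under its non-qualified name, a quick check such as the sketch below may help. It assumes your VisualWorks release supports the exists message on Filename objects; the fully qualified path mentioned in the comment is a hypothetical example, not a required location.

```smalltalk
| myFile |
"Non-qualified name: looked up in the VisualWorks default directory.
 If the check fails, try a fully qualified path instead, for example
 the hypothetical 'C:\logs\ws000101.log'."
myFile := 'ws000101.log' asFilename.
"exists answers true if a file with this name can be found"
Transcript show: 'Log file found: ' , myFile exists printString; cr.
```

If the Transcript shows false, supply the fully qualified path before continuing with the steps below.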

2. In the left pane of the Inspector window, click
(highlight) the word lineEndCharacter.
Note that the right pane shows the value of lineEndCharacter, which on
Windows is Core.Character cr. On Unix/Linux, it will display as Core.Character lf.
Keep this in mind, as we will use this information later in this lesson.
3. Close the Inspector window.

A lot is going on here, so let's dissect it step
by step.
On the first line, we are declaring not one but two temporary
variables. This is done by typing a vertical bar, listing the
variable names separated by spaces and then ending the list with
another vertical bar.
myStream := myFile readStream.
This is where things get interesting. On assignment statements
(those that contain the := symbol), Smalltalk
evaluates the right side first and then assigns the result to what's on the
left side. So the myFile object is being sent the message readStream.
This means that we wish to treat the log file as a "stream" of
characters (after all, that's all a file really is to a computer) and assign
that stream to the temporary variable myStream.
We tell Smalltalk to inspect the stream. This is just like
the Inspect It option on the <Operate> menu, except that this is how
you can invoke it programmatically (cool!).
And while you were looking at the Inspector window,
Smalltalk went ahead and executed the last line, which closed the
file (it is always a good idea to close a file when you are done with it).
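
Closing the file is easy to forget, and if an error interrupts your code, the line that closes the file may never run. A common Smalltalk idiom, sketched below, puts the work in one block and sends it ensure: with a second block that closes the file; the second block runs even if the first one fails. We believe ensure: is available on blocks in current VisualWorks releases, but treat it as an assumption for your version (older releases may use a different message name).

```smalltalk
| myFile myStream |
myFile := 'ws000101.log' asFilename.
myStream := myFile readStream.
"The first block does the work; the ensure: block runs afterwards,
 even if the first block is interrupted by an error"
[ myStream inspect ] ensure: [ myStream close ].
```
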
The advantage of using a stream is that, because the stream knows its
lineEndCharacter, we can read through a file one line at a time. So
let's try to read in just the first line of the file.

4. Modify the text accordingly,
highlight all of it, <Operate-Click> and select Do it.
UNIX/Linux users will have to change the cr
to lf.
| myFile myStream myLine |
myFile := 'ws000101.log' asFilename.
myStream := myFile readStream.
myLine := myStream upTo: Character cr.
myLine inspect.
myStream close.
Note that
a new Inspector window will appear, and the caption of the window
is a MSCP1252String (MSCP1252 is a Windows character set
designation). Displayed in the Inspector window is the first line of
our file. See Figure 4-2.

Figure 4-2. The first line of our log file

If the above code does not produce a single
line (that is, it returns the entire contents of the file as one line),
then the operating system you are using might need a little more help
interpreting the "end of line" characters.
Since we know that this file has a "carriage return-line feed" pair at
the end of each line, we can tell VisualWorks to expect it when we
open the file, as follows:
myStream := myFile readStream lineEndCRLF.
We are getting a little ahead of ourselves as far as syntax goes (note that
this expression sends two messages, one after the other), but this
will be explained shortly when you see the link to the syntax primer.
For now, it is enough to say that the above line combines two steps
into one. Smalltalk evaluates these messages left to right, so myFile
readStream is evaluated first (returning a stream object, just as before),
and that stream is then sent the message lineEndCRLF. As a
result, whenever VisualWorks encounters a "carriage
return-line feed" pair, it knows that it has reached the end of a
single line within the file.
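
To see why the whole file can come back as one "line", it may help to try upTo: on a small in-memory stream instead of the log file. The sketch below builds a hypothetical two-line text whose lines end in lf only, then asks for everything up to cr. Since no cr is ever found, upTo: simply answers the rest of the stream. The sample text is made up for illustration.

```smalltalk
| lineFeed text stream firstLine |
"A one-character string holding a line feed"
lineFeed := String with: Character lf.
"Hypothetical sample text: two lines, each ended by lf rather than cr"
text := 'first line' , lineFeed , 'second line' , lineFeed.
stream := ReadStream on: text.
"No cr appears in the text, so upTo: answers everything to the end"
firstLine := stream upTo: Character cr.
Transcript show: firstLine size printString; cr.
```

The Transcript shows 23, the size of the whole text, rather than 10, the size of 'first line'.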

5. In the Inspector window, click the Basic
tab.
In the
left-hand pane is self followed by a sequence of numbers. The numbers
run from 1 through 122. This indicates that the first line of our file
contains 122 characters (one number for each character).

Figure 4-3. The first line of the log file contains 122 characters
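
Counting the numbers in the Inspector works, but you can also ask the string directly. The sketch below repeats the code from step 4 and sends size to the line; if your log file is the same one used in this lesson, it should print 122, matching Figure 4-3. As in step 4, Unix/Linux users would use Character lf instead of Character cr.

```smalltalk
| myFile myStream myLine |
myFile := 'ws000101.log' asFilename.
myStream := myFile readStream.
myLine := myStream upTo: Character cr.
"size answers the number of characters in the string"
Transcript show: myLine size printString; cr.
myStream close.
```
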

We need to explain the 4th line from the code
above. It does a lot of work.
myLine := myStream upTo: Character cr.
On the third line, when we sent the readStream message
to the myFile object, the Filename object answered a new Stream object
for reading the file. You can now think of the contents of the file as a
stream of characters, which we reach through the temporary variable myStream.
Now on to the fourth line, where we send an upTo: message to the Stream
object in myStream. Note that there is a colon in this message, meaning that
what follows the colon is a parameter. In plain English, this line could be
translated as "take all the characters of the myStream Stream up
to the Character cr and assign them to myLine." It's kind of
like a copy statement. The upTo: message just wants to know
"where do I stop?". It also skips past that stop character, so the next
upTo: would start at the beginning of the next line. The character cr stands for Carriage
Return (hence the cr), and in a line-by-line file such as this one,
a Carriage Return marks the end of one line and the beginning
of the next (on Unix/Linux, the line feed, lf, plays this role).
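
upTo: does not just copy characters: it also moves the stream past the character it stops at, so sending it a second time picks up where the first call left off. The in-memory sketch below (using a made-up two-line text rather than the log file) shows this; it is the behaviour the loop later in this lesson relies on.

```smalltalk
| eol text stream first second |
"A one-character string holding a carriage return"
eol := String with: Character cr.
"Hypothetical sample text: two short lines, each ended by cr"
text := 'line one' , eol , 'line two' , eol.
stream := ReadStream on: text.
first := stream upTo: Character cr.     "'line one'; the cr itself is skipped"
second := stream upTo: Character cr.    "'line two'"
Transcript show: first; cr; show: second; cr.
```
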
Note that the 4th line is a bit more complicated than 4
squared or 3 + 4. Not only is this a statement that contains a
parameter, but it involves an object (myStream), a class (Character)
and two messages (upTo: and cr). To understand how Smalltalk
interprets this line (i.e. which message is sent first), refer to the
syntax primer, which also contains information about a widely accepted naming
convention used in Smalltalk.
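
The syntax primer covers the precedence rules in detail, but a short Workspace check already shows the key one: unary messages (such as squared or cr) are sent before binary messages (such as +), and binary messages before keyword messages (such as upTo:). That is why Character cr is evaluated before upTo: in the 4th line.

```smalltalk
| a b |
"Unary first: 4 squared is 16, then 3 + 16"
a := 3 + 4 squared.
"Parentheses override precedence: 3 + 4 is 7, then 7 squared"
b := (3 + 4) squared.
Transcript show: a printString , ' ' , b printString; cr.
```

The Transcript shows 19 49.
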
We have read the first line of the file. We now need to
extract the IP address from this line. This is a "comma
delimited" file, so all "fields" in this file are separated by a
comma. Since the IP address is always the first field on each line, we
simply need to copy all the characters of the line "up to" the
first comma, and that will be the IP address. Let's try it.

Proceed to the syntax primer.

6. Modify the code accordingly,
highlight all of it, <Operate-Click> and then select Do it.
| myFile myStream myLine addrIP |
myFile := 'ws000101.log' asFilename.
myStream := myFile readStream.
myLine := myStream upTo: Character cr.
addrIP := myLine copyUpTo: $,.
addrIP inspect.
myStream close.
Note that a new Inspector window will
appear, and the caption of the window is a MSCP1252String. Here you
will see the value of the MSCP1252String, which is an IP address
(specifically 209.67.247.201). If you click the Basic tab, you will
see the numbers 1-14, indicating that the IP address contains 14 characters
(one number for each character).

Figure 4-4. A successful extraction of an IP address
7. Close the Inspector window.

We need to explain the 5th line from the code
above.
addrIP := myLine copyUpTo: $,.
Now that the temporary variable myLine contains the first
line of characters, we need to take all the characters up to the comma and
store them in the temporary variable addrIP. So we sent the String
object in myLine (remember, the Inspector showed its class as MSCP1252String) the
message copyUpTo: and passed it the parameter $,. Since we
are looking specifically for the comma character, we put a dollar sign in
front of the comma to tell Smalltalk that we mean the comma character. For
Smalltalk, the comma has a special meaning (actually, it’s a message that joins two strings;
surprise, surprise!), so by using the dollar sign we are asking Smalltalk to just look
for the comma character rather than treating it as a message.
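
The two roles of the comma can be seen side by side in a quick sketch. The sample line is hypothetical (only its first field resembles the log file); $, is the comma Character, while a bare comma between two strings is the concatenation message.

```smalltalk
| sample addrIP joined |
"Hypothetical comma-delimited line; only the first field matters here"
sample := '209.67.247.201,field2,field3'.
"$, is a Character literal, used here as the stop character"
addrIP := sample copyUpTo: $,.
"A bare comma is a binary message that joins two strings"
joined := 'abc' , 'def'.
Transcript show: addrIP; cr; show: joined; cr.
```
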
Note that we have used two different messages that
basically did the same thing. The upTo: message copied stuff "up
to" some parameter. The copyUpTo: message copied stuff "up
to" some parameter. The reason why we used two different messages is
that we had two different objects. The upTo: message is used on Stream
objects, whereas the copyUpTo: message is used on String
objects. (One difference: upTo: also moves the stream past the stop
character, whereas copyUpTo: leaves the original string unchanged.)
At this point, you might be saying to yourself, "Wouldn't it be nice if
there was just one message that 'copied stuff up to some parameter'
regardless of which class I was using?". The answer is yes, and the idea of
different classes responding to the same message is called polymorphism,
a fundamental tenet of object-oriented programming. For now, you just have
to remember to use the upTo: message on Stream objects and the copyUpTo:
message on String objects.
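
The sketch below runs both messages on the same hypothetical line, once directly on the String and once through a stream made from it with readStream. Both answer the same text, but only the stream moves forward, so a second upTo: answers the next field.

```smalltalk
| line fromString lineStream fromStream nextField |
"Hypothetical comma-delimited line"
line := '209.67.247.201,field2,field3'.
"copyUpTo: answers a copy; line itself is unchanged"
fromString := line copyUpTo: $,.
"readStream answers a stream over the string's characters"
lineStream := line readStream.
"upTo: answers the same text and moves the stream past the comma"
fromStream := lineStream upTo: $,.
"so the next upTo: continues with the second field"
nextField := lineStream upTo: $,.
Transcript show: fromString; cr; show: fromStream; cr; show: nextField; cr.
```
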

Proceed to the polymorphism primer.
OK. We have an IP address. But each line contains an IP address. We need a
way to loop through the file line by line and collect the IP address as we
go. Would you believe that just one additional line of code will do that?

8. Modify the code accordingly,
highlight all of it, <Operate-Click> and then select Do it.
| myFile myStream myLine addrIP |
myFile := 'ws000101.log' asFilename.
myStream := myFile readStream.
[ myStream atEnd ] whileFalse: [
myLine := myStream upTo: Character cr.
addrIP := myLine copyUpTo: $,.
Transcript show: addrIP.].
myStream close.
Note that a long run of text (the IP addresses) has
appeared in the white area under the main VisualWorks window. As you may recall
from Lesson 2, this is called the System Transcript, and it is
one of the four basic ways of displaying output in Smalltalk.

Figure 4-5. All IP addresses displayed in the Transcript

You guessed it! Another line of code that does
quite a bit.
This is basically the same code as before. Since we already had code to read
in a line and extract the IP address, we just needed a way to repeat that
process from the beginning of the file to the end of the file. Aside from
the Transcript statement that displays our IP address, all it took to
perform this loop was just one line of code!
[ myStream atEnd ] whileFalse: [
In short, the first block, [ myStream atEnd ], answers true or false, and the
whileFalse: message keeps evaluating the second block (the statements between
the next pair of square brackets) for as long as the first block answers false.
Because this one line does so much for us, a full explanation takes a while.
So as not to distract from the flow of this workshop, you may choose to view
the primer on loops.
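
To watch the loop mechanics without the log file, the sketch below runs the same atEnd/whileFalse: pattern over a small in-memory text with three made-up lines (the 10.0.0.x addresses are hypothetical). It also adds two things the lesson's code does not have: a cr after each address, so each one appears on its own Transcript line, and a counter showing that the loop body runs once per line.

```smalltalk
| eol text stream line count |
eol := String with: Character cr.
"Hypothetical three-line, comma-delimited sample text"
text := '10.0.0.1,a' , eol , '10.0.0.2,b' , eol , '10.0.0.3,c' , eol.
stream := ReadStream on: text.
count := 0.
"The receiver block is re-evaluated before every pass;
 the loop stops as soon as it answers true"
[ stream atEnd ] whileFalse: [
    line := stream upTo: Character cr.
    "show the IP address, then start a new Transcript line"
    Transcript show: (line copyUpTo: $,); cr.
    count := count + 1 ].
Transcript show: count printString , ' lines read'; cr.
```

After the third upTo: the stream sits just past the last cr, so atEnd answers true and the loop ends with count equal to 3.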

Proceed to the loops primer.

Summary
Most of
the explanation of the Smalltalk topics and concepts for this
exercise has already taken place. However, now would be a good time to recall how
we are approaching this exercise. Only one more step remains before we finish
the exercise: counting the number of hits this web site had on a
particular day. We will do that in the next lesson, but let's recap how we got
to this point.
We first had to know how to open a file. Then we had to know how to read it
in line by line. We then had to know how to extract data from that line.
Instead of doing all of this (writing a routine to display all IP addresses in a web
server log file to the System Transcript) at once, we did it in a
"test this snippet of code - it worked - move on" fashion. In
a traditional edit-compile-run language, you could not work this way. It would involve writing
all the code in some editor, compiling it and then running it. The first time
you ran this code, you would really have no idea whether it would work. Chances are it
would not run successfully, so you would go back and (try to) fix the problem
(in your editor), compile it and run it again. You would repeat this process
over and over again until you eventually got the program to work. It's an
"all or nothing" approach.
With Smalltalk, it is incremental. You test and play with little chunks of
code, get those working and then piece them together as a single unit. This
is what makes code development in Smalltalk so much more productive. Once you
get those little chunks of code working, you can build on them with confidence
instead of retesting them at every step.
In the next workshop, now that we know how to extract all those IP addresses
from the file, we will be able to collect, sort and count them.
You should now know how to:
- Read in a file line by line
- Extract data from a string of characters
- Perform a loop
- Use code block statements
- Override the special meaning of certain characters (comma)
- Use the Stream class for file access
- Recognize Boolean expressions