Cincom

Web Log Stats Lesson 6
Reading a Directory


| Table of Contents | Lesson 5 | Lesson 7 |


We've counted the number of unique hits for a particular log file. However, what if we wanted to do this for more than just one file? After all, each day, the web server generates a new file. Why not scan the entire directory (where all these log files exist) and generate statistics for an entire week, month or year?

This lesson explains how to view the contents of a directory. Then, we’ll see how our existing code can be extended to count web hits for all the log files in a directory.

Up to this point, our code has been fairly readable: short, concise statements with just two blocks (one conditional block and one grouping block). By the time this exercise is finished, the code will be less tidy and therefore less readable. This is done on purpose, to convince you that there is a better, more efficient way to write the code. We’ll rewrite our code in the next lesson, but for now, we want to make sure we have good working code.

1. Open a new Workspace and enter the text below, highlight all of it, <Operate-Click> and select Inspect It.

'c:\vw7.4' asFilename directoryContents.

Note that this is a Windows-specific example. If you are using a different operating system, please substitute a directory on your system for the c:\vw7.4 directory. Optimally, use the VisualWorks "default directory" or "home directory."

2. Click the Basic tab in the Inspector window and select the word self.

Note that in the right-hand pane you will see the value of self which is the contents of your directory. Smalltalk will store the file names (and directory names) as an Array.


Figure 6-1. A directory stored as an Array

We just told Smalltalk to treat the string of characters (within the single quotes) as a Filename, and it returned a Filename object. We then sent that Filename object the directoryContents message to retrieve its contents (since it names a directory rather than a file). Readers familiar with Unix will find this natural, since Unix also treats a directory and a file alike. It's no different here.
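The same expression can be evaluated one message at a time, which makes the intermediate Filename object visible. This is a minimal sketch, assuming the lesson's c:\vw7.4 sample path. The temporary variable names and the exists / isDirectory guard are illustrative additions, not part of the lesson's code.

```smalltalk
| dir names |
"Step 1: a String answers a Filename when sent asFilename."
dir := 'c:\vw7.4' asFilename.
"Step 2: ask for the contents only if the path exists and is a directory (illustrative guard)."
names := (dir exists and: [dir isDirectory])
    ifTrue: [dir directoryContents]
    ifFalse: [#()].
"names is an Array; it is empty when the directory is missing."
names size
```

Highlight the code and select Inspect It (or Print It) to see how many entries were found.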

The contents are now in an Array, another type of Smalltalk collection. Since we are only interested in specific file names, the ones that start with ws00 and end with .log, we will need some way of

  1. iterating through our array and
  2. filtering out those file names that don't match our pattern.

Once we do that, we can then use our code that counts hits against those files.

3. Modify the text as follows, highlight all of it, <Operate-Click> and then select Do it. Again, substitute c:\vw7.4 with the directory where you placed the web server log files.

| contents |
Transcript clear.
contents := 'c:\vw7.4' asFilename directoryContents.
contents do: [ :each | Transcript show: each.].

The temporary variable contents stores the array of files in our directory. We iterate through the array by sending it the do: message with a block (the code in square brackets), which we will call a do block. The do block has 2 parts. The first part declares the block's argument (in this case each), which behaves like a temporary variable inside the block. The second part is simply a group of Smalltalk statements. The 2 parts are separated by a vertical bar character.

Note that the declaration of the temporary variable each is preceded with a colon. The second part of the do block should look familiar to you - the Transcript show: statement. What we are showing in the Transcript is the value of our do block temporary variable each. So what happens is every element of the contents array is passed to the do block temporary variable each. The Transcript show: statement is then repeated for every element of the contents array. It's like saying "Take each element of the contents array and show it in the Transcript."
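To watch the iteration without touching the disk, the do block can be tried on a literal Array. This is a small sketch in which the file names are hypothetical stand-ins for directoryContents and the count variable is added only to make the number of evaluations visible.

```smalltalk
| names count |
"Hypothetical file names standing in for directoryContents."
names := #('readme.txt' 'ws000101.log' 'install.log').
count := 0.
names do: [:each |
    "each is bound to one element of names per evaluation of the block."
    Transcript show: each.
    count := count + 1].
"The block ran once per element, so count equals names size."
count
```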



Figure 6-2. A directory contents displayed in the Transcript

4. Modify the text as follows, highlight all of it, <Operate-Click> and then select Do it. This should make the display in the Transcript look a little bit better.

| contents |
Transcript clear.
contents := 'c:\vw7.4' asFilename directoryContents.
contents do: [ :each |
Transcript show: each.
Transcript cr.].

The 2nd part of the do block now contains 2 lines instead of one. We wanted to insert a carriage return after showing each file in the Transcript so it is more legible. The cr is a message to the Transcript (object) to insert a carriage return in the Transcript. This is what makes the file names appear on separate lines.


Figure 6-3. A better display in the Transcript of our directory

5. Modify the text as follows, highlight all of it, <Operate-Click> and then select Do it. This will do the exact same thing as the code above but it shows a "shortcut" way of writing code when you repeat messages against the same object.

| contents |
Transcript clear.
contents := 'c:\vw7.4' asFilename directoryContents.
contents do: [ :each | Transcript show: each; cr.].

This is an example of cascading. Note the semi-colon after each. Like a period, it ends the message before it (show: each), but it also tells Smalltalk there is more to come: what follows the semi-colon is another message (here cr) sent to the same object that received the previous message (in our case Transcript).
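Cascading works with any receiver, not just the Transcript. The sketch below uses hypothetical file names and sends three messages to one new OrderedCollection. The value of a cascade is the result of its last message, which is why the common idiom ends with yourself (it simply answers the receiver).

```smalltalk
| list |
"add:, add: and yourself are all sent to the same new OrderedCollection."
list := OrderedCollection new
    add: 'ws000101.log';
    add: 'ws000102.log';
    yourself.
"The same idea with the Transcript: show: and cr both go to Transcript."
Transcript show: list size printString; cr.
list
```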

6. Now that we can iterate through our list of file names, we want to separate out our "log" files. Modify the text as follows, highlight all of it, <Operate-Click> and then select Do it.

| contents xFound |
Transcript clear.
contents := 'c:\vw7.4' asFilename directoryContents.
contents do: [ :each |
xFound := each findString: '.log' startingAt: 1.
xFound > 0
ifTrue:[ Transcript show: each; cr.].].

If you did not see any log files and you know they are really there, you may have to change the lowercase .log to uppercase .LOG. Remember, Smalltalk is case sensitive. Also, since we have a block of code within another block of code in this example, it might look nicer (easier to read) like this:

 
    | contents xFound | 
    Transcript clear.
    contents := 'c:\vw7.4' asFilename directoryContents. 
    contents do: [ :each | 
                   xFound := each findString: '.log' startingAt: 1. 
                   xFound > 0 
                   ifTrue:[ Transcript show: each; cr.].
                 ].
    



Figure 6-4. All "log" files displayed in the Transcript

We're close. Although we got all of our log files, we also pulled in other .log files that we don't want. Therefore, we will have to further refine our filtering logic.
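The behaviour of this filter can be checked on a few sample names. In the sketch below the names are hypothetical. For 'readme.txt' the search answers 0, for 'ws000101.log' it answers 9, and for 'install.log' it answers 8. Sending asLowercase first is one possible way to accept both .log and .LOG, offered here as an assumption rather than as the lesson's approach.

```smalltalk
| names found matches |
names := #('readme.txt' 'ws000101.log' 'install.log' 'WS000102.LOG').
matches := OrderedCollection new.
names do: [:each |
    "Search a lowercase copy of the name so that .LOG and .log are treated alike."
    found := each asLowercase findString: '.log' startingAt: 1.
    found > 0 ifTrue: [matches add: each]].
"install.log is among the matches, which is why a second test is needed."
matches
```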

7. We will now find all log files that begin with ws00 in addition to those ending in .log by adding another "string test". Modify the text as follows, highlight all of it, <Operate-Click> and then select Do it.

| contents sFound eFound |
Transcript clear.
contents := 'c:\vw7.4' asFilename directoryContents.
contents do: [ :each |
sFound := each findString: 'ws00' startingAt: 1.
eFound := each findString: '.log' startingAt: 6.
(sFound > 0) & (eFound > 0)
ifTrue:[ Transcript show: each; cr.].].



Figure 6-5. All "ws00*.log" files displayed in the Transcript

Here is an explanation of the additional lines of code we added.

sFound := each findString: 'ws00' startingAt: 1.

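Nothing really new here. Since our log files begin with ws00, we look for this string of characters starting at the first character of the name. Note that the test sFound > 0 only confirms that ws00 appears somewhere in the name; a result of exactly 1 would mean the name begins with it.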

eFound := each findString: '.log' startingAt: 6.

Since our log files end with .log, we look for this string of characters toward the end of the name. We could have started at 7, 8 or even 9 if we wanted to, since in our log file names the characters .log don't start until the 9th character.

(sFound > 0) & (eFound > 0)

This is a compound boolean expression. The ampersand character (&) is used to AND two boolean expressions together. Therefore, this expression is true only when both searches succeed, which for the files in our directory means the name starts with ws00 AND ends with .log (exactly what we want). This will get rid of those other 2 log files.
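If the directory might hold names that merely contain ws00 or .log, the two tests can be tightened by comparing the positions instead of only checking that they are greater than zero. This is a sketch with hypothetical file names, and it assumes .log occurs only once in each name.

```smalltalk
| names sFound eFound matches |
names := #('ws000101.log' 'install.log' 'news00.log' 'ws000101.log.bak').
matches := OrderedCollection new.
names do: [:each |
    sFound := each findString: 'ws00' startingAt: 1.
    eFound := each findString: '.log' startingAt: 1.
    "sFound = 1: the name starts with ws00.
     eFound = (each size - 3): the last four characters are .log."
    ((sFound = 1) & (eFound = (each size - 3)))
        ifTrue: [matches add: each]].
matches
```

With the looser > 0 tests, news00.log and ws000101.log.bak would also pass; here only ws000101.log remains.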

8. Now comes the fun part. If you have not closed out VisualWorks and you have been following the exercises all along, you should have 2 Workspaces open. In the first Workspace, you have code that determines the number of hits for a given log file. In the second Workspace, you have code that locates all "log" files in a given directory. It's now time to combine them.

In our second Workspace, when we find a log file, we just display it in the Transcript. All we have to do is replace that code with the code that determines web hits. It should look something like below. For readability's sake, indentation has been used.

After entering this code, highlight all of it, <Operate-Click> and then select Do it.

 
    | myFile myStream myLine addrIP mySet contents sFound eFound logDirectory | 
    Transcript clear.
    logDirectory := 'c:\vw7.4'. 
    contents := logDirectory asFilename directoryContents. 
    contents do: 
             [ :each | 
               sFound := each findString: 'ws00' startingAt: 1.
               eFound := each findString: '.log' startingAt: 7.
               (sFound > 0) & (eFound > 0)
               ifTrue:
               [ mySet := Set new. 
                 myFile := logDirectory asFilename construct: each.
                 myStream := myFile readStream. 
                 [ myStream atEnd ] whileFalse: 
                   [ myLine := myStream upTo: Character cr. 
                     addrIP := myLine copyUpTo: $,. 
                     mySet add: addrIP.
                   ]. 
                 myStream close.
                 Transcript show: each, '...', mySet size printString; cr.
               ].
             ].
    



Figure 6-6. Web hits for all log files

Indeed, there is a lot going on here. And some new stuff was thrown in as well. Let's look at all this code line by line and make sure you understand what each line does.

| myFile myStream myLine addrIP mySet contents sFound eFound logDirectory |

This line contains the list of all the temporary variables used. Each variable is separated by a space and the whole list is delimited with the vertical bar character.

Transcript clear.

This is a unary message that simply clears the contents of the Transcript.

logDirectory := 'c:\vw7.4'.

This assigns the location of the server log files to a temporary variable. By its name, it is designed to be the name of a folder, file system or directory on your system or network.

contents := logDirectory asFilename directoryContents.

This line collects the contents of your directory and stores the names in an array. The logDirectory temporary variable contains just a string of characters so we send it the asFilename message to convert it to a Filename. Once we have a filename, the directoryContents message does all the work to retrieve all file names for that directory.

contents do:

The beginning of a loop. We are about to iterate through the contents of the contents temporary variable (which just happens to be a list of all files for a given directory).

[ :each |

The beginning of our first block of code. The first part of the block is the declaration area where the block's arguments are declared. In our case, :each is the argument that holds the value of a file name (each member of the contents array); inside the block it is used like a temporary variable. The area of declaration ends with a vertical bar character.

sFound := each findString: 'ws00' startingAt: 1.

Each element of the contents array will be a string. Instances of class String understand the message findString:startingAt:. As the name suggests, it tries to locate a sequence of characters within another sequence of characters, starting at a certain position. So, given a file name (like readme.txt), we are trying to find the letters (or sequence of characters) "ws00" starting with the 1st letter of the file name. The result of this method is a number - the position at which the sequence of characters "ws00" begins (in our case, one) - and we store that value in a temporary variable called sFound. If the return value is zero, it means that the system could not find our sequence of characters so that particular file was not a "ws00" file.

eFound := each findString: '.log' startingAt: 7.

Just like the previous line of code, we are trying to locate a sequence of characters within another sequence of characters, starting at a certain position. So, we are trying to find the letters (or sequence of characters) ".log" starting with the 7th letter of the file name. The result of this method is a number - the position at which the sequence of characters ".log" begins (in our case, nine) - and we store that value in a temporary variable called eFound. If the return value is zero, it means that the system could not find our sequence of characters so that particular file was not a ".log" file.

ifTrue:

The beginning of our next block of code. The ifTrue: message is sent to the result of the compound boolean expression (sFound > 0) & (eFound > 0) on the preceding line. If sFound > 0 is true AND eFound > 0 is true, then Smalltalk will process the block of code following this message. If sFound > 0 is false OR eFound > 0 is false, then Smalltalk will skip the entire block of code following this message.

[ mySet := Set new.

The first line in our second block of code. Since we have found a log file, we can now start the lines of code which counted unique IP addresses. We start it all off by creating a new Set.

myFile := logDirectory asFilename construct: each.

The system must be told specifically where these files are located. Remember, the contents of the contents temporary variable is simply a list of file names. Even though they came from a given directory, Smalltalk doesn't remember that. So we have to construct the filename object. Using asFilename creates a Filename object, and then construct: is a platform neutral way to append the filename itself to the directory - that way, we don't need to hardcode the platform specific separator (\, /, or : depending on the platform). The whole result of this is assigned to the temporary variable myFile.
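The effect of construct: can be seen without any log file being present, because building a Filename does not access the disk. This is a minimal sketch; the file name ws000101.log is a hypothetical example, and asString is used only to display the resulting path.

```smalltalk
| dir full |
dir := 'c:\vw7.4' asFilename.
"construct: answers a new Filename for the named entry inside dir,
 inserting the platform's separator."
full := dir construct: 'ws000101.log'.
"asString answers the complete path as a String, e.g. for display."
Transcript show: full asString; cr.
full
```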

myStream := myFile readStream.

This statement opens the file for reading as a stream, and the resulting stream is assigned to the temporary variable myStream.

[ myStream atEnd ] whileFalse:

The beginning of our third (and final) block of code. The bracketed expression [ myStream atEnd ] in front of whileFalse: is the Boolean test. It tests whether we are at the end of the stream (i.e. end of the file) and returns either true (we have reached the end of the stream) or false (we haven't reached the end yet). This test block is sent the message whileFalse:, with the block that follows as the argument. As long as the test returns false, Smalltalk will process the block of code following this message and then test again. Once we have reached the end of the stream, Smalltalk will skip past the block of code following this message.

[ myLine := myStream upTo: Character cr.

The first statement of our third block of code. We determined earlier that the Character cr marked the end of a line for our log files. This line will take all characters up to that character and assign them to the temporary variable myLine.
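The loop can be tried on a stream over a String instead of a file. In this sketch the two log lines are hypothetical, the ReadStream stands in for myFile readStream, and the lines collection is added only so the result can be inspected.

```smalltalk
| text myStream myLine lines |
"Two hypothetical log lines joined by a carriage return."
text := '10.0.0.1,GET,/index.html', (String with: Character cr), '10.0.0.2,GET,/about.html'.
myStream := ReadStream on: text.
lines := OrderedCollection new.
[myStream atEnd] whileFalse: [
    "upTo: answers everything before the next cr and skips over the cr itself."
    myLine := myStream upTo: Character cr.
    lines add: myLine].
lines
```

The last line has no trailing carriage return; upTo: then answers the rest of the stream.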

addrIP := myLine copyUpTo: $,.

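The temporary variable myLine contains a string of characters. Since the IP address of the log file is the first entry in a list of comma-delimited fields, we want to copy all characters up to the first comma. Written on its own, the comma is a Smalltalk message (it joins strings, as in the Transcript show: statement in this code), so we prefix it with the dollar ($) sign. The notation $, is a Character literal, telling Smalltalk to treat the comma as a comma character.

mySet add: addrIP.

This adds the IP address to our Set. A Set holds each distinct element only once, so adding an address that is already there changes nothing. That is why mySet size gives the number of unique addresses once the whole file has been read.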
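Extracting the address and collecting it in a Set can be tried on a few sample lines. This is a sketch with hypothetical log lines; the first and third share an IP address.

```smalltalk
| lines mySet addrIP |
lines := #('10.0.0.1,GET,/index.html' '10.0.0.2,GET,/about.html' '10.0.0.1,GET,/contact.html').
mySet := Set new.
lines do: [:myLine |
    "$, is the Character literal for a comma; copyUpTo: answers the text before the first one."
    addrIP := myLine copyUpTo: $,.
    "Adding an address that is already present leaves the Set unchanged."
    mySet add: addrIP].
"Three lines were read, but only two distinct addresses are in the Set."
mySet size
```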

].

The end to our third block of code (the whileFalse: condition of being at the end of a file).

myStream close.

By the time we reach this statement, we have reached the end of the file. Here we close the file by closing the stream. Remember, we converted the file to a stream so we have to close the stream.
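If an error occurs while reading, the close statement is never reached. One possible variation, not used in the lesson's code, is to wrap the loop in ensure: so the stream is closed either way. The sketch below uses a stream over a hypothetical log line in place of myFile readStream so that it runs without a file.

```smalltalk
| myStream mySet |
mySet := Set new.
"A stream on a String stands in for myFile readStream."
myStream := ReadStream on: '10.0.0.1,GET,/index.html'.
[[myStream atEnd] whileFalse: [
    mySet add: ((myStream upTo: Character cr) copyUpTo: $,)]]
        ensure: [myStream close].   "runs whether or not the loop raised an error"
mySet size
```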

Transcript show: each, '...', mySet size printString; cr.

Wow. That looks like a lot of code for one line but it really isn't. By the time we get to this line, we know how many hits we had for a given file. We simply want to display that information in the Transcript. That explains the Transcript show:. The each will contain the file name. We append the 3 dots to the file name as a separator string using the comma, then append the number of hits to this using mySet size printString. Remember, the show: message doesn't like numbers, only strings. The expression mySet size will return a number, so we convert it to a string using the printString message. Finally, we cascade a cr message to the Transcript object (the semi-colon) so that the next time we have to display web hits for a file, it will be on a new line.
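The string-building part of this statement can be evaluated on its own. In this sketch the file name and the hit count are hypothetical values standing in for each and mySet size. Unary messages such as printString are evaluated before binary messages such as the comma, so no parentheses are needed.

```smalltalk
| each hits line |
each := 'ws000101.log'.   "hypothetical file name"
hits := 42.               "hypothetical count standing in for mySet size"
"The comma joins Strings, so the number is converted with printString first."
line := each, '...', hits printString.
Transcript show: line; cr.
line
```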

].

The end to our second block of code (the ifTrue: condition of whether or not we found a log file).

].

The end to our first block of code (the do: iteration of the contents array).

Summary

In the next lesson, we will divide up this code into smaller chunks.

You now should know how to:

Cascade Smalltalk statements

Find a string within another string

Combine boolean expressions using &

Concatenate strings together

Collect the contents of a directory

Iterate through a Collection of items using a do: block

