Cincom

Web Log Stats Lesson 11
Refactoring




Refactoring is a term you will hear over and over again within the Smalltalk community. It basically means re-writing your code in order to make it more re-usable and efficient. In a broader sense, it can be said that it is simply a process where you "clean up" your code, making improvements wherever possible.

In this, the final lesson of the tutorial, we will refactor our code and tidy up some loose ends. Unlike the preceding lessons, where each line of code was explained, this lesson identifies which improvements have been made to existing methods and why it was necessary to create some new ones. The code will not be explained line by line, but an overall explanation will be given.

Our refactoring will include the following enhancements and modifications:

·  The method getLogFiles should simply return a list of log files.

·  The display of web hits is in the Transcript while the display of page counts is written out to multiple files. The web hits should also be written out to a file.

·  The limitation that all "stats" files have to be deleted before you can re-run the application should be removed. Change the name of the output files so that the filter of ".log" does not pick them up.

·  There should be just one "start" method that generates both types of statistics.

1. From the main VisualWorks Launcher window, click the fourth button on the Toolbar or select the menu option Browse >> System.

2. In the package pane (furthest left) scroll down and highlight (select) the WebLogStats Package.

If you do not see the WebLogStats package, then you will need to file it in. This lesson requires that the work you did in the previous lessons is loaded into the VisualWorks development environment.

3. Make the following changes to the getLogFiles method.

getLogFiles

    | workingDir contents xFound logFiles |
    logFiles := Set new.

    workingDir := logDirectory asFilename.
    contents := workingDir directoryContents.
    contents do: [ :each |
        xFound := each findString: filter startingAt: 1.
        xFound = 1
            ifTrue: [ logFiles add: each. ] ].
    ^logFiles.

4. <Operate-Click> and select Accept.

The reason for the change is quite simple. If you compare the original getLogFiles method with getLogFilesForPageCounts, they do basically the same thing. The only difference is that one is used for counting hits and the other for counting pages. This is redundant code, and that's not good. Why not have just one routine that does exactly what its name says: get the log files? That's what the above code does. It gathers all files in a directory that match the filter and returns them in a Set. This makes the method re-usable, because it no longer needs to know what its callers do with the files. No method should know too much about other methods.

The concept of re-usability cannot be overstated. Here's why. Suppose that we wanted to write another routine that searched the log files for how long people stay at the site. If we kept things as they were, we would have cloned either the getLogFiles method or the getLogFilesForPageCounts method and changed the line that referenced either showHits or showPageCounts and replaced it with something like getDurationTimes. Now comes the rub. Suppose (for sake of argument) that the webmaster decided to keep the log files not in just one directory, but in a multitude of directories. This would force you to change 3 methods. However, coding the method the way we did above, we would only have to change one method.
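As a rough illustration of that point, a third statistics method might look like the sketch below. Both startDurations and getDurationTimes: are hypothetical names that do not exist in the tutorial code; the only real method used is getLogFiles.

```smalltalk
startDurations
    "Hypothetical third statistics method. getDurationTimes: is an assumed
     helper that would analyze one log file; it is not part of the tutorial."

    | logFiles |
    logFiles := self getLogFiles.
    logFiles do: [ :each | self getDurationTimes: each ].
```

If the log files later moved to several directories, only getLogFiles would need to change; a method written like this one would stay as it is.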

One final note: we are changing the xFound test from "greater than zero" to "equal to one" to avoid analyzing a file that matches our filter but is not a log file. The test now states that we must find the characters ws00 at position 1. Later on, in another method, we will use the log file name itself (which contains the characters ws00) as part of the name of the file that holds the page counts. For such an output file the "greater than zero" test would return true, which we don't want.
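The effect of testing for position 1 can be checked in a workspace. The file names below are examples only; evaluate each line with Print it. The results in the comments assume the filter ws00.

```smalltalk
'ws000101.log' findString: 'ws00' startingAt: 1.
    "1: the name starts with the filter, so it is treated as a log file"

'pagecounts_ws000101.txt' findString: 'ws00' startingAt: 1.
    "12: the filter is found, but not at position 1, so the file is skipped"

'readme.txt' findString: 'ws00' startingAt: 1.
    "0: the filter is not found at all"
```

A "greater than zero" test would accept the second name as well as the first, which is exactly the case the test for position 1 is meant to exclude.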


5. Time to create a new method. Replace the text in the method code area with the following:

getRootName: aFile

    | line |
    line := aFile copyUpTo: $..
    ^line.

6. <Operate-Click> and select Accept.

The name of a log file (say ws000101.log) is passed to this method as a String. We keep the base part of the file name, drop the ".log" extension, and return a new string containing the characters up to the dot. This lets us easily append text to create meaningful output file names that don't have ".log" in them.
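A quick workspace check shows what copyUpTo: does with the example name and how the result can be combined into an output file name. Select the whole snippet and choose Print it; the temporary variable root is used only for this illustration.

```smalltalk
| root |
root := 'ws000101.log' copyUpTo: $..    "root is now 'ws000101'"
'pagecounts_', root, '.txt'.            "'pagecounts_ws000101.txt'"
```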

7. Select the showPageCounts: method and make the following changes:

showPageCounts: aFile

    | stream line bag xFound sort |
    bag := Bag new.
    stream := (logDirectory asFilename construct: aFile) readStream.
    [ stream atEnd ] whileFalse: [
        line := stream upTo: Character cr.
        line := line copyFrom: 50 to: line size.
        line := line copyFrom: (line indexOf: $/) to: line size.
        line := line copyUpTo: $,.
        xFound := (line findString: '.asp' startingAt: 1).
        xFound > 0
            ifTrue: [ bag add: line. ]. ].
    stream close.
    sort := SortedCollection sortBlock: [ :a :b | a >= b ].
    bag valuesAndCountsDo: [ :each :count |
        sort add: (count -> each) ].
    ^sort

8. <Operate-Click> and select Accept.

We are removing the code that creates the output file and replacing it with the line that simply returns the sorted collection. Another method will output the collection to an external file.
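To see what kind of collection the method now returns, the counting and sorting can be tried on its own in a workspace. This is a small sketch with made-up page names. It also compares the keys of the Associations explicitly, which is a choice made for this illustration rather than the sort block used in the method.

```smalltalk
| bag sort |
bag := Bag new.
bag add: '/index.asp'; add: '/index.asp'; add: '/index.asp'.
bag add: '/news.asp'; add: '/news.asp'.
bag add: '/contact.asp'.

"Each element is an Association of count -> page, sorted largest count first."
sort := SortedCollection sortBlock: [ :a :b | a key >= b key ].
bag valuesAndCountsDo: [ :page :count | sort add: count -> page ].

sort first key.      "3"
sort first value.    "'/index.asp'"
sort last value.     "'/contact.asp'"
```

Because the most requested page comes first, a caller can write out the leading elements of the collection without sorting again.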

9. Time for another new method. Enter the following text:

startHits

    | logFiles myHits count out outFile |
    myHits := SortedCollection new.
    "Set the output name once, outside the loop, so it is defined even when no log files match."
    outFile := 'websitehits.txt'.
    logFiles := self getLogFiles.
    logFiles do: [ :each |
        count := self showHits: each.
        myHits add: (count -> each). ].
    out := (logDirectory asFilename construct: outFile) writeStream.
    myHits do: [ :each | out cr; nextPutAll: (each printString). ].
    out close.
    Dialog warn: 'Site hits complete'.

10. <Operate-Click> and select Accept.

We create a new collection called myHits, which will hold one Association per log file. Each Association links the number of hits (count) to the log file that received them. Once all the log files have been processed, the collection is ready to be written to an external file.

11. Select the startPageCount method. Since the change to this method is so extensive, it's best that you make the method look like the one below (copy and paste).

startPageCount

    | logFiles count rootName out |
    logFiles := self getLogFiles.
    logFiles do:
        [ :each |
        rootName := 'pagecounts_', (self getRootName: each) , '.txt'.
        count := self showPageCounts: each.
        out := (logDirectory asFilename construct: rootName) writeStream.
        "Write at most 10 entries; a log file may contain fewer than 10 distinct pages."
        1 to: (10 min: count size) do: [ :xx | out cr; nextPutAll: (count at: xx) printString. ].
        out close.
        ].
    Dialog warn: 'Page Counts Complete'.

12. <Operate-Click> and select Accept.

For every log file, we strip off the ".log", replace it with ".txt" and prefix the name with "pagecounts_". This creates unique file names that do not begin with our "filter" characters, so getLogFiles will not pick them up. We then call the method that gets our page counts, which returns a sorted collection, and store the result in count. Finally, we output the first 10 elements of the collection to an external file.

13. Modify the start method as follows:

start

    filter := (Dialog request: 'Please enter a filter '
        initialAnswer: 'ws00').
    (filter size) > 0
        ifTrue: [ self startHits.
            self startPageCount.
            Dialog warn: 'All Statistics Done'. ]
        ifFalse: [ Dialog warn: 'No Problem. Have a nice day' ].

14. <Operate-Click> and select Accept.

This method asks us for our filter. If we enter one, it invokes the two methods that gather our statistics. Note that the code that prompts for the filter used to be in two methods; now it is in one. This is a good thing. For example, suppose our illustrious webmaster decides to change the file extension on log files from ".log" to ".wlg". We would have had to change the prompt in two methods instead of one.

Provided we entered a filter and did not hit Cancel, both of our statistics methods will be invoked. Otherwise, we simply say goodbye. This is the only new construct in our code: a classic "If-Then-Else" statement.
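The same "If-Then-Else" shape can be tried in a workspace without opening any dialogs. In this sketch the filter is set to an empty string by hand, on the assumption that this is what the start method sees when nothing is entered; the message variable exists only for this illustration.

```smalltalk
| filter message |
filter := ''.    "assumed result of an empty or cancelled prompt"
message := filter size > 0
    ifTrue: [ 'Generate statistics' ]
    ifFalse: [ 'No Problem. Have a nice day' ].
message.    "'No Problem. Have a nice day'"
```

Changing the first assignment to filter := 'ws00' makes the ifTrue: branch run instead.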


15. Make sure to delete all ".stats" files in the VisualWorks default directory. If any ".stats" file remains, an error will occur. This is the last time you will have to remember this step, since we fixed the problem in the startPageCount method.

16. Enter the following into a workspace:

WebLog new start

Highlight all of this text, <Operate-Click> and select Do it.

The code will now generate statistics files for both web hits and page counts. A dialog box informs you of its progress along the way.

17. Open the File Browser dialog box and navigate to the directory where the log files are. You can view the generated statistics files from there.

18. Now would be a good time to re-save your work. Refer to Lesson 8 if you forgot how.

Summary

Refactoring has improved your code by making it more reusable. One of the major accomplishments was the modification to the getLogFiles method. By making some minor changes to a very specific task, we were able to re-use it in two other methods. As stated before, should we write the code to determine how long people stay on the site, we would need to pull in all log files, and thanks to our refactoring work, we would be able to use (re-use) our getLogFiles method for that.

By the way, refactoring is really a never-ending job. Someone else might look at your code and realize that some of your methods could be written more efficiently. For example, take the getRootName: method below.

getRootName: aFile

    | line |
    line := aFile copyUpTo: $..
    ^line.

This code could be re-written as follows:

getRootName: aFile

    ^aFile copyUpTo: $..

This is not a major change (the temporary variable was not necessary, so the rewritten method uses slightly less memory than its predecessor), but if you have enough small improvements like this, they do add up. Then again, this is the nature or "culture" of Smalltalk: writing very short methods.

This concludes the web server log analysis tutorial.

You now should know how to:

·  Determine ways to make your code more efficient and re-usable.

·  Code a classic "If-Then-Else" statement.

·  Identify how to pass data from one method to another.

