Why duplication is bad
On the final exam of a software engineering course, I asked the question "Why is duplication (also known as copy-and-paste programming) bad?"
Some people said that it was bad because it tended to introduce errors. I suppose it can if you copy code you don't understand. However, I usually copy working code from a program, often code I wrote myself. This code tends to have fewer errors than code I write from scratch.
Someone said that it tends to violate local coding standards. This is true if you copy it from a book, but not if you copy it from another part of the program.
Copy-and-paste is clearly bad if you don't know what you are doing. But ALL programming is bad if you don't know what you are doing.
Copy-and-paste is bad even if you know what you are doing because a program with a lot of duplication is hard to change. You might need to change every copy of a code fragment, and you won't even know where they are. Copy-and-paste is not bad for a quick hack, and in fact XP programmers often use it to get a test working. If your new test is similar to your old test, perhaps the new code will be similar to the old. Copy the old code and hack on it until it works. The result will be a lot of duplicate code, but the XP programmer will then refactor it to remove duplication.
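The copy-then-refactor cycle described above can be sketched in Python. This is a minimal illustration, not code from the course or from any XP text: `invoice_total`, `check_total`, and the sample prices are hypothetical, and the tests are plain functions using `assert` (a runner such as pytest would also pick them up by their `test_` names). The "before" pair shows a new test made by copying an old one; the "after" pair shows the same tests once the duplication has been refactored away.

```python
# Hypothetical code under test: an invoice total in integer cents,
# with an optional whole-number percentage discount.
def invoice_total(prices_cents, discount_percent=0):
    """Sum the prices, then apply the discount (rounding down to a cent)."""
    return sum(prices_cents) * (100 - discount_percent) // 100


# --- Before: the second test was made by copying the first and hacking on it.
# Both pass, but the sample data now lives in two places; changing it means
# finding and editing every copy.
def test_no_discount_copied():
    prices = [1000, 2000, 550]
    assert invoice_total(prices) == 3550


def test_ten_percent_copied():
    prices = [1000, 2000, 550]  # duplicated setup
    assert invoice_total(prices, 10) == 3195


# --- After: the duplication is refactored away. The sample data and the
# checking logic each appear exactly once.
SAMPLE_PRICES = [1000, 2000, 550]


def check_total(discount_percent, expected_cents):
    """Shared helper extracted from the two copied tests."""
    assert invoice_total(SAMPLE_PRICES, discount_percent) == expected_cents


def test_no_discount():
    check_total(0, 3550)


def test_ten_percent():
    check_total(10, 3195)


if __name__ == "__main__":
    for test in (test_no_discount_copied, test_ten_percent_copied,
                 test_no_discount, test_ten_percent):
        test()
    print("all tests pass")
```

Both pairs check the same behavior. The difference shows up when something changes: in the refactored version, new sample data means editing one line instead of hunting for every copy.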
So, copy-and-paste is not necessarily bad in the short run, if you are copying good code. But it is always bad in the long run.
Comments
Re: Why duplication is bad
[Socinian] August 27, 2004 12:05:12.654
Three more points.
Hammering on the basics
[David Anderson] August 30, 2004 11:51:08.730
Isn't it a sad reflection on the state of our profession that it is necessary to hammer on these basic points again and again? I can accept that students might not know better, but the point you make here is one that even seasoned developers often forget.
It's amazing how doing just a few little things right makes software engineering so much better - and the agile community think they invented this ;-)
Re: [Ralph Johnson - Blog] Why duplication is bad
[Travis Griggs] August 30, 2004 12:19:57.075
Is there a difference between duplication and duplicating? Though they share the same root, I have seen these two words describe quite different scenarios. I have seen cases where two different implementations started out as different things but, over time, drifted until they were basically duplicates of one another. They had no common parent at all, but the end effect is still duplication.
On the flip side, though, I have seen cases where people "duplicated" a class/method/subsystem and then evolved it until it was quite different, or maybe only subtly different, but different nonetheless.
If we consider the "biological" metaphor of Smalltalk programming, I would think that duplicating is something we as programmers should be very comfortable with.
Workplace pitfalls
[Jeff Clausius] August 30, 2004 14:35:29.488
Prof Johnson:
A piece of advice from someone in the trenches. If you end up working in a place where the phrase "Oops, copy and paste error. My bad... Hang on, I'll fix it" is commonplace, you have two options: 1) start educating people about the dangers of the practice and reduce the risk, OR 2) start looking for a new job if things do not change.
In my earlier days, just out of UIUC, I worked with two developers who would occasionally introduce these kinds of errors. In a field where time is of the essence, these errors tended to slow down the entire process, introduce bugs where none had existed, and generally frustrate everyone involved.
Jeff Clausius
Re: Why duplication is bad
[Rich Demers] August 30, 2004 17:30:13.986
I can't disagree with the views that have been expressed, but maybe "copy and paste" coding has a (limited) place.
Let's say you are designing a domain model for a complex domain. If you are a reasonably proficient Smalltalker, you have a whole bag of tricks and "best practices" you can use to code something "once and only once": inheritance, polymorphism, and patterns like Visitor and Policy. But you still have to make choices. You have to consider a variety of objectives beyond just "once and only once," and you have to make these choices in the face of incomplete knowledge of the properties and behaviors of the entities in the domain.
Designs evolve over time, but often your initial choices become difficult to change. So maybe you initially set up an inheritance hierarchy based on common properties or common behaviors or common patterns of communications. By emphasizing one aspect, you automatically set yourself up for cases where the other aspects don't factor neatly into your chosen hierarchy. It's tempting to say "if only we had multiple inheritance," but two decades of experience says it isn't worth the confusion it causes. So what are your choices?
This is where the Visitor and Policy patterns can be applied, but only if the size of the problem justifies them, where a lot of behavior or a large number of properties must be applied to a significant distribution of entities in various limbs of the hierarchy. To use them for small problems just adds conceptual complexity to the code; complexity that hinders its readability and maintainability.
And this is where "copy and paste" can be justified, at least until the size of the problem justifies something else. Of course, this assumes a style and culture of programming that encourages continual refactoring and redesign as new knowledge and requirements become known.
Re: Why duplication is bad (copy & paste)
[Terry] August 30, 2004 20:11:54.203
There is one other place where copy & paste is the right way to go.
If you are creating another feature/app that you think is similar to an existing one, I have found that it is sometimes better to copy the existing feature/app and then change the copy to produce the new one. After I have programmed most of the requirements and gotten it working, I go back, compare the two designs, and recombine them. I find that when things are complex, it is easier to sort out the issues this way than to try to refactor and reuse the existing design while building the new app.