Section 2.1. Proprietary Versus Open Source? | Open Sources 2.0: The Continuing Evolution

2.1. Proprietary Versus Open Source?

Before you go any further, throw off any notion that the proprietary developer is somehow a different person from the open source developer. It is uncommon for a member of the open source developer community to do only open source for a living. Only the most prominent, or loaded, members of the open source community come close to having this kind of freedom. It is just as rare to find a developer who develops only with proprietary tools and libraries. Even Visual C++ and C# developers benefit from a great variety of code and libraries that are free for use in their programs.^[1]

^[1] Traditionally, one difference between open source and proprietary development teams has been that open source teams are, in general, geographically quite dispersed. However, in this age of outsourced, offshored, and distributed development, even proprietary development has become highly dispersed geographically.

My career has focused on open source development for the last 10 years, and I'm constantly and pleasantly surprised by how closely open source and proprietary development resemble each other. I believe this is because proprietary developers are educated by the adventures of their slightly crazy open source cousins, but I also know that open source developers have learned just as much from proprietary developers.

Don't read this as an attempt to muddy the difference between proprietary and open source programs. They are different, sometimes very much so. However, they come from the same people, and they're using a lot of the same methods and tools. It is the licenses and the ideals behind open source programs that make them remarkable, different, and revolutionary.

2.1.1. The Example Culture

A lot of people, when talking about open source software development, say that open source developers enjoy a great productivity gain from code reuse. This is true, but in my experience all developers, not just open source developers, benefit from the existence of free-of-charge standard libraries and code snippets. For decades, proprietary developers have had a great variety of prepackaged libraries to choose from, but these proprietary libraries haven't taken root in the same way that freely usable, open libraries have.^[2]

^[2] This will likely inspire many to cite their favorite commercial library. A full survey of libraries, both commercial and open source, would be required to validate this statement properly. This is an educated assumption on my part, as when commercial libraries manage to gain any sort of prominence, open source developers tend to fill the gap, thus overshadowing the commercial project.

2.1.1.1. Code reuse? Knowledge reuse!

In Linus Torvalds' essay from the first Open Sources, he talked about how the rise of open code was delivering on the promise of reuse touted by proponents of the Java™ programming language specifically and object-oriented programming in general.

That said, it has been my experience that there is a point at which software developers will go out of their way to avoid reusing code from other projects. In some shops, they call it "not invented here" (NIH) syndrome, and some companies are famous for it. But even those shops use standard kernels, libraries, and compilers. The real difficulty here is in figuring out where the NIH line lies. Although the answer is different for every single programmer and team, all still can (and still do) learn from the open code out there, which is a unique advantage of open code. While both open and proprietary code can be reused in a wide range of circumstances, open code enables something further: knowledge reuse. By examining the code itself, the developer can learn how a particular problem is solved, and often how that solution is an instance of a general solution type. It is this kind of reuse that Linus applauded and that the NIH developer misses.

Then why not simply use other people's code? There are a number of factors to consider before code is incorporated, and these must be understood before one can understand the role that Free Software has had in development.

2.1.1.2. Speed of development

There are very real barriers to using other people's code. You have to examine how to interface with said code, and you need to review the code to make sure it meets your standards for security, license, style, and correctness. You also need to integrate it into your version control and build system.

None of these problems is insurmountable, but they have to be worth surmounting. To wit: if all I need is a routine to do something simple, such as iterate through an array of numbers and perform some simple operation on them, using someone else's software would be a waste of time.
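To make the scale concrete, here is a minimal sketch of the kind of routine meant here. The function name and the choice of operation (doubling each element) are hypothetical, picked only for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: doubles every element of an int array in place.
 * A routine this small is quicker to write than to import and review. */
void double_all(int *values, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        values[i] *= 2;
    }
}
```

Writing and testing something like this takes minutes, which is far less than the interface, license, and build-system work that comes with adopting an outside library.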

When developing, I like to use large libraries only when I either don't want to deal with a technology, or I don't fully understand it and don't feel qualified to implement it. For a recent project, I was pulling newsfeeds from weblogs and performing a kind of natural-language processing on their English text. I thought that using a tool called a "stemmer" to normalize the data would make my later analysis more accurate.

Implementing the routines to download and process feeds could have taken a month or two, and this is exactly the kind of development I don't like to do. To properly implement a stemmer, I'd likely have to get my graduate degree and then write it (which would impact my deadline a bit), so I downloaded programmer-friendly libraries that did each of these tasks. The stemmer was available under the Berkeley Software License, and the feed parser was available under the Python Software License, both of which are very easy to deal with and do not require any onerous post-incorporation duties. I was thus able to save time and have better code.
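For readers who have not met a stemmer, the sketch below shows the basic idea of normalizing words by stripping suffixes. It is a deliberately naive illustration and not the library described above: the function name, the suffix list, and the minimum stem length are all assumptions made for this example, and real stemmers apply many more rules.

```c
#include <stddef.h>
#include <string.h>

/* Naive suffix stripper, for illustration only. This is NOT a real
 * stemming algorithm; the suffix list and length rule are assumptions.
 * Modifies `word` in place. */
void naive_stem(char *word)
{
    /* Hypothetical suffix list, checked longest first. */
    static const char *suffixes[] = { "ing", "ed", "s" };
    size_t len = strlen(word);

    for (size_t i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        size_t slen = strlen(suffixes[i]);
        /* Keep at least three characters of stem so short words survive. */
        if (len >= slen + 3 && strcmp(word + len - slen, suffixes[i]) == 0) {
            word[len - slen] = '\0';
            return;
        }
    }
}
```

With these rules, "parsing" and "parsed" both reduce to "pars", and "feeds" reduces to "feed", which is the kind of normalization that helps later analysis. The same rules turn "running" into "runn", which hints at why a well-maintained library is the better choice for real work.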

That said, some things I'm very interested in developing myself. Since I was doing this project as an excuse to learn a natural-language processing algorithm, which was interesting to me, I wanted to write that part of the program myself. I was (and am) also fascinated with a problem I think I'll have in storing the results such that I can quickly retrieve them from a database. I haven't solved that problem as of this writing, but I don't necessarily want to use other people's code for that. I have read some code and examples in textbooks and online that will help me with the former, but the storage problem is mine, for now.

This gives you an idea where the line was for me in this particular project, but others have the same reticence for other, subtler reasons.

2.1.1.3. A particularly difficult codebase

What makes software difficult to add to your code? Sometimes the code is simply in the wrong language. Maybe you are using Perl and want to tie some code into a C or Python module. That's not always so easy. Maybe the code was really developed on only one platform (say, an Intel machine) and you want it to work on your iBook, which runs on a PowerPC processor.

The problems with using other people's code can be legion. Maybe their routines were implemented assuming a machine with a lot of memory or processor cache, making them perform poorly or, worse, unpredictably,^[3] on your target platform. Maybe the software was developed for an earlier version of your programming language, so a lot of features you would have implemented with a standard library call are instead implemented from scratch, thus reducing future maintainability.

^[3] This might seem strange, but programmers are OK with the odd performance hit sometimes. Unpredictable results lead to crashed programs, however. This is not good, no matter what you've been told.

Problems arise with canned libraries as they get older. For instance, the aforementioned feed parser library is useful because its author, Mark Pilgrim, is very good at keeping it up to date with the 13 "standards" that lie behind that "xml" button on your favorite blog or web site. If the library were to fall into disuse, or Mark were to stop working on it and no one else picked up the work, I'd likely change to a different library or choose to maintain it myself.

There is another reason to not use someone else's code, and it will look amazingly petty to all but the programmers reading this.

Technically speaking, this:

    int myfunction(int a)
    {
        printf("My Function %d\n",a);
    }

is the same as this:

    int myfunction(int a) {
        printf("My Function %d\n",a);
    }

which is the same as this:

    int myfunction(int a){
        printf("My Function %d\n",a);}

and this:

    int myfunction(int a){
           printf("My Function %d\n",a); }

They compile to the same result on any given compiler.

I could go on, but I won't. The point is that, depending on the programmer or dictated company style, each of these is wrong, evil, bad, or awful, or perhaps one is acceptable. Not all programmers and companies care about style, but many (one might argue the smartest) do. The ones that do care actively dislike the ones that don't and do not want to use their code. Should they have to touch the offending library, they will inevitably have to make it "readable." Whether you call this refactoring or prettifying or whatever, it can drive a programmer away from a hunk of code, unless it really brings something fantastic along with it.

"My goodness," you might ask, "are programmers delicate, petty creatures?" No, there are some very good reasons to have a consistent code style. It aids in debugging. Some say it reduces bugs (I'd agree). It makes code navigation much faster and makes it easier for people to write tools that generate and manipulate code. There are other reasons too, but I don't want to get too arcane. Some languages, such as Python, have very rigid appearance rules, because indentation determines which block a statement belongs to. Style may appear to be a trivial concern, but it isn't.
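One way to see how layout can hide a bug is the sketch below. Both function names are hypothetical and exist only for this illustration. In C the compiler ignores indentation, so the first function does something different from what its layout suggests.

```c
/* Misleading layout: the indentation suggests both increments are guarded
 * by the `if`, but without braces only the first one is. */
int count_misleading(int flag)
{
    int count = 0;
    if (flag)
        count++;
        count++;   /* always runs, whatever the indentation implies */
    return count;
}

/* Braces make the layout and the behavior agree. */
int count_braced(int flag)
{
    int count = 0;
    if (flag) {
        count++;
        count++;
    }
    return count;
}
```

Called with 0, the first function returns 1 while the second returns 0. A team style rule that requires braces on every `if` body is one hedge against this class of mistake, and some compilers can also warn about it.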