The purpose of version control is twofold. The first is to avoid the version conflicts in development that can arise when multiple programmers work on the same set of files on a project. The second is to automatically journal changes to key files on a project.
By requiring programmers to check out a file from a central repository when they wish to work on it, a version control platform can keep a record of who is working on which files and at what times. Depending on the approach taken, this method can either wholly exclude other developers from working on that file at the same time, or allow other developers to also work on the file and then merge their changes together when they both check the file back in again.
Journaling of files in a project involves the retention of previous versions of each file. Whenever a file is checked in to a repository, the version control platform marks it as current so that anybody retrieving that file from the repository gets that latest version by default. However, a copy of the file in its previous state is also kept. Not only does this allow the project administrator or lead architect to roll back a file should a new version prove problematic, it also allows the changes between versions to be easily listed, which, from a project management perspective, is a major boon.
All version control systems use some variety of repository for storing a copy (usually verbatim on disk, but sometimes as part of a more complex and often proprietary database format) of the file and directory structure of the project. How this repository is accessed varies from system to system. We examine a typical topology later in this appendix.
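The journaling behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Journal` class and its method names are hypothetical, history is held in memory rather than in a real repository, and a rollback is modeled as re-submitting the previous version so that the problematic one is still retained.

```python
import difflib


class Journal:
    """Keeps every checked-in version of one file, oldest first (hypothetical)."""

    def __init__(self, initial_text):
        self.versions = [initial_text]

    def check_in(self, new_text):
        # The previous state stays in the list; the new text becomes current.
        self.versions.append(new_text)

    def latest(self):
        return self.versions[-1]

    def roll_back(self):
        # Re-submit the previous version as a new check-in, so the
        # problematic version is still retained in the history.
        if len(self.versions) < 2:
            raise ValueError("no earlier version to roll back to")
        self.versions.append(self.versions[-2])
        return self.latest()

    def changes(self, old_index, new_index):
        # List the line-level differences between two retained versions.
        old = self.versions[old_index].splitlines()
        new = self.versions[new_index].splitlines()
        return list(difflib.unified_diff(old, new, "old", "new", lineterm=""))


journal = Journal('$strToPrint = "Hello World";')
journal.check_in('$strToPrint = "Goodbye World";')
diff = journal.changes(0, 1)
```

Here `diff` holds a unified diff between the two versions, which is the kind of change listing a project manager might want to review.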
Version control platforms differ immensely in their implementation of the principles discussed earlier, not least in their means of checking out a file.
In any version control platform, when a file is being checked out by a programmer the latest version is always retrieved from the repository, and the user's local version is replaced with that latest version.
However, in a platform employing Exclusive Versioning, a lock is then placed on that file with immediate effect. While the file is checked out, other developers may still retrieve the latest version of that file, but they themselves will not be able to check it out to work on it. This Exclusive Versioning is usually enforced by marking each developer's local copy of the file as read-only unless that developer has it checked out. Of course, this is only notional enforcement and does require the developer's cooperation to work well in practice. The lock is removed after the developer working on it checks in the file in question. The repository is then updated to reflect the latest version.
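As a rough illustration of the locking rule just described, the following Python sketch models a repository that allows anyone to retrieve the latest version but lets only one developer at a time check a file out. The class, its method names, and the in-memory storage are all assumptions made for the example; no real product is being modeled.

```python
class LockedError(Exception):
    """Raised when a file is already checked out by someone else."""


class ExclusiveRepository:
    def __init__(self, files):
        self.files = dict(files)   # filename -> latest contents
        self.locks = {}            # filename -> developer holding the lock

    def get_latest(self, filename):
        # Anyone may read the latest version, locked or not.
        return self.files[filename]

    def check_out(self, filename, developer):
        holder = self.locks.get(filename)
        if holder is not None and holder != developer:
            raise LockedError(f"{filename} is checked out by {holder}")
        self.locks[filename] = developer
        return self.files[filename]

    def check_in(self, filename, developer, contents):
        if self.locks.get(filename) != developer:
            raise LockedError(f"{developer} has not checked out {filename}")
        self.files[filename] = contents
        del self.locks[filename]   # release the lock


repo = ExclusiveRepository({"helloworld.php": "Hello World"})
repo.check_out("helloworld.php", "john")
try:
    repo.check_out("helloworld.php", "jane")
    jane_blocked = False
except LockedError:
    jane_blocked = True
repo.check_in("helloworld.php", "john", "Goodbye World")
jane_copy = repo.check_out("helloworld.php", "jane")
```

Jane's first attempt is refused while John holds the lock; once he checks in, she can check out the file and receives his updated version.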
Concurrent Versioning adopts quite a different approach. The acts of retrieving the latest version of a file and checking it out to work on are combined so that they are essentially one and the same. In other words, to work on a file, all a developer must do is ensure that he or she has the latest version of that file and then start working on it. When each developer has finished making changes to the file, he or she will check it in. This is where the magic happens. If the developer in question checked out the file after another developer had checked it out, and is attempting to check it in after that other developer has checked in a changed version, the two newly submitted versions will be merged.
To try to make things a little clearer, consider the following example. This is an imaginary file called helloworld.php that prints "hello world" in the Web browser. You will see that line numbers have been included. Of course, these would not be included in the code as it is saved.
1: <?php
2: $strToPrint = "Hello World";
3: ?>
4: <html>
5: <body>
6: <?=$strToPrint?>
7: <br /><br />
8: </body>
9: </html>
You can call this snippet of code version 1.0 of the file.
Imagine that Jane Doe and John Doe are both working on the project. Assume that John works out of New York and Jane out of Los Angeles.
A meeting has been called with the client. They wish to change the code of this particular file so that instead of printing "Hello World," it would print "Goodbye World." John Doe's manager has given John the task of modifying the code to reflect this requirement.
In that meeting, the client also requested that a horizontal rule be drawn underneath the text where printed. Jane Doe's manager has given her the task of modifying the code to include that extra line.
In an Exclusive Versioning setup, it would be impossible for John and Jane to make their changes at the same time. John would have to check out the file, make the change, and check it in again. Jane could then check out the file, make her change, and check it in again.
In a Concurrent Versioning setup, no such requirement exists. Say that at 12:00 p.m. (Eastern Time) John does a check out to get the latest version of the file, currently version 1.0. He then starts work on making his change. At 12:01 p.m. (Eastern Time), Jane does a check out as well, also to get the latest version of the file. This is still version 1.0; John hasn't checked anything in yet. Jane starts work on her change.
At 12:05, John is done. The code works fine, so he decides to check in his work. He does so, and the repository saves his newly submitted version as the latest version, version 1.1. Version 1.1 now looks like this:
1: <?php
2: $strToPrint = "Goodbye World";
3: ?>
4: <html>
5: <body>
6: <?=$strToPrint?>
7: <br /><br />
8: </body>
9: </html>
At 12:09, Jane is done, too. The code works fine for her, so she now wants to check in her work. Her code now looks like this:
1: <?php
2: $strToPrint = "Hello World";
3: ?>
4: <html>
5: <body>
6: <?=$strToPrint?>
7: <br /><br />
8: <hr />
9: </body>
10: </html>
When she checks in, the repository notices that she is checking in a changed edition of version 1.0, not 1.1, which is now the latest. This recognition of which working version has been modified by the developer is usually facilitated by a tag line (a line automatically included in all source code files, usually held in comment tags to avoid upsetting compilers and interpreters). This line is created and updated by the repository and normally is not touched by the developer.
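To make the idea of a tag line concrete, here is a minimal Python sketch. It assumes an RCS-style `$Id$` keyword line as the tag format; the text does not name a format, so that choice, the sample date and author in the line, and the function names are all assumptions. Real systems may well track the base version elsewhere.

```python
import re

# Assumed tag format: "$Id: <file>,v <revision> <date> <time> <author> <state> $"
ID_KEYWORD = re.compile(r"\$Id: (?P<file>\S+),v (?P<revision>[\d.]+) ")


def base_revision(source_text):
    """Return the revision recorded in the file's tag line, or None."""
    match = ID_KEYWORD.search(source_text)
    return match.group("revision") if match else None


def needs_merge(source_text, latest_revision):
    # A merge is needed when the working copy was derived from an
    # older revision than the one currently held in the repository.
    base = base_revision(source_text)
    return base is not None and base != latest_revision


# Jane's working copy; the tag line contents are made up for illustration.
jane_copy = """<?php
// $Id: helloworld.php,v 1.0 2004/01/01 12:00:00 admin Exp $
$strToPrint = "Hello World";
?>"""

merge_required = needs_merge(jane_copy, "1.1")
```

Because Jane's copy is tagged 1.0 while the repository now holds 1.1, `merge_required` is true and the repository knows it cannot simply overwrite the latest version.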
The repository now must merge the changes made by John between version 1.0 and 1.1 with the changes made by Jane between 1.0 and her new proposed version.
The repository determines that the change John made was as follows:
Change line 2 from:
2: $strToPrint = "Hello World";
to:
2: $strToPrint = "Goodbye World";
The repository determines that the change Jane made was as follows:
Insert after line 7 a new line reading:
8: <hr />
The repository then simply takes the last common version (1.0) and systematically applies both John's and Jane's changes. The merged code now looks like this:
1: <?php
2: $strToPrint = "Goodbye World";
3: ?>
4: <html>
5: <body>
6: <?=$strToPrint?>
7: <br /><br />
8: <hr />
9: </body>
10: </html>
As you can see, both developers' changes have been successfully included. The repository now labels this version 1.2, and any subsequent requests for the latest version will yield this version. Neither developer is likely to be any the wiser of the merge that just took place.
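The merge just walked through can be approximated with Python's standard `difflib` module. This is a sketch of the general idea only, not how any particular version control product is implemented: the `changes` and `merge_clean` functions are hypothetical, positions are zero-based (so the text's line 2 is index 1), and any overlapping changes are simply refused rather than resolved.

```python
import difflib


def changes(base, edited):
    """Return (start, end, new_lines) edits that turn base into edited."""
    matcher = difflib.SequenceMatcher(a=base, b=edited, autojunk=False)
    return [(i1, i2, edited[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def merge_clean(base, first, second):
    """Apply both developers' edits to base when they do not overlap."""
    edits = sorted(changes(base, first) + changes(base, second),
                   key=lambda e: (e[0], e[1]))
    # Refuse to merge if two edits touch the same region of the base.
    for (s1, e1, _), (s2, e2, _) in zip(edits, edits[1:]):
        if s2 < e1 or s1 == s2:
            raise ValueError("overlapping changes: manual resolution needed")
    merged = list(base)
    # Work from the bottom up so earlier positions stay valid.
    for start, end, new_lines in reversed(edits):
        merged[start:end] = new_lines
    return merged


v1_0 = ["<?php", '$strToPrint = "Hello World";', "?>", "<html>", "<body>",
        "<?=$strToPrint?>", "<br /><br />", "</body>", "</html>"]

john = list(v1_0)
john[1] = '$strToPrint = "Goodbye World";'   # John edits line 2

jane = list(v1_0)
jane.insert(7, "<hr />")                     # Jane inserts after line 7

v1_2 = merge_clean(v1_0, john, jane)
```

The resulting `v1_2` contains both John's modified line and Jane's new line, matching the merged listing above.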
There will be scenarios when two or more developers working on the same version of a file make changes that are not compatible with each other; in other words, a conflict has occurred.
Suppose that a further client meeting takes place and two further requests are made: to include the time of day when saying goodbye to the world, and to include the date. The project manager assigns John the task of implementing the time-of-day requirement and Jane the task of implementing the date requirement.
John checks out version 1.2 and amends it to read as follows:
1: <?php
2: $strTime = date("H:i:s");
3: $strToPrint = "Goodbye World, it's $strTime";
4: ?>
5: <html>
6: <body>
7: <?=$strToPrint?>
8: <br /><br />
9: <hr />
10: </body>
11: </html>
Jane also checks out version 1.2 and amends it to read as follows:
1: <?php
2: $strDate = date("Y-m-d");
3: $strToPrint = "Goodbye World, it's $strDate";
4: ?>
5: <html>
6: <body>
7: <?=$strToPrint?>
8: <br /><br />
9: <hr />
10: </body>
11: </html>
The first check-in (whoever gets there first) will cause a version 1.3 to be created. Can you picture what will happen when the slower of the two developers checks his or her code in? A new line has been inserted in each case and an existing line modified. Although the version control platform may well be able to combine the two new lines (by simply incorporating both), it will not know how to combine the changes made to the single line (line 3 in both cases). As PHP developers, you can see the resolution that is required; it's common sense:
3: $strToPrint = "Goodbye World, it's $strTime on $strDate";
The version control platform, however, isn't that smart and so will simply throw a conflict. It is then up to the developer whose recent check-in has thrown the conflict to resolve it. In practice, the version control system will create a temporary new version, 1.4, which will actually contain details of the conflict. This temporary version will not be issued until the conflict is resolved and that version is made live. The last developer to check the file in will be notified of the conflict by the version control platform and invited to edit the temporary version to resolve the conflict him- or herself. It is the responsibility of the latter of the two developers checking in to resolve the conflict, because the version control platform will view that developer to have caused the conflict.
The temporary version 1.4 created by the repository might look something like this:
1: <?php
2: $strTime = date("H:i:s");
3: $strDate = date("Y-m-d");
<<<<<<< helloworld.php
4: $strToPrint = "Goodbye World, it's $strTime";
=======
4: $strToPrint = "Goodbye World, it's $strDate";
>>>>>>> 1.3
5: ?>
6: <html>
7: <body>
8: <?=$strToPrint?>
9: <br /><br />
10: <hr />
11: </body>
12: </html>
You can see the mark-up the version control platform has introduced to show the two different alternatives for the conflicting line (line 3 in each developer's copy, line 4 here because both new lines have been incorporated above it).
The previous example is a very simple conflict to resolve. The conflict mark-up is removed and the line in question is modified to incorporate both developers' changes; the finished version is then activated in the repository.
Obviously, in more realistic examples, the resolution of conflicts can be tedious, time consuming, and costly. This is the double-edged sword of Concurrent Versioning. It is useful that two developers can concurrently work on the same file, but they must be prepared to take the responsibility to resolve conflicts when they do arise.
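For readers curious how a conflict of this kind might be detected, the following Python sketch performs a simplified three-way merge and emits conflict mark-up when both developers changed the same region. It is an approximation under stated assumptions: the function names and labels are hypothetical, only overlapping (not adjacent) changes count as conflicts, and, unlike the hand-drawn example above, the whole changed region appears between the markers, including each developer's newly inserted line.

```python
import difflib


def changes(base, edited):
    """Return (start, end, new_lines) edits that turn base into edited."""
    matcher = difflib.SequenceMatcher(a=base, b=edited, autojunk=False)
    return [(i1, i2, edited[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


def three_way_merge(base, mine, theirs, mine_label, theirs_label):
    """Merge two edited copies of base; return (lines, conflict_count)."""
    tagged = [(s, e, new, "mine") for s, e, new in changes(base, mine)]
    tagged += [(s, e, new, "theirs") for s, e, new in changes(base, theirs)]
    tagged.sort(key=lambda t: (t[0], t[1]))

    merged, conflicts, pos, i = [], 0, 0, 0
    while i < len(tagged):
        # Gather every edit that overlaps the current one into a cluster.
        start, end = tagged[i][0], tagged[i][1]
        cluster = [tagged[i]]
        i += 1
        while i < len(tagged) and (tagged[i][0] < end or tagged[i][0] == start):
            end = max(end, tagged[i][1])
            cluster.append(tagged[i])
            i += 1
        merged += base[pos:start]          # unchanged lines before the cluster
        versions = {}
        for side in ("mine", "theirs"):    # each side's view of base[start:end]
            out, p = [], start
            for s, e, new, who in cluster:
                if who == side:
                    out += base[p:s] + new
                    p = e
            versions[side] = out + base[p:end]
        sides = {who for _, _, _, who in cluster}
        if len(sides) == 1:
            merged += versions[sides.pop()]
        elif versions["mine"] == versions["theirs"]:
            merged += versions["mine"]     # both made the identical change
        else:
            conflicts += 1
            merged += (["<<<<<<< " + mine_label] + versions["mine"]
                       + ["======="] + versions["theirs"]
                       + [">>>>>>> " + theirs_label])
        pos = end
    return merged + base[pos:], conflicts


def php_file(*code_lines):
    """Build the sample file with the given lines inside the PHP block."""
    return ["<?php", *code_lines, "?>", "<html>", "<body>", "<?=$strToPrint?>",
            "<br /><br />", "<hr />", "</body>", "</html>"]


v1_2 = php_file('$strToPrint = "Goodbye World";')
john = php_file('$strTime = date("H:i:s");',
                '''$strToPrint = "Goodbye World, it's $strTime";''')
jane = php_file('$strDate = date("Y-m-d");',
                '''$strToPrint = "Goodbye World, it's $strDate";''')

# Assume Jane checked in first (version 1.3) and John is now checking in.
merged, conflicts = three_way_merge(v1_2, john, jane, "helloworld.php", "1.3")
```

In this run a single conflict is reported, and John, as the later of the two to check in, would be the one asked to edit the marked-up region by hand.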
In practice, Concurrent Versioning may be of limited use in a well-designed PHP project. As strongly encouraged throughout this book, dividing your project into multiple components and expressing each of those components as a single file is considered best practice.
Accordingly, the requirement for two developers to work on a single file rarely crops up. If it does, it may be wise to consider whether that file is not best broken down into two smaller files, or even two smaller components.
Furthermore, consider the previous example. How likely is it that Jane and John would both be given the task of implementing these two incredibly similar requirements? One programmer could handle both requirements in this very simple example, and the same is almost always true in more complex examples. Often a single developer will be assigned ownership of a single component and will be responsible for all changes that might be required in that component.
Finally, the principles of Concurrent Versioning seem to suggest that developers can co-participate in a development project without ever communicating. A short dialogue between two developers when a file is checked out and locked in an Exclusive Versioning environment ("Hey, Jane, are you working on helloworld.php right now? I need to make a small change") actually aids communication. Concurrent Versioning negates that requirement to communicate, and hence developers will actually talk to each other less. This is not a good thing. It is no surprise that Concurrent Versioning is used so frequently in open source development, which involves thousands of developers, many of whom have never even met but all of whom are working on the same project. Although open source development is viewed by and large to be an excellent philosophy, some of the side effects of thousands of disconnected contributors working on a single project can be seen in the incredible complexity of configuring PHP on UNIX platforms.
The alternative is Exclusive Versioning, which prohibits two developers from working on the same file at the same time. After a file is checked out, it is physically locked from other developers until such a time as the first developer checks it back in. Accordingly, it is not possible for check-in conflicts to arise as described previously.
Of course, Exclusive Versioning has its downsides, too, even if your project is designed so that there will never be any chance of two people ever needing to work on a file at the same time. One particular bugbear that crops up time and time again is the on-vacation syndrome. That is, a developer has gone on vacation and accidentally left a file checked out, and another developer now needs to work on it. Sure, you could go into that developer's workstation and check it in, but what if it's a work in progress that isn't ready? If you undo check-out, you risk losing changes that, for all you know, may be 90 percent complete! This is, naturally, a procedural issue as much as anything else, and suitable policies for your development team can avoid this scenario.
The choice is very much up to you. A well-designed PHP5 project with maybe four or five developers all working in the same office is an obvious candidate for Exclusive Versioning of some ilk. A legacy PHP3 project with thousands of developers from all over the world is an obvious candidate for Concurrent Versioning.
Very shortly you'll encounter a few software packages that provide a version control platform. Before that, it's worth understanding a bit better how a version control topology works in a real-world setup.
A topology like this will normally apply regardless of the software you decide to use, and irrespective of whether you opt for Concurrent or Exclusive Versioning for your project. Consider Figure A-1, in which John, Jane, and David are all developers working on the same project, project foo. They all have access to a workstation of their own, which is for their use only.
Obviously, they do not run PHP on each of their workstations, so they all share a powerful central development server. Each developer has his or her own instance of a virtual server representing that developer's own copy of project foo, based on source code stored in his or her own home directory on the development server.
For example, John uses http://john.projectfoo.example.com, which points to source held in his own home directory on the development server, in /home/john/public_html/projectfoo. Similar setups exist for each developer working on the project.
There is also a staging server that is used by the lead architect as a base for testing and examining the latest version of the project, and possibly for internal demonstrations. An external staging server is likely maintained off-site for external client demonstrations.
When John, Jane, or David wants to work on the code for the project, they work on the copies of code in their own home directories, to which nobody else has access. These home directories reside on the server, so they use Samba (see http://www.samba.org/ for more information) to map a network drive on their workstation to the home directory on the server. They then edit files in their own copy of the project directly. This is their working area. No project files are ever stored on their own workstation, but because the files are exposed through the network drive, developers edit them on the workstation as though they were local.
However, John, Jane, and David must use version control software on this project. This means that whenever they wish to work on a file, they must perform a check-out action using their version control software running on their workstation (for the sake of argument, say it's Microsoft Visual SourceSafe, which we describe shortly). This downloads the latest version of the file in question from the repository server to the home directory on the development server. This copy of the file is set to writable so that they may freely work on it. If John, Jane, or David forgets to check out a file before opening it in their IDE of choice, they will find that it is set to read-only and they are unable to save their changes. While the file is checked out, nobody else may check it out to work on it.
When they have finished working on the file, they simply check in the file, which records the latest version in the repository. Others who check out the file in the future will then be presented with the latest version incorporating the changes just made. The process is similar should one of them wish to add a new file to the repository: a quick push of the Add File button and the file is incorporated permanently.
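The read-only and writable states described in this workflow can be illustrated with a short Python sketch. It is not how Visual SourceSafe or any other product works internally: the function names are hypothetical, the repository is modeled as a plain directory, locking is left out, and the read-only state is expressed with POSIX-style permission bits.

```python
import os
import shutil
import stat
import tempfile

READ_ONLY = stat.S_IRUSR
WRITABLE = stat.S_IRUSR | stat.S_IWUSR


def _fetch(repo_dir, work_dir, name, mode):
    """Copy the repository version into the working area with the given mode."""
    target = os.path.join(work_dir, name)
    if os.path.exists(target):
        os.chmod(target, WRITABLE)       # allow the old copy to be replaced
    shutil.copyfile(os.path.join(repo_dir, name), target)
    os.chmod(target, mode)
    return target


def get_latest(repo_dir, work_dir, name):
    # Retrieve the latest version without checking it out: read-only copy.
    return _fetch(repo_dir, work_dir, name, READ_ONLY)


def check_out(repo_dir, work_dir, name):
    # Retrieve the latest version and make it writable for editing.
    return _fetch(repo_dir, work_dir, name, WRITABLE)


def check_in(repo_dir, work_dir, name):
    # Record the working copy in the repository, then make it read-only again.
    source = os.path.join(work_dir, name)
    shutil.copyfile(source, os.path.join(repo_dir, name))
    os.chmod(source, READ_ONLY)


with tempfile.TemporaryDirectory() as root:
    repo_dir = os.path.join(root, "repository")
    work_dir = os.path.join(root, "working_area")
    os.makedirs(repo_dir)
    os.makedirs(work_dir)
    with open(os.path.join(repo_dir, "helloworld.php"), "w") as fh:
        fh.write("Hello World")

    path = get_latest(repo_dir, work_dir, "helloworld.php")
    read_only_before = not (os.stat(path).st_mode & stat.S_IWUSR)

    path = check_out(repo_dir, work_dir, "helloworld.php")
    writable_during = bool(os.stat(path).st_mode & stat.S_IWUSR)
    with open(path, "w") as fh:
        fh.write("Goodbye World")

    check_in(repo_dir, work_dir, "helloworld.php")
    read_only_after = not (os.stat(path).st_mode & stat.S_IWUSR)
    with open(os.path.join(repo_dir, "helloworld.php")) as fh:
        stored = fh.read()
```

The working copy is writable only between check-out and check-in, which is why a developer who forgets to check out a file finds that the editor cannot save it.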
Because John, Jane, and David are working on the project simultaneously, they need to periodically perform a "get latest versions" operation. This means that the latest versions of all files from the repository will be copied to their local copy in their home directory on the development server, even if they have no intention of ever checking those files out. This is an important practice for two reasons. It may be necessary to have some extra functionality that somebody has recently added to the project to make some other component you wish to work on function correctly. Also, this practice allows developers to quickly see what their colleagues have been working on and, if necessary, point out any errors or provide constructive criticism.
Now and again, Paul, lead architect, may choose to perform a "get latest versions" into a directory (not his own) on the server marked as staging. Doing so allows him to see a snapshot of the project as it exists in the repository so that he, too, may provide constructive criticism of his team's work.
If one is using another version control package, minor variations in this topology and process may exist. For example, with CVS, the developers would be unlikely to use any client software on their workstation. Rather, they would simply create some kind of terminal connection (such as an SSH connection) to the development server and run the CVS client directly on the server.
It is worth pointing out that the exact role of the repository server varies, too. CVS supports a genuine client-server protocol for the exchange of data, called pserver. Visual SourceSafe, however, simply uses a shared data volume on a network drive. As a result, a separate physical repository server may not be necessary, and the development server could quite easily double up in a repository role.