Monday, December 19, 2011

Git advantages in the corporate environment

I've come across a couple of blogs and message board posts lately (some of them admittedly dated) that insist git is not usable in a corporate environment because:


• git is not centralized.

Corporate environments MUST HAVE a centralized solution for backup purposes.

• git does not have canonical revision numbers.

Corporate environments must have canonical version numbers.

 

The Backup Issue


The first point is, of course, invalid. Git can be used in a centralized manner, making all the backup monkeys happy. Used in this manner, it's no different from subversion in that the repository exists on a server and it can be backed up.


However, that ignores a far bigger point. What if your IT department isn't quite as awesome as they think they are? What if you were told about the crack sysadmin team, only to learn the system admin is a low-rate overseas hourly worker? What if he didn't quite get that backup job done before the bus came? What if he hasn't checked if the script he wrote 4 months ago is still working?


One night your server stops responding, and after opening a case with IT, you learn to your horror that they're unable to restore the backup.


Not to worry, right? Every developer on the team has a copy of your subversion repository, right? Wrong. They have a single checkout of the repository. One version. Need to revert to a previous version? Can't do it. Need to look at the changes involved in a particular bug fix? Not happening. Need to review the commit messages for a particular commit? Sorry, the entire log is gone.


This differs markedly from the git world.  In the git world, should your centralized repository be lost, every single developer has a copy of the entire repository.  It is possible that if a developer hasn't updated between the time another developer pushed some commits and the central repository was lost, he might be a couple commits behind.  This is true in the subversion world too.  If developers hadn't updated before the repository was lost, only the last person who checked in will have the last few commits.


It can't be stressed enough how much better your situation is with git. Think about it. With subversion, you'd figure out who had the most up-to-date working copy. Then what? You can't find out what revision your teammate had and send him a patch to bring his copy of the project up to date, since subversion can't generate a diff between two revisions without access to the lost central repository. I suppose the most up-to-date developer could put his copy on a file share somewhere and developers could manually diff the files and apply any changes. Yeah, that's it, what fun.


The other choice of course is just to take the most up-to-date developer's project and use it to create a new subversion repository.  That will work and the team will be able to check out an up-to-date copy of the project.  An up-to-date copy with no history, no previous versions, nothing. Certainly not what one would consider "enterprise-level source code management."

Oh, but it also gets better.  What if your team's discipline is really lacking, and nobody has a version checked out that actually works?  You're hosed - you can't revert to a known working version - you have no central repository anymore.


Now let's look at the git situation. As with subversion, the team is likely to figure out who has the most recent copy of the repository. With one command (git format-patch) he can generate patch files that his colleagues can apply (with git am) to bring their repositories up-to-date before any kind of centralized server is restored. Depending on the size of the team, team members could be completely up-to-date, sharing changes again and productive again literally within a few minutes.


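Oh and those patch files? Not only do they update the code to the correct state, but they also carry the author, date and log message of each commit, so the history your colleagues end up with reads just like the history it came from. One caveat: git am records whoever applies a patch as the committer, so the new commit hashes can differ from the originals. If you need the hashes to match EXACTLY, have your colleagues fetch directly from the up-to-date developer's repository, or pass around a git bundle instead; both transfer the original commit objects untouched.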
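To make the recovery concrete, here is a minimal sketch of the patch hand-off using two throwaway repositories. The directory names (ahead, behind, patches) and the Dev identity are made up for the demo; only git format-patch and git am are the real moving parts.

```shell
# Throwaway sandbox; the identity below is a placeholder for the demo.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com

# "ahead" plays the developer who has the newest commits.
git init -q "$work/ahead"
echo one > "$work/ahead/file.txt"
git -C "$work/ahead" add file.txt
git -C "$work/ahead" commit -q -m "first commit"

# "behind" is a colleague who cloned before the last two commits existed.
git clone -q "$work/ahead" "$work/behind"
echo two >> "$work/ahead/file.txt"
git -C "$work/ahead" commit -q -am "second commit"
echo three >> "$work/ahead/file.txt"
git -C "$work/ahead" commit -q -am "third commit"

# Write one patch file per commit the colleague is missing.
base=$(git -C "$work/behind" rev-parse HEAD)
git -C "$work/ahead" format-patch -q -o "$work/patches" "$base"..HEAD

# The colleague applies them in order; messages and authorship come along.
git -C "$work/behind" am -q "$work/patches"/*.patch
git -C "$work/behind" log --oneline
```

In real life the patch files would travel by email or a file share; nothing here needs a server.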


The Revision Number Issue


I don't know who came up with this gem, but talk about a ridiculous argument. Someone tell me what a canonical version number is?  Is it sequential?  Is it unique? Is it alphanumeric?  Is it octal?


Although subversion has simple version numbers, they're not particularly useful. Sure, I can tell my team member, "Hey, create that branch off revision 2142 of trunk," but that doesn't do him any good other than pointing to a version.


And while you would think you could figure out relationships between revisions using the revision number, it's actually difficult to do so, since revision numbers are global and increment when other developers check in, or even when someone checks in on another branch. So, think that revision 10 on trunk is the result of 5 commits on trunk since you checked in revision 5? Could be, but it could also be that someone checked in revisions 6 through 9 on a branch, so that revision 10 directly follows revision 5 on trunk. Not much value there.

Now, git has those super-complicated hex-looking numbers. I mean, really. We as developers are supposed to be able to deal with these hex strings? How ridiculous is that?

First of all, let's get the easy part out of the way. Those big hex numbers are not difficult to deal with because we can shorten them dramatically.

In small repositories, four characters (the minimum git accepts) is often sufficient. In larger repositories you may need a few more, and git will tell you if the prefix you typed is ambiguous. Still no more difficult than typing version "21042." And of course we can use all sorts of symbolic names and expressions such as HEAD, HEAD^ (the parent of HEAD), time-based specifiers and others.
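A quick sketch of what that looks like in practice, run against a throwaway repository (the repo path, file name and Dev identity are placeholders for the demo). git rev-parse --short=4 asks for the shortest unambiguous prefix of at least four characters, and that prefix resolves straight back to the full hash.

```shell
# Throwaway repository with two commits; the identity is a placeholder.
repo=$(mktemp -d)
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com
git init -q "$repo"
echo one > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" commit -q -m "first commit"
echo two >> "$repo/file.txt"
git -C "$repo" commit -q -am "second commit"

# Full hash, and the shortest unambiguous prefix of at least 4 characters.
full=$(git -C "$repo" rev-parse HEAD)
short=$(git -C "$repo" rev-parse --short=4 HEAD)

# The short form names the same commit as the full hash.
resolved=$(git -C "$repo" rev-parse "$short")

# Symbolic names work too: HEAD^ is the parent of HEAD.
parent_subject=$(git -C "$repo" log -1 --format=%s HEAD^)

echo "full:   $full"
echo "short:  $short"
echo "parent: $parent_subject"
```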

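More importantly, however, the commit hash is computed from the full contents of the project at that commit, plus the commit's metadata and the hashes of its parent commits, so it covers the entire history leading up to that point. What this means is that you can GUARANTEE that code with the same hash is exactly the same.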

Here's how cool this is:

Let's say for a minute that the repository that went down was a public project that my company exposes to the outside world and allows others to clone. And now let's say that somehow, some way, I lost the last 5 commits out of that repository. I have a copy of the log, so I know what the commits were, but I already checked with everyone in my company and nobody has a copy of those commits.

Oh man, I'm in big trouble. I don't have a trustworthy source.

Can I really just reach out to some random 12-year-old on the Internet who has a copy of the repository and ask for a patch containing those commits? Maybe I should find someone who works for another large company. Surely that's a good way to make sure they're trustworthy. Or maybe I can get my company to pay for a background check on someone.

The fact of the matter is, if you have the hash, you need not worry.  The code could come from the most notorious hacker group on the planet and as long as the hashes match, I know they did not change one single bit in my code.
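Here is a rough sketch of that check, again with throwaway repositories standing in for the real ones. The names origin, mine and stranger are invented for the demo; the only assumption about the real scenario is that the hash of the newest lost commit is still on record from the log.

```shell
# Throwaway sandbox; the identity is a placeholder for the demo.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=Dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=Dev GIT_COMMITTER_EMAIL=dev@example.com

# "origin" stands in for the public repository before it was lost.
git init -q "$work/origin"
echo one > "$work/origin/file.txt"
git -C "$work/origin" add file.txt
git -C "$work/origin" commit -q -m "first commit"

# My surviving copy stops at the first commit.
git clone -q "$work/origin" "$work/mine"

# A stranger cloned later, after one more commit had been published.
echo two >> "$work/origin/file.txt"
git -C "$work/origin" commit -q -am "second commit"
git clone -q "$work/origin" "$work/stranger"

# The hash I still have on record from my copy of the log.
expected=$(git -C "$work/origin" rev-parse HEAD)

# Fetch from the stranger, then compare what arrived against the record.
git -C "$work/mine" fetch -q "$work/stranger" HEAD
actual=$(git -C "$work/mine" rev-parse FETCH_HEAD)
if [ "$actual" = "$expected" ]; then
  echo "hashes match: safe to use these commits"
else
  echo "hash mismatch: do not trust this copy"
fi
```

Only after the comparison succeeds would I merge FETCH_HEAD into my own branch.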

Conclusion

Wow.  That's a lot more than I intended to write.  But hopefully you can see now why such arguments are not only ridiculous, but just plain wrong.

For the Mercurial fans out there: I'm willing to bet Mercurial has the same advantages as git in these areas. Both are excellent DVCSs and I wouldn't hesitate to use either.

Finally, I do recognize that there are situations for which a DVCS may not be appropriate. If you have terabytes of data in your repository, git or hg is probably not for you without a lot of repository re-organization. In those cases, by all means stick with your centralized VCS. Just realize that having something that prevents you from using a DVCS doesn't mean the DVCS advantages don't exist.

Monday, October 24, 2011

SQLite2 source and errors

Every once in a while, I come across a SQLITE2 database that needs to be converted to SQLITE3 format.

Doing so is easy. You just dump out of SQLITE2, and import into SQLITE3, such as:

sqlite2 mysqlite2db.db .dump > sqlite2_mysqlite2db.dump

sqlite3 mysqlite3db.db < sqlite2_mysqlite2db.dump



One problem, though, is that SQLITE2 is often no longer available through any of the standard Unix package management systems. So if you have a newer machine with an app that's using a SQLITE2 database and you want to upgrade that app to a newer version that uses SQLITE3, it's tough to get SQLITE2 installed so you can do the conversion.

What I did is download the SQLITE2 source code from:

http://www.sqlite.org/sqlite-source-2_8_17.zip


Then I compiled it. (Remove the tclsqlite.c file first; it's not needed.)

This is on Debian Linux, BTW.

gcc -o sqlite2 *.c


That worked and got me a sqlite2 executable. However, when I tried to use it on my SQLITE2 database, I got:

sqlite2: btree.c:702: sqliteBtreeOpen: Assertion `sizeof(ptr)==sizeof(char*)' failed.

Hmm...what the heck?

I didn't read the error carefully and instead went straight to googling, which didn't turn up much in the way of help. Then it struck me that this was a 64-bit problem. Aha! I had forgotten to compile the program with the 32-bit flag. No problem.

(On OSX the flag would be -arch i386)

gcc -m32 -o sqlite2 *.c


But that gave me:

/usr/include/gnu/stubs.h:7:27: error: gnu/stubs-32.h: No such file or directory


After a little googling on that, I discovered I needed to install the libc6-dev-i386 package, since I was running on an AMD 64-bit system. I installed the package, recompiled and everything is working fine.
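Since the 32-bit flag differs between Linux and OS X, a small helper can pick it for you. This is only a sketch: sqlite2_32bit_flag is a made-up name, and it assumes gcc on Linux and the -arch flag on OS X, as described above. It prints the compile command instead of running it.

```shell
# Return the 32-bit compiler flag for an OS name as printed by `uname -s`.
# (sqlite2_32bit_flag is a hypothetical helper, not part of SQLite.)
sqlite2_32bit_flag() {
  case "$1" in
    Darwin) echo "-arch i386" ;;  # OS X
    *)      echo "-m32" ;;        # Linux with gcc
  esac
}

# On 64-bit Debian, install libc6-dev-i386 before compiling with -m32.
# Run this from the unpacked source directory, with tclsqlite.c removed.
flag=$(sqlite2_32bit_flag "$(uname -s)")
echo "gcc $flag -o sqlite2 *.c"
```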

Wednesday, September 7, 2011

Installing Ruby 1.8.7p352 under rvm

I was recently trying to install Ruby under RVM on OS X 10.6.8, running in 32-bit mode.

For Ruby 1.9.2 p290, it was no problem. rvm installed it without issue (using x86_64).

However, Ruby 1.8.7 p352 would fail constantly, with:

making ruby
/usr/bin/gcc-4.2 -arch i386 -arch x86_64 -g -Os -pipe -no-cpp-precomp -fno-common -pipe -fno-common -DRUBY_EXPORT -L. -arch i386 -arch x86_64 -bind_at_load main.o -lruby -ldl -lobjc -o ruby
ld: warning: ignoring file ./libruby.dylib, file was built for unsupported file format which is not the architecture being linked (i386)
Undefined symbols for architecture i386:
"_ruby_init_stack", referenced from:
_main in main.o
"_ruby_init", referenced from:
_main in main.o
"_ruby_options", referenced from:
_main in main.o
"_ruby_run", referenced from:
_main in main.o
ld: symbol(s) not found for architecture i386
collect2: ld returned 1 exit status
lipo: can't open input file: /var/folders/rp/rprVpkCiGbKvS3sirOXX9k+++TI/-Tmp-//ccoGzQBh.out (No such file or directory)
make[1]: *** [ruby] Error 1
make: *** [all] Error 2


You can see above that files are getting correctly compiled with both the -arch i386 and -arch x86_64 flags. That was confusing. I verified that the Makefile included the arch flags in both the CFLAGS and LDFLAGS variables.

I traced the problem to the following line of output:

cc -dynamiclib -undefined suppress -flat_namespace -install_name
/Users/wwilliam/.rvm/rubies/ruby-1.8.7-p352/lib/libruby.dylib
-current_version 1.8.7 -compatibility_version 1.8 array.o bignum.o
class.o compar.o dir.o dln.o enum.o enumerator.o error.o eval.o file.o
gc.o hash.o inits.o io.o marshal.o math.o numeric.o object.o pack.o
parse.o process.o prec.o random.o range.o re.o regex.o ruby.o signal.o
sprintf.o st.o string.o struct.o time.o util.o variable.o version.o
dmyext.o -o libruby.1.8.7.dylib


As you can see, the dynamic library is being linked here without any architecture flags. What needed to be set is the LDSHARED variable.

My first attempt was to modify the Makefile by hand, adding the -arch flags to LDSHARED. That worked - somewhat. I was able to run make and get everything compiled. However, that didn't solve the problem completely, because whenever I tried to install via rvm again, my changes were overwritten since rvm ran configure again.

Naturally, for my next attempt, I ran export LDSHARED="-arch i386 -arch x86_64" in the terminal in which I was running rvm install. That also did not work. I checked, and config.log DID show my additions to LDSHARED. Hmmm...WTH?

It turns out that the configure.in defines LDSHARED explicitly for each machine type and does not use the value that configure picks up.

For darwin, it was:

        darwin*)        : ${LDSHARED='cc -dynamic -bundle -undefined suppress -flat_namespace'}


So the fix, in this case, was to change it to:
        darwin*)        : ${LDSHARED='cc -arch i386 -arch x86_64 -dynamic -bundle -undefined suppress -flat_namespace'}
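For anyone unfamiliar with the idiom on that line, here is a minimal sketch of how : ${VAR='default'} behaves in plain sh, outside of configure. The variable names default_result and override_result are just for the demo. The default is assigned only when the variable is unset; a value that is already set replaces the default wholesale. So, as far as the shell idiom itself goes, an override would have to contain the whole command (cc and all), not only the extra -arch flags. Whether that explains the failed export attempt is a guess; I haven't verified what rvm actually passes through to configure.

```shell
# Case 1: LDSHARED is unset, so the default is assigned.
unset LDSHARED
: ${LDSHARED='cc -dynamic -bundle -undefined suppress -flat_namespace'}
default_result=$LDSHARED

# Case 2: LDSHARED is already set, so the default is skipped entirely.
LDSHARED='-arch i386 -arch x86_64'
: ${LDSHARED='cc -dynamic -bundle -undefined suppress -flat_namespace'}
override_result=$LDSHARED

echo "unset:  $default_result"
echo "preset: $override_result"
```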


I did submit bug #5295 against Ruby 1.8 for this issue. Whether it will get fixed or not, I don't know.

Saturday, June 4, 2011

Update on VATSIM work

I realize I haven't posted on this blog in many months, but I thought I'd do a short update just to talk about the things I've been working on.

Most of my work recently has gone into the VATSIM FSD. None of the changes have been earth-shattering, but they have involved things like updating the FSD to work with sqlite3, changing its build system to CMake and improving its method of locating its configuration files. None of those changes will be noticed by VATSIM users, but they make life easier for those of us who work on the code and support the servers.

For those of you waiting with bated breath on the upper-layer wind improvements I've been working on, the bad news is it's still not finished. The good news is I've started working on it again. I'm currently in the process of changing the backend database from MySQL to Postgres. Once that's complete, I'll be able to go back to finishing up the wind management code.