Update:
As it turns out, I was (mostly) being a horse’s ass. It was a bit difficult to find, but Java does provide a mechanism for getting the equivalent of Object.hashCode() on any given object (not in a general way, i.e. get the un-overridden behavior, but for this case specifically). That method is System.identityHashCode(). Java also provides a hash data structure that uses == instead of .equals(), IdentityHashMap. I’ll leave the rest of the article around, though, just so I can use it as a reminder of how silly I look with my foot in my mouth.
One of the side-effects of the way Java is designed is that it is hard to identify an object. By this, I don’t mean identifying it by its apparent value, as in a List is the same as another list with the same stuff in the same order. What I mean is identifying an object as this object, not by its value, but by its physical existence in the whole program world thingamabobber. This usually isn’t too big of a problem, until you really start working with the collection framework’s HashMap.
First, here is a description of what was happening, as it pertains to my independent study:
Within the observable collection framework, listeners that want to receive events on a given object call a static method on a centralized “server”. This server essentially matches up observable objects with listeners (from the previous post in this series, this is an attempt to get rid of the copy/paste coding that Java forces on the programmer). Internally, the relationships are mapped out in a Map that takes observable objects and links them to another
Map, which links event types to a list of listeners (this allows me to easily get a List of listeners when an event happens of a given type on a given observable object). In an example application (my honors project), I could see classes registering for events on a given ObservableList. As each event called, I could see the list of listeners grow. Then, when I tested the application, I attempted to add something to the list, and all of the listeners vanished! Also, the Map’s size was still 1, and the element itself had actually been changed. However, no events were delivered, since the key’s value was now null! What the hell did I do wrong?
And now, the (long-winded) explanation:
The way HashMap works is to take an array of objects (keys), apply some algorithm to them (hashing), and access into another array of buckets (values). This is all simple enough, I suppose. Java’s HashMap also does “the obvious”, which is that it uses the hashCode() function (original implemented in Object) to do this hashing algorithm, plus another layer of hashing (protecting against poorly designed hash functions, as the file’s comment says). In most cases, this is no real issue, because the only forseen problem is collisions, which are handled with both obj1 == obj2 and, if the objects are not null, obj1.equals(obj2).
A quick side-note here: the authors of Java seem to send some mixed messages here. You can compare two objects by their locations by using obj1 == obj2. This can be handy when you want to determine if two objects are really the same, but the fact that they don’t give you any other way of identifying the objects (besides as comparison), it’s a mixed signal. Either they should give you a method on Object, say, getAbsoluteIdentifier() (that returns an int/long), or they should just make == call Object.equals(Object other). I would be happier, in this case, with the former, but the latter would be a much better choice. Anywhoo, back to the story.
So the problem comes when an object’s hash changes during the execution of the program. Originally, I assumed this wasn’t the problem, since I had assumed that this would make the map contain two values (as the hash and equals methods would now give different results when compared from the original object and the changed object). The problem is that HashMap uses == to determine identity in certain places. In essence, this makes things faster, since it tests for real equality (or whatever you call it), but it put the map into a very strange state. The keys were not duplicated, but the hash code of the key now was different, so it pointed to a different bucket, and thus I had lost the list of event listeners.
The reason it gets so complicated is that the identity of the key is not determined by the hash when comparing keys, but it is determined by the hash when relating it to the values. This is due to the fact that two unequal objects may return the same hash code, and we must deal with collisions. But this leaves us with the seemingly unsolvable problem of objects that change hash during execution.
I figured that maybe it was just a fluke that the ObservableList‘s hash code was changing during execution. Object‘s hashCode() method uses the internal address, thus giving you the real identity of the object. Problem is that AbstractList, way down the inheritance tree, was overriding hashCode() to something that changed during execution.
Now, to be fair, it appears that the Java “way to do things” actually encourages this. Here’s the Javadoc text for hashCode():
int java.lang.Object.hashCode()
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by
java.util.Hashtable.The general contract of
hashCodeis:
- Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
- If two objects are equal according to the equals(Object) method, then calling the
hashCodemethod on each of the two objects must produce the same integer result.- It is not required that if two objects are unequal according to the java.lang.Object.equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)
Note the part around “hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.” So, we are left with yet another inconsistency. Object’s hashCode relates only to its == method (in that it uses internal representation), but the authors feel it should be related to the equals() method. In a sense, they are saying that an object’s identity is really how it relates to other objects by equals() equality, not == equality.
This is absolutely fine in a world where everything is immutable. Unfortunately, we have no such luxury with things like, well, Collection classes in Java. The short story is this: you cannot use a Collection as a key in a HashTable.
Noah never takes his own advice
You see, the problem is that I really want a quick way of associating a collection with some other data. Also, I really, I mean really disagree with the whole “equals defines you” idea in a world where the majority of used objects are mutable. The problem is that Java has left me no outs. There, honestly, is nowhere I can go from here.
Fix idea #1: call Object‘s hashCode directly
This seems like the first smart thing to do. Since Object‘s hashCode will treat this situation the way I want it to be (“It’s the same ObservableList! I added something to it, but it is still the same collection!”), I want to just revert the behavior back from the override of hashCode() in AbstractList. So what’s the problem?
Java doesn’t let you explicitly call methods on parent classes.
“But no!” you say, “Java let’s you use super.methodname()!” Why yes, yes it does. And that is great if you just want to refer to the next guy up the line. The problem is that I think that guy is a bit retarded. See, I think his daddy knows what’s going on. I just want to talk to the big guy at the top of the rope. Seriously, please?
Java says no. All you get is the ability to call the next guy up the chain, and nothing else. I suppose you could argue this as being a good thing, as it follows the general design idea that, “once it is written, you can’t muck it.” This isn’t really too much of a problem in most cases, since you can just override the method again and give it the correct (or better) behavior. So that’s what I decided to do:
Fix idea #2: write hashCode to get the object’s internal address
Hahahahahaha. Hahahahaha. See that? That’s me laughing at myself. You can’t get an object’s internal address, dummy. Even the original hashCode method on Object has to use native code to do this (see the signature for yourself – it is listed as native). Again, this goes back to the whole idea that Java should give me some way of identifying an object as being this object. Too bad it doesn’t.
Fix idea #3: give up, write a stupid incremental hash code for the thing, write a blog article about it, and hope that somebody has a better idea
Um, that means you guys. That’s right. I need a better idea.
Immutable classes?
Some people have argued that you really should have all objects be immutable, that every method should return a new copy of the thing with the change you wanted. You’ll notice that a good deal of Java objects act like this, such as all of the Integer/Float/etc. classes, and String, too. These are not modifiable, and all of the “modifiers” return a new object with the different value. This seems to be a decent idea, but it runs into a few problems.
First, we have problems making these things efficient. There are whole areas of study devoted to making the art of copy-on-write (and its sisters) into more easily digestible topics. The idea is that every class is shallowly copied by default, but if something ever changes, then the real deep-copy occurs. This grows into things like collections that are just trees of copy-on-write branches, so when you make changes to the collection, only the branch that changes needs to actually be copied. The rest all just point into the same place.
Unfortunately, even if these solutions became as efficient as just regular mutability, they are, by nature, more complex. Just picture what it would take to efficiently implement what I just described for, say, a hash map. Every time you mutate a hash map, you make a copy and change as little as possible. Compare that to, say, just changing the thing in place. Much easier, I’d say. Much dirtier, in a sense, but still much easier.
Secondly, we have the ideas of using things as keys. In Java, the hash map has a set of references to real things (for the most part) to use as keys. For the simply immutable types, such as Strings, there is really no issue, as a String’s identity seems to be agreeable to be the thing it represents (in other words, String really seems like a value type, in that its definition is its value). However, say we want to use a collection, something that changes, and generally changes often (no, a String is not a collection of characters in this world). Since every change gets us a new collection, now our problem isn’t a strange implementation, it is a category mistake. You cannot have collections as keys in maps because it just doesn’t make sense. Now, maybe it doesn’t make sense in a sane world, but in Java it should. In this new, immutable world, you really can’t use anything that will “change” (give you a new one) as a key to a hash map.
Now, there are ways around this two. You can create some identifier that “contains” the collection, much as the collection contains a tree of immutable stuff, and then the identifier would never change (just its internal, copy-on-write representation would change). You could then use this identifier as the key, and you would be off to the races. The problem is just that, one, it is yet another layer of abstraction, and, two, you end up awfully close to a mutable world.
That all being said, the immutable world is the perfect world. I mean that. Until we have mainstream programming languages without side-effects, you will never be able to (EASILY) make your PS3 take advantage of those gazillion processors. The problem is, and will continue to be, that side-effects kill any simple attempt at multiprocessing. Whoops! That’s right, all you PS3 suckers…*COUGH*…buyers out there. If the gaming companies wrote in Haskell, your games would take advantage of all 8 cores with ease.
This seems like it would make a good topic for another post. Maybe I’ll call it, “Why functional languages make programming easier.” Here’s the skinny, for all the Lisp haters out there: typing parentheses isn’t hard. Figuring out a simple, effective, efficient, scalable, configurable, and cross-platform(able?) multiprocessing model is hard. Maybe a few other names could be:
“(> Lisp you)”
“Java sucks even more (and so does every other language)”
“The Bible should have been written in a functional language”
“The PS3 is about as awesome as a woman with 8 vaginas – who cares if you only have 1 dick (C++)?”