Friday, April 16, 2010

playing around equals() and hashCode() methods of Object class

Why, when & how of overriding equals() and hashCode() methods of Object class

As we know the Object class has five non final methods namely equals, hashCode, toString, clone, and finalize.
I believe they were primarily designed to be overridden according to specific needs.
I am just trying to summarize my understanding around these two methods. When should we override these and how we should implement these etc. Any comments/discussion/feedback would be highly appreciated on above topics.

Overriding the equals method -
Ideally you should override the equals method when you want to specify the rules of logical equality of objects. Two objects are logically equal if they have the same values for the same uniqueness attributes.

We need to implement an equivalence relation between non null object references. The rules to override equals method can be found on sun's site. They basically talk about Symmetry , Reflectivity Consistency & Transitivity among the objects.

The rule for the null reference is that any non-null reference value x, x.equals(null) must return false.

Object class already provides an implementation of the equals method. It is
public boolean equals(Object obj) {
return (this == obj);
}

The method above simply tests for equality of object references. This is not always the desired behavior, particularly when comparing Strings. That's why the String class provides its own implementation of the equals method.

For example suppose we have a Student class which has a constructor like this.
public Student(String name, String id, int age) { …}

Now if you create two student objects with the same attributes, you'd want those two objects to be the same student. If you write a code like below then you will get false as result.

Student student1 = new Student("sajid", "456789", 28);
Student student2 = new Student("sajid", "456789", 28);
System.out.println(student1.equals(student2));

Result is false because student1 and student2 are different references and the equals method being used here (from the object class) is comparing references. So here we need to override the equals method to check for our uniqueness attributes for our comparisons to work.

Caution: we should not do a mistake of overloading the equals method instead of overriding it. So the argument to the equals method must be an Object.

See the difference between below two statements. You will understand it.
public boolean equals(Object obj)
public boolean equals(Student student)

A good implementation of equals should start with symmetry test. Means x.equals(x) should always return true. If this it self is failing then no need to proceed further.

Another important point would be to test the instance of the object passed to equals() method. People might pass a String object instead of Object in the argument. Here instanceof operator helps us to check this.

instanceof also eliminates the trivial cases when the object passed is null because it returns false if the first argument is null.

So whole implementation might look like this …

public boolean equals(Object obj) {
if (this == obj) {
return true;
}

if ( !(obj instanceof Student)) {
return false;
}

Student student = (Student)obj;
return age == student.getAge() && name.equals(student.getName())
&& id.equals(student.getId());

}

Now System.out.println(student1.equals(student2)); will print true.

Food for thought -
Note that we deliberately compared the ages (integers) first. The && operator has short circuit behavior, meaning that if the age comparison fails the rest of the comparison is abandoned and false is returned. It is therefore a performance advantage to have the cheaper (memory wise) tests first and the more memory demanding tests last.

Is there any situation when we should not override equals() ?
Yes. When the references check is sufficient. This is when each instance of the class is unique. Other situation could be when parent class already has implemented the desired behavior then we need not bother.

Now whenever we override the equals method, we must also override the hashCode method.
So let's move ahead to hashing …


Overriding the hashCode method -
Why hashcode ? Simple the hashCode method is supported for the benefit of hash based collections. Basically this hash code value is used by hash based collections such as Hashtable, HashMap, HashSet, etc. for storing, retrieving, sorting, and other data structure operations.

The contract says that If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result. It also says that the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.

It further says that It is not required that if two objects are unequal according to the equals method, then calling the hashCode method on each of the two objects must produce distinct integer results.

So equal objects must have equal hashCodes. An easy way to ensure that this condition is always satisfied is to use the same attributes used in determining equality in determining the hashCode. Now we should realize why it is important to override hashCode every time we override equals.

The story of hashtable and buckets .. (Some useful history) -
we can think a hash table as a group of buckets. When you add a key-value pair, the key's hashCode is used to determine which bucket to put the mapping.
Similarly when you call the get method with a key on the hash table, the key's hashCode is used to determine in which bucket the mapping was stored. This is the bucket that is searched (sequentially) for the mapping.
If you have two "equal" objects but with different hashCodes, then the hash table will see them as different objects and put them in different buckets. Similarly you can only retrieve an object from a hash table by passing an object with the same hashCode as the object you are trying to retrieve. If no matching hashCode is found, null is returned.
So let's say it again, "Equal objects must have equal hashCodes".

The best hashcode approach -
We should try to make all unequal objects have unequal hashCodes. This means each mapping is stored in its own bucket. This is the optimal case for the hash table and results in linear search times because only the correct bucket needs to be searched for. Once the correct bucket is found, the search is complete. That's why the API docs said

"However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables."

How do I implement hashCode ?
Well we want it to be linked to the equals method in some way and so it must use the same attributes as the equals method.

The hashCode method signature in Object class is -
public native int hashCode();

The key thing to note here is that the method returns an integer. This means that we should try to get an integer representation of all the attributes that were used to determine equality in the equals method. The trick is that we should get this integer representation in a way that ensures that we always get the same int value for the same attribute value.
Once we have the integers, it's up to us to find a way of combining them into one integer that represents the hashCode for our object.

Whatever algorithm we use, we must make sure that the result is always an integer and will be the same integer returned for equal objects.
So how do we determine the hashCodes for the attributes themselves?

For the individual attributes values, you can use the following popular approach
(Source: http://bytes.com/topic/java/insights/723476-overriding-equals-hashcode-methods )
  • For boolean variables use 0 if it's true and 1 if it's false.
  • Converting byte, char or short to int is easy. Just cast to int. The result is always the same for the same value of the attribute.
  • A long is bigger than an int. You can use (int)value^(value >>> 32) . This is the method used by the java.lang.Long class.
  • If the field is a float, use Float.floatToIntBits(value).
  • If the field is a double, use Double.doubleToLongBits(value), and then hash the resulting long using the method above for long type.
  • If the field is an object reference and this class's equals method compares the field by recursively invoking equals, then recursively invoke hashCode on the field as well.
  • If the value of the field is null, return 0 (or some other constant, 0 is more common but you might want to distinguish it from the boolean case).
  • Finally, if the field is an array, go through each element and compute each element's hashCode value. Use the sum of the hashCodes as the hashCode for the array attribute.

Let's look at some ways of doing it.
A common approach is to choose a multiplier, say p, and then compute an int value by applying the following formula
hashCode = multiplier * hashCode + attribute's hashCode for all the attributes.

For three atributes (a1, a2, a3), the hashCode would be computed in the following steps
hashCode = multiplier * hashCode + a1's hashCode //step 1
hashCode = multiplier * hashCode + a2's hashCode //step 2
hashCode = multiplier * hashCode + a3's hashCode //step 3

Now putting it all together our Student class will look like this. .

public class Student {
String id;
String name;
int age;
private volatile int hashCode = 0;

public Student(String name , String id, int age) {
this.name = name;
this.id = id;
this.age = age;
}

String getName() {
return name;
}

int getAge() {
return age;
}

String getId() {
return id;
}

public boolean equals(Object obj) {
if(this == obj) {
return true;
}
if (!(obj instanceof Student)) {
return false;
}
Student student = (Student)obj;
return age == student.getAge() && name.equals(student.getName())
&& id.equals(student.getId());

}

public int hashCode () {
final int multiplier = 23; // we should use any prime number here like 23 or 31 etc
if (hashCode == 0) {
int code = 133;
code = multiplier * code + age;
code = multiplier * code + id.hashCode();
code = multiplier * code + name.hashCode();
hashCode = code;
}
return hashCode;
}

}

To test we can use a main() method in the above class like this -

public static void main(String args[]){
Student student1 = new Student("sajid", "456789", 28);
Student student2 = new Student("sajid", "456789", 28);
System.out.println(student1.equals(student2)); // prints true
}

Another way to implement hashCode would be
Similar to how we implemented hashCode our hash code calculation must involve all attributes of the class that contribute to the equality comparison of the class. For Car class example, the following would work:
public int hashCode()
{
int hash = 7;
hash = 31 * hash +
(null == this.licensePlate ? 0 : this.licensePlate.hashCode());
hash = 31 * hash +
(null == this.vinNumber ? 0 : this.vinNumber.hashCode());

return hash;
}

Java provides an implementation of hashCode for built-in classes such as String and other wrapper classes. If you have an int data type that is part of your equals comparison you can simply add, subtract, multiply, or divide it into your hash code value.
Several APIs demand that the user must implement the hashCode() method. The reason is that these APIs are using hash based containers (like HashMap) to have a fast means of managing lots of objects (always comparing objects using equals() would need endless time).

I will try to further dig into the internals of hashing and difference between different hash based collections and the best performance strategies around it.

No comments:

Popular Posts