11 July 2023

(Hopefully) the final article about equals and hashCode for JPA entities with DB-generated IDs

TL;DR

In this article, we’ll explore the proper implementation of the equals() and hashCode() methods for JPA entities. While you can find a lot of implementations on the internet, it's crucial to understand the reasoning behind the chosen implementations to avoid potential issues. By reading the entire article, you will:

  1. Gain insights about default equals() and hashCode() implementations;
  2. Discover issues you might encounter using common equals() and hashCode() implementations found on the internet;
  3. Learn a lot of interesting things about proxies in Hibernate.

However, if you're short on time, you can skip to the end, copy the proper equals() and hashCode() methods, and paste them into your code :).

Should we override equals and hashCode at all?

Let's begin with a simple example. We have an online school application. Now, we need to create a REST endpoint that retrieves all students whose names start with specific letters and have an average grade lower than a certain value. Luckily, we already have a service with methods that can help us complete this task.


@Getter 
@Setter 
@Entity 
@Table(name = "student") 
public class Student { 
  @Id 
  @GeneratedValue(strategy = GenerationType.IDENTITY) 
  @Column(name = "id", nullable = false) 
  private Long id; 
 
  @Column(name = "name") 
  private String name; 
 
  @Column(name = "average_grade") 
  private Short averageGrade; 
} 


public interface StudentRepository extends JpaRepository<Student, Long> { 
  Set<Student> findByAverageGradeLessThan(Short averageGrade); 
  Set<Student> findByNameStartsWithIgnoreCase(String name); 
} 


@Service 
public class StudentService { 
 
  private final StudentRepository studentRepository; 
 
  public StudentService(StudentRepository studentRepository) { 
    this.studentRepository = studentRepository; 
  } 
 
  @Transactional 
  public Set<Student> getStudentsWithNameStartsWith(String letter) { 
    return studentRepository.findByNameStartsWithIgnoreCase(letter); 
  } 
 
  @Transactional 
  public Set<Student> getStudentsWithAverageGradeLessThan(Short averageGrade) { 
    return studentRepository.findByAverageGradeLessThan(averageGrade); 
  } 
} 

So, the controller with the corresponding REST endpoint will look like this:


@RestController 
@RequestMapping("/students") 
public class StudentController { 
  private final StudentService studentService; 
 
  public StudentController(StudentService studentService) { 
    this.studentService = studentService; 
  } 
 
  @GetMapping(value = "/name-and-grade") 
  public Set<Student> getStudentsWithNameStartsWithAndGradeLessThan(
             @RequestParam(name = "prefix") String prefix,
             @RequestParam(name = "grade") Short grade) { 
    Set<Student> students = studentService.getStudentsWithNameStartsWith(prefix); 
    students.retainAll(studentService.getStudentsWithAverageGradeLessThan(grade)); 
    return students; 
  } 
} 

Please note that we have disabled "open session in view" (OSIV). First, long database sessions hurt throughput and, therefore, the performance of the application. Second, we lose control over additional queries, which can lead to the N+1 query problem. Finally, every additional query runs outside a transaction in auto-commit mode, so the transaction log is flushed after each statement, which again hurts database performance. With OSIV disabled, no session stays open for the whole request, so we have to open transactions ourselves where needed.

spring.jpa.open-in-view=false 

Our controller will not work correctly. Even if some students' names start with the given prefix and their average grade is below the given value, the controller returns an empty Set. In fact, it always returns an empty Set.

The reason lies in the default implementations of the equals() and hashCode() methods inherited from Object. By default, equals() checks for reference equality, that is, whether two references point to the same instance in memory, and hashCode() is consistent with that identity. In our case, the entities were fetched by different methods in separate transactions, so each call produced its own instances, and retainAll() finds no matching elements.
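The effect can be reproduced without Spring at all. The following is a minimal sketch, assuming a simplified, hypothetical Student class without JPA annotations that keeps Object's equals() and hashCode(); the two sets stand in for the results of the two repository calls made in separate transactions.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified stand-in for the Student entity: no JPA annotations and no
// equals()/hashCode() override, so Object's identity-based versions apply
class Student {
    private final Long id;
    private final String name;

    Student(Long id, String name) {
        this.id = id;
        this.name = name;
    }

    Long getId() { return id; }
    String getName() { return name; }
}

class RetainAllDemo {
    // Mimics the controller: each "transaction" produces its own instances,
    // even though both represent the same row (id = 1)
    static Set<Student> intersect() {
        Set<Student> byName = new HashSet<>();
        byName.add(new Student(1L, "Alice"));

        Set<Student> byGrade = new HashSet<>();
        byGrade.add(new Student(1L, "Alice"));

        // Identity-based equals() finds no common element
        byName.retainAll(byGrade);
        return byName;
    }
}
```

RetainAllDemo.intersect() returns an empty set, just like the endpoint.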

We can fix it by adding the @Transactional annotation to the endpoint. In this case, both queries run in a single transaction and persistence context, which returns the same instance for the same database row. However, using the @Transactional annotation in REST controllers is not recommended due to potential issues such as resource locking, inconsistent transaction boundaries, violation of separation of concerns, complex rollback scenarios, and so on. It is better to delegate transaction management to the service layer.

Another way to fix it is to create a new method in our service, annotate it with @Transactional, and move all the logic there. Yes, that would be a good solution. But what if you don’t want to open a shared transaction for the two methods? Then, you should override the equals() and hashCode() methods for the Student entity. The overridden methods must check whether the entities represent the same database record rather than the same location in memory.

In summary, when handling entities in the application's business logic, it is important to override equals() and hashCode(). If you don't do this, objects that are related to the same row in a database table might be considered unequal.

However, if you use entities only to fetch data from the database within a single transaction, there's no need to override these methods. In such cases, it is essential to adhere to the following conventions:

  1. Avoid using entities within hash-based collections throughout the entire project;
  2. If you use hash-based collections, they should exist within the transaction where you retrieved the entities;
  3. Do not mix entities from different transactions into the same hash-based collection;
  4. Do not compare entities via equals().

And if you have decided to override equals() and hashCode() in your entities, you’ll need to search for the best version on the internet.

Let me Google/ChatGPT it for you

When we search for "equals and hashCode for JPA entities", one of the top links leads to a Stack Overflow question titled "The JPA hashCode() / equals() dilemma". In addition, we come across articles by Thorben Janssen, Vlad Mihalcea, and other experts. Finally, as developers in 2023, we can seek guidance from ChatGPT. Interestingly, all these sources provide almost identical implementations.

To make it easier to understand, I have unified the implementations into a common format while preserving the underlying logic.

Stack Overflow and Vlad Mihalcea implementations

The equals() methods in these two implementations are identical. They use instanceof to test whether the object is an instance of the specified type (class, subclass, or interface). However, moving such an implementation to a base @MappedSuperclass would lead to incorrect comparisons between child entities.

For example, let's consider the Cat and Dog classes, which extend the Animal class. The Animal class contains both methods, and none of the child classes override them. The current implementation will return true when comparing objects of the Cat and Dog classes with the same ID. This defies logic and is not desirable behavior.
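Here is a minimal sketch of the problem, assuming simplified Animal, Cat, and Dog classes without JPA annotations. The equals() follows the instanceof pattern described above; the hashCode() returns a per-class value and is not exercised here.

```java
import java.util.Objects;

// Base class with an instanceof-based equals() (simplified: no JPA annotations)
abstract class Animal {
    private final Long id;

    protected Animal(Long id) { this.id = id; }

    public Long getId() { return id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        // Passes for ANY Animal subclass, not only for the same class
        if (!(o instanceof Animal)) return false;
        Animal other = (Animal) o;
        return id != null && Objects.equals(id, other.id);
    }

    @Override
    public int hashCode() {
        // Per-class hash code; this demo only calls equals() directly
        return getClass().hashCode();
    }
}

class Cat extends Animal { Cat(Long id) { super(id); } }
class Dog extends Animal { Dog(Long id) { super(id); } }
```

With this base class, new Cat(1L).equals(new Dog(1L)) returns true.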

The implementations of the hashCode() method differ slightly. Vlad Mihalcea's approach is more accurate than the Stack Overflow one, which can return the same hash code for entities of different classes that share an id. Hash-based collections compare hash codes first and call equals() only when the hash codes match. So whenever two entities of different classes have the same id, the collection has to fall back to equals() to tell them apart. It's better to give each class its own hash code value, so that objects of different classes are usually ruled out by the hash code alone.
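A small sketch makes the difference concrete. The two static methods below are hypothetical helpers that mimic an id-based and a class-based hashCode() for simplified Cat and Dog classes.

```java
import java.util.Objects;

// Simplified, hypothetical entity classes that share an id value
class Cat { final Long id; Cat(Long id) { this.id = id; } }
class Dog { final Long id; Dog(Long id) { this.id = id; } }

final class HashStrategies {
    private HashStrategies() {}

    // id-based hash: a Cat and a Dog with the same id always collide,
    // so the collection has to call equals() to tell them apart
    static int idBased(Long id) {
        return Objects.hashCode(id);
    }

    // Class-based hash: the same for every instance of a class, does not
    // depend on the id, and in practice differs between Cat and Dog
    static int classBased(Object entity) {
        return entity.getClass().hashCode();
    }
}
```

With the id-based strategy, a Cat and a Dog with id 1 produce the same hash code, so only an equals() call can separate them.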

ChatGPT and Thorben Janssen implementations

Let's address the hashCode() method first. Like the Stack Overflow implementation, the one proposed by ChatGPT returns the same hash code for objects of different classes in certain cases. We saw earlier why this approach is suboptimal. The implementation proposed by Thorben Janssen returns the same hash code for all objects! This makes hashCode() completely meaningless: every object lands in the same bucket of a hash-based collection.

Moving on to the equals() method, ChatGPT and Thorben Janssen offer a more correct approach than the previous implementations. Using getClass() instead of instanceof makes it possible to move the implementation to a base class without confusion when comparing two child classes. The getClass() method returns distinct values for child classes (e.g., Cat and Dog, which extend Animal), whereas an instanceof Animal check passes for both of them.
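Switching the same simplified hierarchy to a getClass() check shows the difference. Again, this is a sketch without JPA annotations; its hashCode() uses a per-class value rather than Thorben Janssen's constant.

```java
import java.util.Objects;

// Base class with a getClass()-based equals() (simplified: no JPA annotations)
abstract class Animal {
    private final Long id;

    protected Animal(Long id) { this.id = id; }

    public Long getId() { return id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        // The exact runtime classes must match, so a Cat never equals a Dog
        if (o == null || getClass() != o.getClass()) return false;
        Animal other = (Animal) o;
        return id != null && Objects.equals(id, other.id);
    }

    @Override
    public int hashCode() {
        return getClass().hashCode();
    }
}

class Cat extends Animal { Cat(Long id) { super(id); } }
class Dog extends Animal { Dog(Long id) { super(id); } }
```

Now new Cat(1L).equals(new Dog(1L)) returns false, while two Cat objects with id 1 are still equal.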

Results of research

After analyzing articles from experts, examining Stack Overflow, and getting assistance from ChatGPT, we may conclude that the best way to override equals() and hashCode() is to combine the getClass()-based equals() with a per-class hashCode().
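Combining these findings, that is, the getClass()-based equals() with a per-class hashCode(), gives roughly the following version. It is a minimal sketch on a simplified Student class (JPA annotations omitted so it compiles on its own); note that it reads the id field directly and does not declare the methods final.

```java
import java.util.Objects;

// Simplified Student (JPA annotations omitted so the sketch compiles on its own)
class Student {
    private Long id;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Student student = (Student) o;
        // Unsaved entities (null id) are never equal to each other
        return id != null && Objects.equals(id, student.id);
    }

    @Override
    public int hashCode() {
        // One value per class, independent of the id
        return getClass().hashCode();
    }
}
```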

However, this is not the case: we forgot about Hibernate internals, namely proxies.

Making proxies great again

Hibernate offers a powerful proxy mechanism that reduces the load on the database by fetching data only when necessary. Keep in mind that, depending on the scenario, we may work only with proxy objects, only with entities, or with both at the same time.

Let's test the current implementation when a proxy object comes into play: we obtain a proxy and a non-proxy object for the same database record, add both to a HashSet, and check that the set contains exactly one element.


@Test
void equalsAndHashCodeTest() {
  // Use the generated id instead of assuming it is 1
  Long id = studentRepository.save(new Student()).getId();

  Student student = studentRepository.findById(id).orElseThrow(); // fully loaded entity
  Student proxy = studentRepository.getReferenceById(id);          // uninitialized proxy

  Set<Student> students = new HashSet<>();

  students.add(student);
  students.add(proxy);

  Assertions.assertEquals(1, students.size());
}

The test failed. However, the reason for this was a LazyInitializationException, not the collection's size.

Note: From here on, we use the term "initialization" to mean accessing the database, loading field values, and setting them on the proxy.

In the implementations considered so far, the equals() method accessed the id field directly instead of using a getter method, and neither method was declared final. With Hibernate, calling a non-final method on a proxy initializes it (the identifier getter, getId(), is an exception to this rule). As a result, non-final equals() and hashCode() methods can cause unintended proxy initialization and extra hits to the database, and without an open session, as in our test, that initialization fails with a LazyInitializationException.
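How a final method behaves on a proxy can be imitated in plain Java. In this sketch, FakeStudentProxy is a hypothetical, hand-written stand-in for the subclass Hibernate generates at runtime: its inherited id field stays null, and it answers getId() from its own stored identifier, much as a real proxy answers it without initialization. The idFromField() and idFromGetter() methods are illustrative helpers, not part of the article's entity.

```java
// Simplified Student (JPA annotations omitted)
class Student {
    private Long id;

    Student() {}
    Student(Long id) { this.id = id; }

    public Long getId() { return id; }

    // final: a subclass (and therefore a proxy) cannot intercept these
    public final Long idFromField() { return id; }
    public final Long idFromGetter() { return getId(); }
}

// Hypothetical imitation of a generated proxy subclass
class FakeStudentProxy extends Student {
    private final Long identifier;

    FakeStudentProxy(Long identifier) {
        super(); // inherited id field remains null, like a real proxy's
        this.identifier = identifier;
    }

    // Overridable method: answered by the "proxy" itself
    @Override
    public Long getId() { return identifier; }
}
```

On the fake proxy, idFromField() returns null while idFromGetter() returns the real id, so a final equals() has to go through getId().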

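To fix this, we declare both methods final and access the id through the getId() getter. Both changes are needed: a final method cannot be intercepted, so on a proxy it runs against the proxy's own fields, which are never populated, while getId() is answered by the proxy without initialization.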
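Applied to a simplified Student class (JPA annotations omitted so the sketch compiles on its own), the two changes might look like this: both methods are final, and equals() reads the id only through getId().

```java
import java.util.Objects;

// Simplified Student with final methods that read the id via getId()
class Student {
    private Long id;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }

    @Override
    public final boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Student student = (Student) o;
        // getId() instead of the field, so a proxy reports its real id
        return getId() != null && Objects.equals(getId(), student.getId());
    }

    @Override
    public final int hashCode() {
        return getClass().hashCode();
    }
}
```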

Excellent! Now we can execute the test without an open transaction. Unfortunately, the test failed once again. This time, the reason is a mismatch between the actual and expected collection sizes: the set contains two elements instead of one.

For a proxy object, the getClass() method returns the generated proxy class rather than the original entity class. Consequently, with the current hashCode() implementation, proxy and non-proxy objects get different hash codes, so the HashSet treated them as different elements without even calling equals() (and even if it had, the getClass() check in equals() would have returned false). That is why the collection size became 2.
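The mismatch can be reproduced without a database. This sketch assumes the final, getId()-based Student methods and uses a hypothetical FakeStudentProxy subclass in place of Hibernate's generated proxy class (JPA annotations omitted). Because getClass() differs, the two objects end up as separate elements.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Student with final, getClass()-based equals()/hashCode()
class Student {
    private Long id;

    Student() {}
    Student(Long id) { this.id = id; }

    public Long getId() { return id; }

    @Override
    public final boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Student student = (Student) o;
        return getId() != null && Objects.equals(getId(), student.getId());
    }

    @Override
    public final int hashCode() {
        return getClass().hashCode();
    }
}

// Hypothetical stand-in for the generated proxy subclass
class FakeStudentProxy extends Student {
    private final Long identifier;
    FakeStudentProxy(Long identifier) { this.identifier = identifier; }
    @Override public Long getId() { return identifier; }
}

class ProxyClassDemo {
    static int setSize() {
        Set<Student> students = new HashSet<>();
        students.add(new Student(1L));
        students.add(new FakeStudentProxy(1L)); // getClass() != Student.class
        return students.size();
    }
}
```

ProxyClassDemo.setSize() returns 2, mirroring the failed assertion.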

We can use the Hibernate.getClass() method to address this issue. Its documentation states: "Get the true, underlying class of a proxied entity." By using this method, we can ensure accurate hash code generation and equality comparisons when proxies and non-proxy objects are mixed.

The test fails once more! This time, it is again due to a LazyInitializationException.

Adding the proxy to the HashSet calls its hashCode() method, which calls Hibernate.getClass(), and that call ends up in AbstractLazyInitializer.initialize().

The AbstractLazyInitializer.initialize() method issues a SELECT query to the database. It looks like we missed something. Let's take another look at the Hibernate.getClass() documentation:

"Get the true, underlying class of a proxied entity. This operation will initialize a proxy by side effect."

There it is! This side-effect is exactly what we have encountered here!

Before Hibernate version 5.6, we could use the HibernateProxyHelper class and its getClassWithoutInitializingProxy() method instead of Hibernate.getClass(). Since Hibernate version 6, we need to use the following approach to avoid proxy initialization:


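// "entity" is a placeholder for the object being examined, which must be a HibernateProxy
((HibernateProxy) entity).getHibernateLazyInitializer().getPersistentClass()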

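Consequently, both equals() and hashCode() now determine the effective class this way: through the lazy initializer's getPersistentClass() for proxies, and through getClass() for regular objects.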

And now the test passes!
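The logic of this version can also be checked in plain Java. In this sketch, ProxyLike is a hypothetical interface standing in for HibernateProxy together with its lazy initializer's getPersistentClass() (it is not a Hibernate API), and FakeStudentProxy imitates a generated proxy. Both objects resolve to the same effective class, so they share a hash code and compare as equal.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for HibernateProxy + getPersistentClass(); NOT a Hibernate API
interface ProxyLike {
    Class<?> persistentClass();
}

class Student {
    private Long id;

    Student() {}
    Student(Long id) { this.id = id; }

    public Long getId() { return id; }

    // Resolves the entity class without touching any proxy state
    private static Class<?> effectiveClass(Object o) {
        return o instanceof ProxyLike ? ((ProxyLike) o).persistentClass() : o.getClass();
    }

    @Override
    public final boolean equals(Object o) {
        if (this == o) return true;
        if (o == null) return false;
        if (effectiveClass(this) != effectiveClass(o)) return false;
        Student student = (Student) o;
        return getId() != null && Objects.equals(getId(), student.getId());
    }

    @Override
    public final int hashCode() {
        return effectiveClass(this).hashCode();
    }
}

// Hypothetical imitation of a generated proxy
class FakeStudentProxy extends Student implements ProxyLike {
    private final Long identifier;
    FakeStudentProxy(Long identifier) { this.identifier = identifier; }
    @Override public Long getId() { return identifier; }
    @Override public Class<?> persistentClass() { return Student.class; }
}

class EffectiveClassDemo {
    static int setSize() {
        Set<Student> students = new HashSet<>();
        students.add(new Student(1L));
        students.add(new FakeStudentProxy(1L));
        return students.size();
    }
}
```

EffectiveClassDemo.setSize() returns 1, the outcome the Hibernate-based test now asserts.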

The best way to implement equals and hashCode for JPA entities... so far

In conclusion, the following implementation of equals() and hashCode() seems to be the most correct and protected from various side effects.

If you skipped directly to this paragraph, I recommend reading the entire article. It includes many interesting details about how proxying works in Hibernate and highlights potential issues with the commonly used equals() and hashCode() implementations.


// Requires imports: java.util.Objects and org.hibernate.proxy.HibernateProxy
@Override
public final boolean equals(Object o) {
  if (this == o) return true;
  if (o == null) return false;
  // Resolve the real entity class without initializing a proxy
  Class<?> oEffectiveClass = o instanceof HibernateProxy
      ? ((HibernateProxy) o).getHibernateLazyInitializer().getPersistentClass()
      : o.getClass();
  Class<?> thisEffectiveClass = this instanceof HibernateProxy
      ? ((HibernateProxy) this).getHibernateLazyInitializer().getPersistentClass()
      : this.getClass();
  if (thisEffectiveClass != oEffectiveClass) return false;
  Student student = (Student) o;
  // getId() does not initialize a proxy; unsaved entities (null id) are never equal
  return getId() != null && Objects.equals(getId(), student.getId());
}

@Override
public final int hashCode() {
  // One hash code per entity class, identical for a proxy and its entity
  return this instanceof HibernateProxy
      ? ((HibernateProxy) this).getHibernateLazyInitializer().getPersistentClass().hashCode()
      : getClass().hashCode();
}
 

With this implementation, you won't encounter problems when comparing:

  1. Newly created objects not yet associated with the database;
  2. Proxy and non-proxy objects associated with the same database record;
  3. Two proxy objects associated with the same database record;
  4. Objects inherited from a class where the equals() and hashCode() implementations are defined;
  5. Two identical objects from different sessions (obtained from different transactions).


Additionally, you won't encounter any unintended proxy initialization when equals() or hashCode() is invoked, either explicitly or implicitly (for example, by a hash-based collection). Consequently, your application won't pay for extra database queries.

To avoid remembering the implementation details, install the JPA Buddy plugin and generate the correct methods for any of your entities with just a few clicks.