October 17, 2015

JPA query languages compared (part 1 of 4)



Accessing persistent storage with Java EE 7 comes with the choice of the right query language. This article aims to give you an overview as well as a comparison of using SQL, JPQL, the Criteria API and Querydsl for writing database queries.

JPA basics

Feel free to skip this introductory section if you’re familiar with the JPA specification and its configuration already.

The Java EE 7 umbrella specification provides two complementary standards dealing with persistent storage: JPA and JTA.
  • JPA (Java Java Persistence) 2.1 (JSR 338, 2013) abstracts DB access, providing object-relational mapping (OR/M) and domain specific languages (DSLs) to write DB queries. As of V. 2.0, the reference implementation is EclipseLink (previously, it was Hibernate).
  • JTA (Java Transaction API) 1.2 (JSR 907, 2013) abstracts transaction control, allowing declarative control over ACID properties. The reference implementation is GlassFish. JTA is not covered by this article.

Query languages

This article concentrates on how to write DB queries (the Java equivalent to SQL queries) using the query DSLs provides by JPA. In fact, JPA provides many means to write DB queries:
  • Dynamic native queries (optionally as TypedQuery) in SQL
  • Dynamic queries (optionally as TypedQuery) in JPQL
  • Named native queries (optionally as TypedQuery) in SQL
  • Named queries (optionally as TypedQuery) in JQPL
  • String-based or strongly typed Criteria API queries
There are also a few third party frameworks which provide their own query DSL for use within a JPA environment. In my experience, this one is within the most frequently used:
In practice, I often see a wild mix of these approaches, resulting in inconsistent code and a confusing technology mismatch. Of course, each of these approaches has their advantages and disadvantages, and choosing the right tool for the right job is key to create a consistent code base.

Getting started with JPA: Configuration

TL;DR: Feel free to take a look at the persistence.xml file of the demo project.

In order to get started with a JPA persistence project, you need the org.eclipse.persistence:eclipselink dependency or any other JPA implementation on the PATH, and configure a persistence-unit in persistence.xml. The NetBeans IDE provides a wizard to create a persistence unit. You may also want to refer to other online tutorials.

For a real-world JEE project, you’ll typically use transaction-type="JTA" for implicit, annotation-based transaction handling. Outside of the container, e.g. in unit tests, you’ll have to use a transaction-type="RESOURCE_LOCAL" persistence-unit.

Also, you’ll typically want to log the actual SQL statements. The configuration is JPA implementation specific. For EclipseLink, it is:
<property name="eclipselink.logging.level.sql" value="FINE"/>
<property name="eclipselink.logging.parameters" value="true"/>
This will print SQL statements as well as resolve query parameters (otherwise printed as ‘?’).

CRUD with JPA

For simple create / read by id / update / delete DB operations, there is no need to create any DB queries at all. Instead, use the EntityManager’s API to operate on an @Entity using plain OR/M. Hence, a basic abstract CrudService can be implemented like this:
@TransactionAttribute(TransactionAttributeType.MANDATORY)
public abstract class BaseService<T extends BaseModel> {
    @PersistenceContext
    EntityManager em;
    
    public abstract T create();
    
    public T findById(Long id) {
        return em.find(getModelClass(), id);
    }
    
    public List<T> findAll() {
        CriteriaQuery<T> query = em.getCriteriaBuilder().createQuery(getModelClass());
        query.select(query.from(getModelClass()));
        return (List<T>) em.createQuery(query).getResultList();
    }
    
    public long countAll() {
        CriteriaQuery<Long> query = em.getCriteriaBuilder().createQuery(Long.class);
        query.select(em.getCriteriaBuilder().count(query.from(getModelClass())));
        return em.createQuery(query).getSingleResult();
    }
    
    public T save(T entity) {        
        if (entity.getId() == null) {
            em.persist(entity);
        }
        else {
            entity = em.merge(entity);
        }
        em.flush();
        return entity;
    }
    
    public void delete(Long id) {
        T entity = em.getReference(getModelClass(), id);
        em.remove(entity);
    }
    
    protected abstract Class<T> getModelClass();
}
Building a concrete service, e.g. for an entity of type Customer, is then as simple as
public class CustomerService extends BaseService<Customer> {
    @Override
    public Customer create() {
        return new Customer();
    }

    @Override
    protected Class<Customer> getModelClass() {
        return Customer.class;
    }
}
A few notes:
  • findAll() and countAll() operations are not supported by the EntityManager API. Thus, they are here implemented using Criteria API queries. Details will follow.
  • Why building a dedicated create() method and not just invoke the entity’s constructor in the client code? Because that way, we could swap the service implementation with a service which uses a so-called extended persistence context which “magically” keeps an entity attached. This is what e.g. Adam Bien refers to as the JEE “gateway” pattern. Whether building a gateway is the right choice depends on the use case; its discussion would exceed the scope of this article.
On the entity side, thanks to OR/M, we typically make excessive use of relations cascading, e.g. when saving a parent, have changes to its children saved as well. This is accomplished using the respective JPA annotations in the entity class:
@OneToMany(mappedBy = "customer", cascade = {CascadeType.PERSIST, CascadeType.MERGE}, orphanRemoval = true)
private List<Purchase> purchases = new ArrayList<>();
In fact, we will never use explicit DB queries for CRUD operations, but use these EntityManager operations instead. Hence, the queries discussed in this article are all read-only find by / count by + parameter queries.

For performance reasons, you may want to use bulk update queries as well, but this should really be well justified (keeping in mind that premature optimization is the root of all evil).

Example queries

Throughout this article, I will use this simple business model to demonstrate query building:

  • A customer has multiple purchases (bidirectional).
  • A purchase has multiple products and vise versa (unidirectional from purchase to products).
Here’s how it would look like in a table structure:
CUSTOMER
ID NAME PREMIUM
PURCHASE
ID ORDERDATE CUSTOMER_ID
PRODUCT
ID NAME PRICE
PURCHASE_PRODUCT
PURCHASE_ID PRODUCTS_ID

For each DB query mechanism, I will show how to build the example queries presented in the following table (here, their plain SQL implementation is shown):
1. Find customer by id 2. Find by id (generic)
SELECT * FROM CUSTOMER WHERE (ID = 1)
SELECT * FROM x WHERE (ID = 1)
(where x is the table name provided; not actually supported by plain SQL)
3. Find all customers 4. Count all customers
SELECT * FROM CUSTOMER
SELECT COUNT(ID) FROM CUSTOMER
5. Find purchase by customer id 6. Count purchase by customer id
SELECT PURCHASE.* FROM PURCHASE
INNER JOIN CUSTOMER
ON PURCHASE.CUSTOMER_ID=CUSTOMER.ID
WHERE CUSTOMER.ID = 1
SELECT COUNT(PURCHASE.ID) FROM PURCHASE
INNER JOIN CUSTOMER
ON PURCHASE.CUSTOMER_ID=CUSTOMER.ID
WHERE CUSTOMER.ID = 1
7. Find distinct product with purchase’s customer’s special condition 8. Aggregate product price sum by purchase’s customers
SELECT PRODUCT.* FROM PRODUCT
WHERE PRODUCT.ID IN (
  SELECT DISTINCT PRODUCT.ID FROM PRODUCT
  INNER JOIN PURCHASE_PRODUCT
  ON PURCHASE_PRODUCT.PRODUCTS_ID=PRODUCT.ID
  INNER JOIN PURCHASE
  ON PURCHASE.ID=PURCHASE_PRODUCT.PURCHASE_ID
  INNER JOIN CUSTOMER
  ON PURCHASE.CUSTOMER_ID=CUSTOMER.ID
  WHERE CUSTOMER.PREMIUM = 1
)
SELECT CUSTOMER.ID, SUM(PRODUCT.PRICE) FROM CUSTOMER
INNER JOIN PURCHASE
ON PURCHASE.CUSTOMER_ID=CUSTOMER.ID
INNER JOIN PURCHASE_PRODUCT
ON PURCHASE.ID=PURCHASE_PRODUCT.PURCHASE_ID
INNER JOIN PRODUCT
ON PURCHASE_PRODUCT.PRODUCTS_ID=PRODUCT.ID
GROUP BY CUSTOMER.ID

The accompanying GitHub project contains all the code examples in this article as a unit test which you can execute against your local database.
Pages: 1 2 3 4