
Do we need to configure socketTimeout when using HikariCP?

Scenario

HikariCP + mariadb-java-client + MySQL

Even after the network or database has been restored, the application does not recover on its own; only a restart brings it back. Until then, requests fail with:

java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms.

Thread Dump

"HikariPool-1 connection adder" ... runnable ...
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
...

"xxx_QuartzSchedulerThread" ... waiting on condition ...
java.lang.Thread.State: TIMED_WAITING (parking)
...
at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
at com.zaxxer.hikari.util.ConcurrentBag.borrow(ConcurrentBag.java:157)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:173)
...

"HikariPool-1 housekeeper" ... waiting on condition ...
java.lang.Thread.State: TIMED_WAITING (parking)
...

How HikariCP creates connections

The call stack of the main thread is as follows:

|-com.zaxxer.hikari.HikariDataSource#getConnection()
|-com.zaxxer.hikari.pool.HikariPool#getConnection(hardTimeout)
|-com.zaxxer.hikari.util.ConcurrentBag#borrow(timeout, timeUnit)
|-com.zaxxer.hikari.pool.HikariPool#addBagItem(waiting)

addConnectionExecutor is responsible for creating connections, but it is a thread pool with only one thread: connections are created one at a time, and every creation request queues behind the one currently running.

// com.zaxxer.hikari.pool.HikariPool
...
@Override
public void addBagItem(final int waiting)
{
   final boolean shouldAdd = waiting - addConnectionQueue.size() >= 0;
   if (shouldAdd) {
      // corePoolSize=1, maximumPoolSize=1, workQueue=LinkedBlockingQueue
      addConnectionExecutor.submit(POOL_ENTRY_CREATOR);
   }
}
...
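To see why a single-thread executor turns one slow connection attempt into a pool-wide outage, here is a minimal sketch using only java.util.concurrent. It assumes a plain ThreadPoolExecutor with one thread and an unbounded LinkedBlockingQueue, matching the comment above; HikariCP's real queue capacity and rejection policy are not modeled, and the class and method names are hypothetical. A CountDownLatch stands in for the blocked socket read.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SingleAdderDemo {

    /** Returns how many creation tasks were waiting in the queue behind the blocked one. */
    static int queuedBehindBlockedTask() throws Exception {
        // Same shape as the comment in addBagItem: 1 core thread, 1 max thread, FIFO queue.
        // Assumption: unbounded queue, default rejection policy (not HikariCP's exact setup).
        ThreadPoolExecutor adder = new ThreadPoolExecutor(
                1, 1, 5, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        CountDownLatch networkBack = new CountDownLatch(1);
        try {
            // Task 1 stands in for a connection attempt blocked in socketRead0().
            adder.submit(() -> { networkBack.await(); return "stuck"; });
            // Tasks 2 and 3 stand in for later attempts, made after the network recovered.
            Future<String> later = adder.submit(() -> "created");
            adder.submit(() -> "created");

            // Neither later task can start while task 1 holds the only thread.
            int queued = adder.getQueue().size();
            if (later.isDone()) {
                throw new IllegalStateException("a later task overtook the blocked one");
            }
            networkBack.countDown();          // "recovery" only helps once task 1 returns
            later.get(1, TimeUnit.SECONDS);   // now the queue drains in FIFO order
            return queued;
        } finally {
            adder.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("queued behind the blocked task: " + queuedBehindBlockedTask());
    }
}
```

Running main prints `queued behind the blocked task: 2`: both later tasks wait until the first one returns, however healthy the network is by then.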

The call stack of the HikariPool-1 connection adder thread is as follows:

|-com.zaxxer.hikari.pool.HikariPool.PoolEntryCreator#call()
|-com.zaxxer.hikari.pool.HikariPool#createPoolEntry()
|-com.zaxxer.hikari.pool.PoolBase#newPoolEntry()
|-com.zaxxer.hikari.pool.PoolBase#newConnection()
|-com.zaxxer.hikari.util.DriverDataSource#getConnection()
|-org.mariadb.jdbc.Driver#connect(url, props)
|-org.mariadb.jdbc.MariaDbConnection#newConnection(urlParser, globalInfo)
|-org.mariadb.jdbc.internal.util.Utils#retrieveProxy(urlParser, globalInfo)
|-org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol#connectWithoutProxy()
|-org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol#createConnection(hostAddress, username)
|-org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol#createSocket(host, port, options)
|-org.mariadb.jdbc.internal.com.read.ReadInitialHandShakePacket#ReadInitialHandShakePacket(reader)
|-org.mariadb.jdbc.internal.io.input.StandardPacketInputStream#getPacket(reUsable)
|-org.mariadb.jdbc.internal.io.input.StandardPacketInputStream#getPacketArray(reUsable)
|-org.mariadb.jdbc.internal.io.input.ReadAheadBufferedStream#read(externalBuf, off, len)
|-org.mariadb.jdbc.internal.io.input.ReadAheadBufferedStream#fillBuffer(minNeededBytes)
|-java.io.FilterInputStream#read(b, off, len)
|-java.net.SocketInputStream#read(b, off, length)
|-java.net.SocketInputStream#read(b, off, length, timeout)
|-java.net.SocketInputStream#socketRead(fd, b, off, len, timeout)
|-java.net.SocketInputStream#socketRead0(fd, b, off, len, timeout)

The org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol#createSocket method establishes the TCP connection through java.net.Socket; the established connection is visible with lsof:

xxx@xxx ~ % lsof -i tcp:3306
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 10015 chensongyu 95u IPv6 0xe5897620744b2967 0t0 TCP localhost:64819->localhost:mysql (ESTABLISHED)
com.docke 69423 chensongyu 115u IPv6 0xe5897620744c2967 0t0 TCP *:mysql (LISTEN)
com.docke 69423 chensongyu 122u IPv6 0xe5897620744c3047 0t0 TCP localhost:mysql->localhost:64819 (ESTABLISHED)

By default, socketTimeout is 0, meaning reads have no timeout at all, while connectTimeout defaults to 30s and applies only to establishing the connection.

// org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol
...
private static Socket createSocket(final String host, final int port, final Options options)
    throws SQLException {
  Socket socket;
  try {
    // ...
    // socketTimeout is not configured by default, so SO_TIMEOUT keeps its default of 0 (infinite)
    if (options.socketTimeout != null) {
      socket.setSoTimeout(options.socketTimeout);
    }
    // ...
    try {
      // connectTimeout defaults to 30_000ms and only bounds connect(), not later reads
      socket.connect(sockAddr, options.connectTimeout);
    } catch (IOException ioe) {
      throw ioe;
    }
    // ...
  }
...
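connectTimeout and socketTimeout guard different calls: the first bounds only Socket#connect, the second becomes SO_TIMEOUT and bounds each read. The sketch below (hypothetical class name; a loopback server stands in for a MySQL server that never sends its greeting) shows that a connect with a 30s timeout succeeds at once, yet the socket's read timeout is still 0, i.e. infinite. It relies on the OS completing the TCP handshake for a listening socket even before accept() is called, which is standard behavior on common platforms.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class TimeoutDefaultsDemo {

    /** Connects to a local server that never writes; returns the client's SO_TIMEOUT. */
    static int soTimeoutAfterConnect() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // A listening socket that completes the TCP handshake (via the kernel backlog)
        // but never sends anything, like a half-dead database server.
        try (ServerSocket silentServer = new ServerSocket(0, 50, loopback);
             Socket client = new Socket()) {
            // connectTimeout only bounds connect(); here it succeeds right away.
            client.connect(new InetSocketAddress(loopback, silentServer.getLocalPort()), 30_000);
            // No setSoTimeout() call, as when socketTimeout is not configured.
            return client.getSoTimeout(); // 0 = a later read() may block forever
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("SO_TIMEOUT after connect: " + soTimeoutAfterConnect());
    }
}
```

main prints `SO_TIMEOUT after connect: 0`.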

Once the TCP connection is established, the driver reads and parses the server's greeting (initial handshake) packet.

// org.mariadb.jdbc.internal.com.read.ReadInitialHandShakePacket
...
public ReadInitialHandShakePacket(final PacketInputStream reader)
    throws IOException, SQLException {
  // blocks in socketRead0() until the server sends its greeting packet
  Buffer buffer = reader.getPacket(true);
  if (buffer.getByteAt(0) == ERROR) {
    ErrorPacket errorPacket = new ErrorPacket(buffer);
    throw new SQLException(errorPacket.getMessage());
  }
  ...
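For context on what getPacket() waits for: every MySQL/MariaDB protocol packet starts with a 4-byte header (a 3-byte little-endian payload length and a 1-byte sequence id); a greeting payload starts with protocol version 0x0A, while an ERR packet starts with 0xFF. The simplified reader below is not the driver's StandardPacketInputStream: it ignores multi-packet payloads and compression, and its names are hypothetical. It only illustrates that the handshake read is an ordinary blocking readFully() on the socket stream.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class PacketReaderSketch {

    static final int ERROR = 0xFF;        // first payload byte of an ERR packet
    static final int PROTOCOL_V10 = 0x0A; // first payload byte of a v10 greeting

    /** Reads one packet: 3-byte little-endian length, 1-byte sequence id, then the payload. */
    static byte[] readPacket(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] header = new byte[4];
        // readFully blocks until all bytes arrive; on a real socket with SO_TIMEOUT = 0,
        // a silent server leaves the caller waiting here indefinitely.
        data.readFully(header);
        int length = (header[0] & 0xFF) | (header[1] & 0xFF) << 8 | (header[2] & 0xFF) << 16;
        byte[] payload = new byte[length];
        data.readFully(payload);
        return payload;
    }

    /** Classifies a payload by its first byte. */
    static String classify(byte[] payload) {
        if (payload.length == 0) return "empty";
        int first = payload[0] & 0xFF;
        if (first == ERROR) return "error";
        if (first == PROTOCOL_V10) return "greeting";
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical 2-byte payload: protocol version 10 followed by a truncated body.
        byte[] wire = {0x02, 0x00, 0x00, 0x00, 0x0A, 0x35};
        byte[] payload = readPacket(new ByteArrayInputStream(wire));
        System.out.println(payload.length + " " + classify(payload)); // 2 greeting
    }
}
```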

What went wrong

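If the network or database becomes unstable at the wrong moment, for example after the TCP connection has been established but before the server has sent its greeting packet, the connection adder thread blocks in java.net.SocketInputStream#socketRead0() waiting for data that may never arrive. With socketTimeout at 0, that read has no deadline. Worse, because addConnectionExecutor has only one thread, every subsequent POOL_ENTRY_CREATOR task queues behind the stuck one, so the pool cannot create new connections even after the network or database has returned to normal. Callers of getConnection() then keep failing after 30000ms with the SQLTransientConnectionException shown above.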
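The failure and the fix can be reproduced together in a small sketch. It assumes a loopback server that accepts TCP connections but never writes (standing in for the unstable database) and a single-thread executor standing in for addConnectionExecutor; class and method names are hypothetical. With a positive socket timeout, the stuck "connection creation" fails with SocketTimeoutException and the next queued task gets to run.

```java
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class StuckAdderDemo {

    /**
     * Runs a "create connection" task that waits for a greeting from a silent server,
     * then a second task on the same single thread. Returns the second task's result.
     */
    static String runWithSocketTimeout(int socketTimeoutMs) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        ExecutorService adder = Executors.newSingleThreadExecutor(); // stand-in for the adder pool
        try (ServerSocket silentServer = new ServerSocket(0, 50, loopback)) {
            Future<String> first = adder.submit(() -> {
                try (Socket socket = new Socket()) {
                    socket.setSoTimeout(socketTimeoutMs); // 0 would mean: wait forever
                    socket.connect(new InetSocketAddress(loopback, silentServer.getLocalPort()), 30_000);
                    InputStream in = socket.getInputStream();
                    return "read " + in.read(); // blocks: the server never writes
                } catch (SocketTimeoutException e) {
                    return "timed out";         // frees the single worker thread
                }
            });
            Future<String> second = adder.submit(() -> "next creation ran");
            System.out.println("first: " + first.get(5, TimeUnit.SECONDS));
            return second.get(5, TimeUnit.SECONDS);
        } finally {
            adder.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runWithSocketTimeout(500));
    }
}
```

With runWithSocketTimeout(500), main prints `first: timed out` and then `next creation ran`. Passing 0 instead removes the read deadline: first.get(...) gives up with a TimeoutException while the worker thread is still blocked in read(), which is the state captured in the thread dump above.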

How to fix

Configure socketTimeout (in milliseconds) in the spring.datasource.url property:

spring:
  datasource:
    url: jdbc:mariadb://localhost:3306/xxx?socketTimeout=60000

Update

Faster PostgreSQL connection recovery

Rapid Recovery

Unacknowledged TCP

HikariCP is powerless to recover connections that are out of the pool because of unacknowledged TCP traffic. TCP is a synchronous communication scheme that requires "handshaking" from both sides of the connection as packets are exchanged (SYN and ACK packets).

When TCP communication is abruptly interrupted, the client or server can be left awaiting the acknowledgement of a packet that will never come. The connection is therefore "stuck", until an operating system level TCP timeout occurs. This can be as long as several hours, depending on the operating system TCP stack tuning.

TCP Timeouts

In order to avoid this condition, it is imperative that the application configures the driver-level TCP socket timeout. Each driver differs in how this timeout is set, but nearly all drivers support it.

HikariCP recommends that the driver-level socket timeout be set to (at least) 2-3x the longest running SQL transaction, or 30 seconds, whichever is longer. However, your own recovery time targets should determine the appropriate timeout for your application.
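The sizing rule above is simple arithmetic. This sketch takes the upper end of the 2-3x range as an assumption, applies the 30-second floor, and adds a hypothetical helper that appends the result to a JDBC URL as socketTimeout in milliseconds, as in the spring.datasource.url example.

```java
import java.time.Duration;

public class SocketTimeoutAdvice {

    private static final Duration FLOOR = Duration.ofSeconds(30);

    /** Upper end of the guideline: 3x the longest transaction, but never below 30 s. */
    static Duration recommended(Duration longestTransaction) {
        Duration scaled = longestTransaction.multipliedBy(3);
        return scaled.compareTo(FLOOR) > 0 ? scaled : FLOOR;
    }

    /** Hypothetical helper: appends socketTimeout (milliseconds) to a JDBC URL. */
    static String withSocketTimeout(String jdbcUrl, Duration timeout) {
        String separator = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + separator + "socketTimeout=" + timeout.toMillis();
    }

    public static void main(String[] args) {
        System.out.println(recommended(Duration.ofSeconds(5)).toMillis());  // 30000
        System.out.println(recommended(Duration.ofSeconds(20)).toMillis()); // 60000
        // jdbc:mariadb://localhost:3306/xxx?socketTimeout=60000
        System.out.println(withSocketTimeout("jdbc:mariadb://localhost:3306/xxx",
                recommended(Duration.ofSeconds(20))));
    }
}
```

A longest transaction of 20s yields 60000ms, the same value used in the configuration above, while 5s is lifted to the 30s floor.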

See the specific database sections below for some common configurations.