
Scenario

HikariCP + mariadb-java-client + MySQL

Even after the network or database has recovered, the application does not recover on its own; only a restart brings it back. Until then, every request fails with:

java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms.

Thread Dump

"HikariPool-1 connection adder" ... runnable ...
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
...

"xxx_QuartzSchedulerThread" ... waiting on condition ...
java.lang.Thread.State: TIMED_WAITING (parking)
...
at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
at com.zaxxer.hikari.util.ConcurrentBag.borrow(ConcurrentBag.java:157)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:173)
...

"HikariPool-1 housekeeper" ... waiting on condition ...
java.lang.Thread.State: TIMED_WAITING (parking)
...

How HikariCP creates connections

The call stack of the calling (main) thread is as follows:

|-com.zaxxer.hikari.HikariDataSource#getConnection()
|-com.zaxxer.hikari.pool.HikariPool#getConnection(hardTimeout)
|-com.zaxxer.hikari.util.ConcurrentBag#borrow(timeout, timeUnit)
|-com.zaxxer.hikari.pool.HikariPool#addBagItem(waiting)

addConnectionExecutor is responsible for creating connections, but it is a thread pool with only one thread. Connection creation is therefore serialized: each creation task waits in the queue until the previous one has finished.

// com.zaxxer.hikari.pool.HikariPool
...
@Override
public void addBagItem(final int waiting)
{
    final boolean shouldAdd = waiting - addConnectionQueue.size() >= 0;
    if (shouldAdd) {
        // corePoolSize=1, maximumPoolSize=1, workQueue=LinkedBlockingQueue
        addConnectionExecutor.submit(POOL_ENTRY_CREATOR);
    }
}
...
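You can reproduce the one-thread, queued behaviour with the JDK alone. The sketch below builds a ThreadPoolExecutor with the same shape as the comment above: core and maximum size 1, plus a LinkedBlockingQueue. It is not HikariCP's code, and the class and method names are made up for illustration. Every submitted stand-in "create connection" task reports the same worker thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SingleAdderDemo {

    // Same shape as addConnectionExecutor: 1 core thread, 1 max thread, queued work.
    static ThreadPoolExecutor newAdderExecutor() {
        return new ThreadPoolExecutor(1, 1, 5, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    }

    // Submits n stand-in "create connection" tasks and records which thread ran each one.
    static List<String> runCreationTasks(int n) {
        ThreadPoolExecutor executor = newAdderExecutor();
        List<String> threadNames = Collections.synchronizedList(new ArrayList<>());
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                futures.add(executor.submit(() -> threadNames.add(Thread.currentThread().getName())));
            }
            for (Future<?> future : futures) {
                future.get(); // wait until every queued task has run
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
        return threadNames;
    }

    public static void main(String[] args) {
        // Prints the same worker thread name five times: the tasks run one after another.
        System.out.println(runCreationTasks(5));
    }
}
```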

The call stack of the HikariPool-1 connection adder thread is as follows:

|-com.zaxxer.hikari.pool.HikariPool.PoolEntryCreator#call()
|-com.zaxxer.hikari.pool.HikariPool#createPoolEntry()
|-com.zaxxer.hikari.pool.PoolBase#newPoolEntry()
|-com.zaxxer.hikari.pool.PoolBase#newConnection()
|-com.zaxxer.hikari.util.DriverDataSource#getConnection()
|-org.mariadb.jdbc.Driver#connect(url, props)
|-org.mariadb.jdbc.MariaDbConnection#newConnection(urlParser, globalInfo)
|-org.mariadb.jdbc.internal.util.Utils#retrieveProxy(urlParser, globalInfo)
|-org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol#connectWithoutProxy()
|-org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol#createConnection(hostAddress, username)
|-org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol#createSocket(host, port, options)
|-org.mariadb.jdbc.internal.com.read.ReadInitialHandShakePacket#ReadInitialHandShakePacket(reader)
|-org.mariadb.jdbc.internal.io.input.StandardPacketInputStream#getPacket(reUsable)
|-org.mariadb.jdbc.internal.io.input.StandardPacketInputStream#getPacketArray(reUsable)
|-org.mariadb.jdbc.internal.io.input.ReadAheadBufferedStream#read(externalBuf, off, len)
|-org.mariadb.jdbc.internal.io.input.ReadAheadBufferedStream#fillBuffer(minNeededBytes)
|-java.io.FilterInputStream#read(b, off, len)
|-java.net.SocketInputStream#read(b, off, length)
|-java.net.SocketInputStream#read(b, off, length, timeout)
|-java.net.SocketInputStream#socketRead(fd, b, off, len, timeout)
|-java.net.SocketInputStream#socketRead0(fd, b, off, len, timeout)

The org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol#createSocket method establishes the TCP connection through java.net.Socket. lsof shows the resulting connection to port 3306:

xxx@xxx ~ % lsof -i tcp:3306
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 10015 chensongyu 95u IPv6 0xe5897620744b2967 0t0 TCP localhost:64819->localhost:mysql (ESTABLISHED)
com.docke 69423 chensongyu 115u IPv6 0xe5897620744c2967 0t0 TCP *:mysql (LISTEN)
com.docke 69423 chensongyu 122u IPv6 0xe5897620744c3047 0t0 TCP localhost:mysql->localhost:64819 (ESTABLISHED)

The default socketTimeout is 0, which means no timeout, and the default connectTimeout is 30 s. Establishing the TCP connection is therefore bounded, but reading from it afterwards is not.

// org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol
...
private static Socket createSocket(final String host, final int port, final Options options)
        throws SQLException {
    Socket socket;
    try {
        // ...
        // socketTimeout default is zero
        if (options.socketTimeout != null) {
            socket.setSoTimeout(options.socketTimeout);
        }
        // ...
        try {
            // connectTimeout default is 30_000ms
            socket.connect(sockAddr, options.connectTimeout);
        } catch (IOException ioe) {
            throw ioe;
        }
        // ...
    }
...
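The following JDK-only sketch (not driver code) shows what SO_TIMEOUT changes. A local ServerSocket accepts TCP connections through its backlog but never writes anything. It stands in for a database that stops answering before sending its greeting. The client sets SO_TIMEOUT the same way createSocket does. With a positive value the first read fails with SocketTimeoutException; with 0 the same read() would block forever. The loopback server and the class name are assumptions made for the demo.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SoTimeoutDemo {

    // Connects to a local server that never sends a byte and reports whether the first read timed out.
    static boolean readTimesOut(int soTimeoutMs, int connectTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket silentServer = new ServerSocket(0, 50, loopback);
             Socket socket = new Socket()) {
            // accept() is never called: the kernel completes the TCP handshake from the backlog.
            socket.setSoTimeout(soTimeoutMs); // 0 would mean "wait forever"
            socket.connect(new InetSocketAddress(loopback, silentServer.getLocalPort()), connectTimeoutMs);
            InputStream in = socket.getInputStream();
            in.read(); // blocks until data, EOF, or SO_TIMEOUT expiry
            return false;
        } catch (SocketTimeoutException e) {
            return true;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // With a 500 ms socket timeout the read gives up instead of hanging.
        System.out.println("timed out: " + readTimesOut(500, 30_000));
    }
}
```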

After the TCP connection is established, the driver reads and parses the server's initial handshake (greeting) packet:

// org.mariadb.jdbc.internal.com.read.ReadInitialHandShakePacket
...
public ReadInitialHandShakePacket(final PacketInputStream reader)
        throws IOException, SQLException {
    Buffer buffer = reader.getPacket(true);
    if (buffer.getByteAt(0) == ERROR) {
        ErrorPacket errorPacket = new ErrorPacket(buffer);
        throw new SQLException(errorPacket.getMessage());
    }
...
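getPacket(true) has to read a complete packet before the constructor can inspect its first byte. Below is a JDK-only sketch of that framing, based on the MySQL client/server protocol. Each packet is a 3-byte little-endian payload length, a 1-byte sequence id, and then the payload. A first payload byte of 0x0A marks a protocol-version-10 greeting, and 0xFF marks an ERR packet. The class and method names are hypothetical, and this is not the driver's implementation. On a real socket stream with SO_TIMEOUT = 0, readFully() is where a silent server leaves the connection adder thread parked.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class GreetingPacketSketch {

    static final int PROTOCOL_V10 = 0x0A; // first payload byte of a HandshakeV10 greeting
    static final int ERR_MARKER = 0xFF;   // first payload byte of an ERR packet

    // Reads one packet: 3-byte little-endian payload length, 1-byte sequence id, then the payload.
    static byte[] readPacket(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] header = new byte[4];
        in.readFully(header); // on a socket with SO_TIMEOUT = 0 this can block forever
        int length = (header[0] & 0xFF) | ((header[1] & 0xFF) << 8) | ((header[2] & 0xFF) << 16);
        byte[] payload = new byte[length];
        in.readFully(payload);
        return payload;
    }

    // Classifies the first packet the server sends after the TCP connection is up.
    static String classify(byte[] wireBytes) {
        try {
            byte[] payload = readPacket(new ByteArrayInputStream(wireBytes));
            if (payload.length == 0) {
                return "EMPTY";
            }
            int first = payload[0] & 0xFF;
            if (first == ERR_MARKER) {
                return "ERROR";
            }
            return first == PROTOCOL_V10 ? "HANDSHAKE_V10" : "OTHER";
        } catch (IOException e) {
            return "TRUNCATED"; // the stream ended before the announced length was read
        }
    }

    public static void main(String[] args) {
        byte[] greetingStart = {0x01, 0x00, 0x00, 0x00, 0x0A};
        System.out.println(classify(greetingStart)); // HANDSHAKE_V10
    }
}
```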

What went wrong

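Suppose the network or database becomes unstable at some point. With socketTimeout left at 0, the HikariPool-1 connection adder thread can block in java.net.SocketInputStream#socketRead0() for a very long time, potentially forever, waiting for handshake data that never arrives. Worse, addConnectionExecutor has only one thread. Every POOL_ENTRY_CREATOR task queued behind the stuck one never runs, so no new connection is created even after the network or database has returned to normal. Application threads such as xxx_QuartzSchedulerThread keep waiting in ConcurrentBag#borrow. Once the 30000 ms connection timeout expires, they fail with the SQLTransientConnectionException shown above.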
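You can reproduce this failure mode without a database. In the JDK-only sketch below, a CountDownLatch that is never released stands in for a read with socketTimeout = 0, and a single-threaded executor stands in for addConnectionExecutor. All names are made up. The second task is submitted after the "network" is healthy again, but it never runs because it is queued behind the stuck one.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class StuckAdderDemo {

    // Returns true if a second creation task manages to run within waitMs while the first one is stuck.
    static boolean secondTaskRuns(long waitMs) {
        ThreadPoolExecutor adder =
                new ThreadPoolExecutor(1, 1, 5, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        CountDownLatch serverNeverAnswers = new CountDownLatch(1); // never counted down
        CountDownLatch secondTaskDone = new CountDownLatch(1);
        try {
            // Task 1: stands in for a handshake read with SO_TIMEOUT = 0 on a dead connection.
            adder.submit(() -> {
                serverNeverAnswers.await();
                return null;
            });
            // Task 2: submitted once the database is reachable again, but queued behind task 1.
            adder.submit(secondTaskDone::countDown);
            return secondTaskDone.await(waitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Only here so the demo can exit; a real socketRead0() would not react to interruption.
            adder.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("second task ran: " + secondTaskRuns(500)); // second task ran: false
    }
}
```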

How to fix

Configure socketTimeout (in milliseconds) in the spring.datasource.url property:

spring:
  datasource:
    url: jdbc:mariadb://localhost:3306/xxx?socketTimeout=60000
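The following JDK-only sketch shows why this helps. It uses the same single-threaded executor, but the first task now performs a real read against a silent local server with a socket timeout. The timeout is 200 ms here instead of 60000 ms, to keep the demo fast. The read fails with SocketTimeoutException, which frees the worker thread, and the next queued task runs. The loopback server and all names are assumptions for the demo, not driver or HikariCP code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RecoveringAdderDemo {

    // Returns true if the second queued task runs within waitMs after the first one times out.
    static boolean secondTaskRuns(int socketTimeoutMs, long waitMs) {
        ThreadPoolExecutor adder =
                new ThreadPoolExecutor(1, 1, 5, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        CountDownLatch secondTaskDone = new CountDownLatch(1);
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // A server that completes the TCP handshake (via its backlog) but never sends a greeting.
        try (ServerSocket silentServer = new ServerSocket(0, 50, loopback)) {
            InetSocketAddress address = new InetSocketAddress(loopback, silentServer.getLocalPort());
            // Task 1: "create a connection" with a socket timeout; the read fails instead of hanging.
            adder.submit(() -> {
                try (Socket socket = new Socket()) {
                    socket.setSoTimeout(socketTimeoutMs);
                    socket.connect(address, 30_000);
                    return socket.getInputStream().read(); // throws SocketTimeoutException
                }
            });
            // Task 2: runs on the worker thread freed by the timeout.
            adder.submit(secondTaskDone::countDown);
            return secondTaskDone.await(waitMs, TimeUnit.MILLISECONDS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            adder.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("second task ran: " + secondTaskRuns(200, 5_000));
    }
}
```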

Update

Faster PostgreSQL connection recovery

Rapid Recovery

Unacknowledged TCP

The reason that HikariCP is powerless to recover connections that are out of the pool is due to unacknowledged TCP traffic. TCP is a synchronous communication scheme that requires "handshaking" from both sides of the connection as packets are exchanged (SYN and ACK packets).

When TCP communication is abruptly interrupted, the client or server can be left awaiting the acknowledgement of a packet that will never come. The connection is therefore "stuck", until an operating system level TCP timeout occurs. This can be as long as several hours, depending on the operating system TCP stack tuning.

TCP Timeouts

In order to avoid this condition, it is imperative that the application configures the driver-level TCP socket timeout. Each driver differs in how this timeout is set, but nearly all drivers support it.

HikariCP recommends that the driver-level socket timeout be set to (at least) 2-3x the longest running SQL transaction, or 30 seconds, whichever is longer. However, your own recovery time targets should determine the appropriate timeout for your application.

See the specific database sections below for some common configurations.
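The guideline is simple arithmetic, and a hypothetical helper makes it concrete. If the longest transaction takes 20 s, 3x gives 60 s, which matches the socketTimeout=60000 used earlier. A 5 s transaction would be raised to the 30 s floor. The method name, and the choice to accept only 2 or 3 as the multiplier, are assumptions of this sketch.

```java
public class SocketTimeoutAdvice {

    static final long FLOOR_MS = 30_000; // "or 30 seconds, whichever is longer"

    // Lower bound from the guideline: 2-3x the longest transaction, but never below 30 s.
    static long recommendedSocketTimeoutMs(long longestTransactionMs, int multiplier) {
        if (multiplier < 2 || multiplier > 3) {
            throw new IllegalArgumentException("the guideline uses a 2x to 3x multiplier");
        }
        return Math.max(multiplier * longestTransactionMs, FLOOR_MS);
    }

    public static void main(String[] args) {
        System.out.println(recommendedSocketTimeoutMs(5_000, 3));  // 30000: the 30 s floor wins
        System.out.println(recommendedSocketTimeoutMs(20_000, 3)); // 60000
    }
}
```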

HikariCP keepaliveTime

keepaliveTime

This property controls how frequently HikariCP will attempt to keep a connection alive, in order to prevent it from being timed out by the database or network infrastructure. This value must be less than the maxLifetime value. A "keepalive" will only occur on an idle connection. When the time arrives for a "keepalive" against a given connection, that connection will be removed from the pool, "pinged", and then returned to the pool. The 'ping' is one of either: invocation of the JDBC4 isValid() method, or execution of the connectionTestQuery. Typically, the duration out-of-the-pool should be measured in single digit milliseconds or even sub-millisecond, and therefore should have little or no noticeable performance impact. The minimum allowed value is 30000ms (30 seconds), but a value in the range of minutes is most desirable. Default: 0 (disabled)
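The constraints in that description can be written as a small predicate. This hypothetical helper only encodes the rules quoted above:

- 0 disables keepalive.
- The value must be at least 30000 ms.
- The value must be strictly less than maxLifetime.

It assumes a positive maxLifetime and does not model what HikariCP itself does with an invalid value. The example uses 1800000 ms (30 minutes), HikariCP's default maxLifetime.

```java
public class KeepaliveRules {

    static final long MIN_KEEPALIVE_MS = 30_000;

    // True if keepaliveTime satisfies the documented rules and would therefore enable keepalive.
    static boolean keepaliveEnabled(long keepaliveTimeMs, long maxLifetimeMs) {
        if (keepaliveTimeMs == 0) {
            return false; // default: disabled
        }
        if (keepaliveTimeMs < MIN_KEEPALIVE_MS) {
            return false; // below the documented 30 s minimum
        }
        return keepaliveTimeMs < maxLifetimeMs; // must be less than maxLifetime
    }

    public static void main(String[] args) {
        long maxLifetimeMs = 1_800_000; // 30 minutes
        System.out.println(keepaliveEnabled(0, maxLifetimeMs));         // false
        System.out.println(keepaliveEnabled(10_000, maxLifetimeMs));    // false
        System.out.println(keepaliveEnabled(120_000, maxLifetimeMs));   // true
        System.out.println(keepaliveEnabled(1_800_000, maxLifetimeMs)); // false
    }
}
```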

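When getConnection() is called, HikariCP validates the borrowed connection with isConnectionAlive(), unless it was last used within aliveBypassWindowMs (500 ms by default):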

// com.zaxxer.hikari.pool.HikariPool
...
private final long aliveBypassWindowMs = Long.getLong("com.zaxxer.hikari.aliveBypassWindowMs", MILLISECONDS.toMillis(500));
...
public Connection getConnection(final long hardTimeout) throws SQLException
{
    ...
    do {
        PoolEntry poolEntry = connectionBag.borrow(timeout, MILLISECONDS);
        if (poolEntry == null) {
            break;
        }

        final long now = currentTime();
        if (poolEntry.isMarkedEvicted() || (elapsedMillis(poolEntry.lastAccessed, now) > aliveBypassWindowMs && !isConnectionAlive(poolEntry.connection))) {
            closeConnection(poolEntry, poolEntry.isMarkedEvicted() ? EVICTED_CONNECTION_MESSAGE : DEAD_CONNECTION_MESSAGE);
            timeout = hardTimeout - elapsedMillis(startTime);
        }
        else {
            metricsTracker.recordBorrowStats(poolEntry, startTime);
            return poolEntry.createProxyConnection(leakTaskFactory.schedule(poolEntry), now);
        }
    } while (timeout > 0L);
    ...
}
...
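The condition inside the loop has a consequence worth spelling out. A connection used within the last aliveBypassWindowMs is handed out without calling isConnectionAlive(), even if it has just died. The JDK-only sketch below models only that decision. It uses a hypothetical Entry type and a plain deque instead of ConcurrentBag, and it ignores the timeout loop, metrics, and proxy creation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class BorrowCheckSketch {

    static final long ALIVE_BYPASS_WINDOW_MS = 500; // HikariCP's default for aliveBypassWindowMs

    // Hypothetical stand-in for a PoolEntry.
    static final class Entry {
        final String id;
        final long lastAccessedMs;
        final boolean evicted;
        final boolean alive; // what isConnectionAlive() would report

        Entry(String id, long lastAccessedMs, boolean evicted, boolean alive) {
            this.id = id;
            this.lastAccessedMs = lastAccessedMs;
            this.evicted = evicted;
            this.alive = alive;
        }
    }

    // Same shape as the if-condition in getConnection(): true means "close it and try the next one".
    static boolean mustDiscard(Entry entry, long nowMs) {
        return entry.evicted
                || (nowMs - entry.lastAccessedMs > ALIVE_BYPASS_WINDOW_MS && !entry.alive);
    }

    // Polls entries until one can be handed out; discarded entries are simply dropped here.
    static String borrow(Deque<Entry> bag, long nowMs) {
        Entry entry;
        while ((entry = bag.poll()) != null) {
            if (!mustDiscard(entry, nowMs)) {
                return entry.id;
            }
        }
        return null; // HikariCP would keep waiting until the timeout runs out
    }

    public static void main(String[] args) {
        Deque<Entry> bag = new ArrayDeque<>();
        bag.add(new Entry("evicted", 9_900, true, true));
        bag.add(new Entry("idle-and-dead", 1_000, false, false));
        bag.add(new Entry("recent-but-dead", 9_800, false, false));
        // Prints "recent-but-dead": it was used 200 ms ago, so it is not validated.
        System.out.println(borrow(bag, 10_000));
    }
}
```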

csongyu/health-indicator-routing-data-source

DB Unknown Status

When the javax.sql.DataSource implementation is a subclass of org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource, the /actuator/health endpoint reports the DB health status as UNKNOWN.

{"status":"UP","details":{"db":{"status":"UNKNOWN"},"diskSpace":{"status":"UP","details":{"total":494384795648,"free":391930544128,"threshold":10485760}}}}

The cause lies in the following code:

org.springframework.boot.actuate.autoconfigure.jdbc.DataSourceHealthIndicatorAutoConfiguration

private Map<String, DataSource> filterDataSources(Map<String, DataSource> candidates) {
    if (candidates == null) {
        return null;
    }
    Map<String, DataSource> dataSources = new LinkedHashMap<>();
    candidates.forEach((name, dataSource) -> {
        if (!(dataSource instanceof AbstractRoutingDataSource)) {
            dataSources.put(name, dataSource);
        }
    });
    return dataSources;
}

org.springframework.boot.actuate.jdbc.DataSourceHealthIndicator

@Override
protected void doHealthCheck(Health.Builder builder) throws Exception {
    if (this.dataSource == null) {
        builder.up().withDetail("database", "unknown");
    } else {
        doDataSourceHealthCheck(builder);
    }
}

Custom Health Indicator

Check DB health by executing SELECT 1 FROM DUAL through org.springframework.jdbc.core.JdbcTemplate against each resolved target data source.

xyz.csongyu.healthindicator.RoutingDataSourceHealthIndicator

@PostConstruct
@SuppressWarnings("unchecked")
public void init() throws NoSuchFieldException, IllegalAccessException {
    final Field field = AbstractRoutingDataSource.class.getDeclaredField("resolvedDataSources");
    field.setAccessible(true);
    final Map<Object, DataSource> resolvedDataSources = (Map<Object, DataSource>) field.get(this.dataSource);
    this.jdbcTemplates = resolvedDataSources.entrySet().stream()
            .collect(Collectors.toMap(Map.Entry::getKey, entry -> new JdbcTemplate(entry.getValue())));
}

@Override
public Health health() {
    final List<Health> results = this.jdbcTemplates.entrySet().stream().map(entry -> {
        final JdbcTemplate jdbcTemplate = entry.getValue();
        try {
            jdbcTemplate.queryForObject("SELECT 1 FROM DUAL", String.class);
            return Health.up().withDetail("dataSource", entry.getKey()).build();
        } catch (final DataAccessException e) {
            return Health.down().withDetail("dataSource", entry.getKey()).withException(e).build();
        }
    }).collect(Collectors.toList());

    ...
}
The /actuator/health response after the change:
{"status":"UP","details":{"routingDataSource":{"status":"UP","details":{"DATA_SOURCE_B":"UP","DATA_SOURCE_A":"UP"}},"diskSpace":{"status":"UP","details":{"total":494384795648,"free":391919927296,"threshold":10485760}}}}
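The step that combines the per-data-source results in health() is elided above. Here is one possible rule, sketched in plain Java with a hypothetical Status enum instead of Spring's Health and Status types:

- Report DOWN if any target data source is down.
- Report UP if all of them are up.
- Report UNKNOWN when there is nothing to check, roughly the situation the built-in db indicator was in once the routing data source had been filtered out.

The method name and the rule are assumptions, not the repository's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RoutingHealthAggregation {

    enum Status { UP, DOWN, UNKNOWN }

    // Hypothetical rule: nothing checked -> UNKNOWN, any DOWN -> DOWN, otherwise UP.
    static Status aggregate(Map<String, Status> perDataSource) {
        if (perDataSource.isEmpty()) {
            return Status.UNKNOWN;
        }
        return perDataSource.containsValue(Status.DOWN) ? Status.DOWN : Status.UP;
    }

    public static void main(String[] args) {
        Map<String, Status> results = new LinkedHashMap<>();
        results.put("DATA_SOURCE_B", Status.UP);
        results.put("DATA_SOURCE_A", Status.UP);
        System.out.println(aggregate(results) + " " + results); // UP {DATA_SOURCE_B=UP, DATA_SOURCE_A=UP}
    }
}
```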

jstack

"pool-1-thread-1" #19 prio=5 os_prio=0 cpu=4292.67ms elapsed=6105.93s tid=0x00007f7bb1a83670 nid=0x1f63b2 waiting on condition  [0x00007f7b748fa000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@17.0.1/Native Method)
- parking to wait for <0x00000000c7541680> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(java.base@17.0.1/LockSupport.java:341)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@17.0.1/AbstractQueuedSynchronizer.java:506)
at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.1/ForkJoinPool.java:3463)
at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.1/ForkJoinPool.java:3434)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@17.0.1/AbstractQueuedSynchronizer.java:1623)
at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:393)
at org.apache.http.pool.AbstractConnPool.access$300(AbstractConnPool.java:70)
at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:253)
- locked <0x00000000c89eb980> (a org.apache.http.pool.AbstractConnPool$2)
at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:198)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:306)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:282)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:190)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at org.apache.http.client.fluent.Request.internalExecute(Request.java:173)
at org.apache.http.client.fluent.Request.execute(Request.java:177)
at xyz.csongyu.smartinvesttask.configuration.FundOpenFundInfoEmTaskConfiguration.lambda$unitNetAssetValueRunner$1(FundOpenFundInfoEmTaskConfiguration.java:85)
at xyz.csongyu.smartinvesttask.configuration.FundOpenFundInfoEmTaskConfiguration$$Lambda$567/0x0000000800f074c0.get(Unknown Source)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(java.base@17.0.1/CompletableFuture.java:1768)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.1/ThreadPoolExecutor.java:1136)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.1/ThreadPoolExecutor.java:635)
at java.lang.Thread.run(java.base@17.0.1/Thread.java:833)

Code Example

csongyu/apache-hc-fluent-api

final List<Integer> indexes = Stream.iterate(0, item -> item + 1).limit(101).collect(Collectors.toList());
final ExecutorService executorService = Executors.newFixedThreadPool(10);
CompletableFuture.allOf(indexes.stream().map(index -> CompletableFuture.supplyAsync(() -> {
    try {
        Request.Get("http://127.0.0.1:8088/index/" + index).connectTimeout(1_000).socketTimeout(5_000)
                .addHeader("Accept", "application/json").execute()
                .saveContent(directory.resolve(index + ".json").toFile());
    } catch (final IOException e) {
        return false;
    }
    // check the file where it was actually saved, not relative to the working directory
    return Files.exists(directory.resolve(index + ".json"));
}, executorService)).toArray(CompletableFuture[]::new)).orTimeout(30, TimeUnit.SECONDS).join();
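Note that orTimeout(30, TimeUnit.SECONDS) does not rescue the worker threads. It only completes the allOf future exceptionally with a TimeoutException; nothing cancels or interrupts the tasks. That is consistent with pool-1-thread-1 still being parked in the jstack output. In the JDK-only sketch below, a latch that is never released stands in for a connection lease that never arrives. The names are made up.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class OrTimeoutDemo {

    // Returns true if orTimeout fired while the worker task was still blocked.
    static boolean timeoutLeavesWorkerBlocked(long timeoutMs) {
        ExecutorService executorService = Executors.newFixedThreadPool(1);
        CountDownLatch leaseNeverArrives = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        try {
            CompletableFuture<Void> task = CompletableFuture.runAsync(() -> {
                try {
                    leaseNeverArrives.await(); // like waiting in getPoolEntryBlocking()
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    finished.countDown();
                }
            }, executorService);

            boolean timedOut = false;
            try {
                CompletableFuture.allOf(task).orTimeout(timeoutMs, TimeUnit.MILLISECONDS).join();
            } catch (CompletionException e) {
                timedOut = e.getCause() instanceof TimeoutException;
            }
            // join() has returned, yet the task has neither finished nor been cancelled.
            return timedOut && finished.getCount() == 1 && !task.isDone();
        } finally {
            leaseNeverArrives.countDown(); // release the worker so the demo can exit
            executorService.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(timeoutLeavesWorkerBlocked(200)); // true
    }
}
```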

Default Pooling Connection Manager

Class Executor

A PoolingHttpClientConnectionManager with maximum 100 connections per route and a total maximum of 200 connections is used internally.
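With a fixed per-route limit, every response that is never consumed keeps its connection leased. There is one likely leak path in this example, worth confirming against the fluent-hc version in use. Response#saveContent throws HttpResponseException for status codes of 300 and above without reading the entity, and the catch block then returns without discarding it. Once 100 leases to 127.0.0.1:8088 have leaked, the next request waits in AbstractConnPool#getPoolEntryBlocking. The thread dump shows WAITING rather than TIMED_WAITING, so that wait has no deadline. The JDK-only sketch below models the route limit with a Semaphore. It is an analogy, not HttpClient's pool, and it uses a short tryAcquire timeout only so that the demo terminates.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class RouteLeaseDemo {

    static final int MAX_PER_ROUTE = 100; // the per-route default quoted above

    // Issues `requests` calls to one route; no response is ever consumed, so no lease is returned.
    static int leasesGranted(int requests, long waitMs) {
        Semaphore route = new Semaphore(MAX_PER_ROUTE);
        int granted = 0;
        for (int i = 0; i < requests; i++) {
            try {
                if (route.tryAcquire(waitMs, TimeUnit.MILLISECONDS)) {
                    granted++; // response received, body never consumed: release() is never called
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return granted;
    }

    public static void main(String[] args) {
        // 101 requests, as in the example: the last one cannot get a connection.
        System.out.println(leasesGranted(101, 100)); // 100
    }
}
```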

How to Fix

execute

Please note that response content must be processed or discarded using Response.discardContent(); otherwise the connection used for the request might not be released to the pool.

discardContent

Discards response content and deallocates all resources associated with it.

Response response = null;
try {
    response = Request.Get("http://127.0.0.1:8088/index/" + index).connectTimeout(1_000)
            .socketTimeout(5_000).addHeader("Accept", "application/json").execute();
    response.saveContent(directory.resolve(index + ".json").toFile());
} catch (final IOException e) {
    return false;
} finally {
    if (response != null) {
        // important: release the connection even when saveContent() throws
        response.discardContent();
    }
}
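The finally block ensures that the lease goes back to the pool on every path, including the failure path. The sketch below continues the Semaphore analogy, with hypothetical names rather than HttpClient code. Even if every single request fails, all 100 permits are available afterwards.

```java
import java.io.IOException;
import java.util.concurrent.Semaphore;

public class RouteReleaseDemo {

    static final int MAX_PER_ROUTE = 100;

    // Hypothetical handler that always fails, e.g. a non-2xx status rejected before the body is read.
    static void handle(int index) throws IOException {
        throw new IOException("simulated failure for request " + index);
    }

    // Runs `requests` calls; the lease is returned in finally, like response.discardContent() in the fix.
    static int availableAfter(int requests) {
        Semaphore route = new Semaphore(MAX_PER_ROUTE);
        for (int i = 0; i < requests; i++) {
            if (!route.tryAcquire()) {
                throw new IllegalStateException("route exhausted at request " + i);
            }
            try {
                handle(i);
            } catch (IOException e) {
                // the fixed lambda returns false here
            } finally {
                route.release();
            }
        }
        return route.availablePermits();
    }

    public static void main(String[] args) {
        System.out.println(availableAfter(1_000)); // 100
    }
}
```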

DNS Records

Host record   Record type   Record value
blog          A             x.x.x.x

Edit docker-compose.yml

Quickstart: Compose and WordPress

version: "3.9"

services:
  wordpress-db:
    image: mariadb:10.7
    volumes:
      - /mnt/wordpress-db-data:/var/lib/mysql
    restart: always
    environment:
      MARIADB_RANDOM_ROOT_PASSWORD: "1"
      MARIADB_DATABASE: wordpress
      MARIADB_USER: wordpress
      MARIADB_PASSWORD: wordpress.123
    networks:
      - wordpress-network
  wordpress:
    image: wordpress:5.9
    volumes:
      - /mnt/wordpress-data:/var/www/html
    expose:
      - 80
    restart: always
    environment:
      WORDPRESS_DB_HOST: wordpress-db
      WORDPRESS_DB_USER: wordpress
      WORDPRESS_DB_PASSWORD: wordpress.123
      WORDPRESS_DB_NAME: wordpress
    networks:
      - wordpress-network

networks:
  wordpress-network:
    name: wordpress-network
    driver: bridge

Add a Site to Nginx

conf.d/blog.<domain>.conf

server {
    # Serve Content Over IPv4 and IPv6
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name blog.<domain>;

    location / {
        proxy_pass http://wordpress:80;

        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

docker-compose.yml of Nginx (joining the existing wordpress-network)

    ...
    networks:
      - wordpress-network

networks:
  wordpress-network:
    external: true

Run WordPress

docker compose up -d
# then open https://blog.<domain>/