Tracking down why Eureka does not evict expired services promptly when self-preservation is disabled
1. The three roles involved when microservices use a Eureka registry
  • Eureka server (the registry)

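The registry acts as a store for the service information registered by the service providers. Once Eureka has started, http://localhost:8761/ shows the registered services and related configuration, and http://localhost:8761/eureka/apps/feign-service shows the details of a single service, where feign-service is the service name.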

eureka:
  server:
    enable-self-preservation: false
    eviction-interval-timer-in-ms: 10000

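The enable-self-preservation parameter controls whether Eureka's self-preservation mode is enabled, and eviction-interval-timer-in-ms is the interval between runs of the task that evicts expired instances. With the value 10000 shown here the task runs every 10 seconds, and once the Eureka server is up, a log line such as "Running the evict task with compensationTime 3ms" appears every 10 seconds.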

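  • Service provider

Registers its service information with the Eureka registry, and can configure the heartbeat (lease renewal) interval (default 30 seconds) and the lease expiration time (default 90 seconds).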

eureka:
  client:
    register-with-eureka: true
    fetch-registry: true
  instance:
    lease-renewal-interval-in-seconds: 1
    lease-expiration-duration-in-seconds: 10

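register-with-eureka controls whether the service registers itself with Eureka; a service provider should set it to true. fetch-registry controls whether the client fetches registry information from the registry; a service consumer should set it to true. lease-renewal-interval-in-seconds is the renewal interval, here one heartbeat per second, shortened to make the demo easier. lease-expiration-duration-in-seconds is the lease timeout: if no heartbeat is received within this time, the instance is considered expired.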

  • Service consumer

Fetches from the registry the information about the service providers it needs to call.
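As a rough illustration of what "fetching provider information" means at the HTTP level, the sketch below reads the feign-service entry from the same URL used earlier in this post. The class name RegistryLookupSketch and the method buildLookupRequest are hypothetical names made up for this example, and asking for JSON through the Accept header is an assumption. A real consumer would normally rely on the Eureka client rather than on raw HTTP calls.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper for illustration only; not part of any Eureka client library.
final class RegistryLookupSketch {

    // Builds a GET request for the registry entry of one application.
    static HttpRequest buildLookupRequest(String baseUrl, String appName) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/eureka/apps/" + appName))
                .header("Accept", "application/json") // assumption: JSON is preferred over the default XML
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumes a Eureka server is running locally on port 8761, as in this post.
        HttpRequest request = buildLookupRequest("http://localhost:8761", "feign-service");
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

If the registry is protected by basic authentication, the request would also need an Authorization header.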

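2. Tracking the eviction of expired services

Change the registry's eviction interval eviction-interval-timer-in-ms to 1000, so that the eviction of expired services is triggered once per second.

Set the provider's heartbeat interval lease-renewal-interval-in-seconds to 1, so the lease is renewed every second, and set the expiration time lease-expiration-duration-in-seconds to 10, so the instance should expire once no heartbeat has been received for 10 seconds. After starting the service, check at http://localhost:8761/eureka/apps/feign-service whether the parameters took effect; under instance -> leaseInfo you should see the following: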

<renewalIntervalInSecs>1</renewalIntervalInSecs>
<durationInSecs>10</durationInSecs>

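With these settings, if the service provider dies at 6:00:00, then with Eureka evicting expired services once per second and a lease expiration time of 10 seconds, the instance should be evicted at around 6:00:11. In practice it was only evicted at around 6:00:21. Repeated tests confirmed that the instance was never evicted at the expected time.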

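The answer turned out to be in the source code.

Look at the method public void evict(long additionalLeaseMs) in the class com.netflix.eureka.registry.AbstractInstanceRegistry. Because the demo registers only one service, the non-essential parts are omitted, leaving the following: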

// We collect first all expired items, to evict them in random order. For large eviction sets,
// if we do not that, we might wipe out whole apps before self preservation kicks in. By randomizing it,
// the impact should be evenly distributed across all applications.
List<Lease<InstanceInfo>> expiredLeases = new ArrayList<>();
for (Entry<String, Map<String, Lease<InstanceInfo>>> groupEntry : registry.entrySet()) {
    Map<String, Lease<InstanceInfo>> leaseMap = groupEntry.getValue();
    if (leaseMap != null) {
        for (Entry<String, Lease<InstanceInfo>> leaseEntry : leaseMap.entrySet()) {
            Lease<InstanceInfo> lease = leaseEntry.getValue();
            if (lease.isExpired(additionalLeaseMs) && lease.getHolder() != null) {
                expiredLeases.add(lease);
            }
        }
    }
}

In short, it collects all expired leases and then evicts them in random order. The key part is the isExpired check:

/**
 * Checks if the lease of a given {@link com.netflix.appinfo.InstanceInfo} has expired or not.
 *
 * Note that due to renew() doing the 'wrong" thing and setting lastUpdateTimestamp to +duration more than
 * what it should be, the expiry will actually be 2 * duration. This is a minor bug and should only affect
 * instances that ungracefully shutdown. Due to possible wide ranging impact to existing usage, this will
 * not be fixed.
 *
 * @param additionalLeaseMs any additional lease time to add to the lease evaluation in ms.
 */
public boolean isExpired(long additionalLeaseMs) {
    return (evictionTimestamp > 0 || System.currentTimeMillis() > (lastUpdateTimestamp + duration + additionalLeaseMs));
}

The comment explains that renew() does the wrong thing: it adds duration to lastUpdateTimestamp, so the effective expiry is 2 * duration. Next, look at the renew() method in the class com.netflix.eureka.lease.Lease:

/**
 * Renew the lease, use renewal duration if it was specified by the
 * associated {@link T} during registration, otherwise default duration is
 * {@link #DEFAULT_DURATION_IN_SECS}.
 */
public void renew() {
    lastUpdateTimestamp = System.currentTimeMillis() + duration;

}

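Sure enough, duration is added to the last-update timestamp. Since isExpired adds duration once more, a lease configured for 10 seconds only expires 20 seconds after the last heartbeat, which matches the observed eviction at around 6:00:21. In the demo, removing this duration from renew() makes the behavior match the expectation.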
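The timing can be reproduced with a small stand-alone model. This is only a sketch: LeaseModel and evictionDelayMs are hypothetical names, not Eureka's classes. It assumes the last heartbeat arrives at t = 0 and that the evict task ticks at exact multiples of the eviction interval starting from that moment. The flag addDurationOnRenew switches between the renew() shown above and the variant without the extra duration.

```java
// Simplified model of the lease timing; NOT Eureka's actual Lease class.
final class LeaseModel {
    private final long durationMs;
    private final boolean addDurationOnRenew; // true mimics the renew() shown above
    private long lastUpdateTimestamp;

    LeaseModel(long durationMs, boolean addDurationOnRenew, long nowMs) {
        this.durationMs = durationMs;
        this.addDurationOnRenew = addDurationOnRenew;
        this.lastUpdateTimestamp = nowMs;
    }

    // Records a heartbeat received at nowMs.
    void renew(long nowMs) {
        lastUpdateTimestamp = addDurationOnRenew ? nowMs + durationMs : nowMs;
    }

    // Same comparison as isExpired above, with additionalLeaseMs taken as 0.
    boolean isExpired(long nowMs) {
        return nowMs > lastUpdateTimestamp + durationMs;
    }

    // Returns how many ms after the last heartbeat the evict task first sees the lease as expired.
    static long evictionDelayMs(long durationMs, long intervalMs, boolean addDurationOnRenew) {
        LeaseModel lease = new LeaseModel(durationMs, addDurationOnRenew, 0L);
        lease.renew(0L); // last heartbeat at t = 0, then the provider dies
        long t = intervalMs; // assumption: evict task ticks at intervalMs, 2 * intervalMs, ...
        while (!lease.isExpired(t)) {
            t += intervalMs;
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println(evictionDelayMs(10_000, 1_000, true));  // prints 21000
        System.out.println(evictionDelayMs(10_000, 1_000, false)); // prints 11000
    }
}
```

Under these assumptions the model gives 21 seconds with the extra duration and 11 seconds without it, in line with the 6:00:21 and 6:00:11 figures above.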

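Normally, to upgrade a service smoothly, first mark each of the existing microservice instances as OUT_OF_SERVICE:

curl -X PUT "http://root:password@localhost:8761/eureka/apps/feign-service/localhost:feign-service:8888/status?value=OUT_OF_SERVICE"

The service provider's instance information is updated in the next renewal cycle. After that, take the instance offline, so that the upgrade does not make the service unavailable.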
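The same status change can also be sent from Java. The sketch below is a hedged equivalent of the curl command, using the JDK's built-in HTTP client (Java 11 or later). OutOfServiceSketch and buildStatusRequest are hypothetical names made up for this example. The credentials root/password, the application name and the instance ID are the ones from the URL in this post; sending them as a Basic Authorization header is an assumption about how the registry is secured.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration only; not part of any Eureka client library.
final class OutOfServiceSketch {

    // Builds the PUT request that overrides the status of one instance.
    static HttpRequest buildStatusRequest(String baseUrl, String app, String instanceId,
                                          String status, String user, String password) {
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        URI uri = URI.create(baseUrl + "/eureka/apps/" + app + "/" + instanceId
                + "/status?value=" + status);
        return HttpRequest.newBuilder(uri)
                .header("Authorization", "Basic " + credentials) // assumption: basic auth is enabled
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumes a Eureka server is running locally on port 8761, as in this post.
        HttpRequest request = buildStatusRequest("http://localhost:8761", "feign-service",
                "localhost:feign-service:8888", "OUT_OF_SERVICE", "root", "password");
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
    }
}
```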

