alluxio worker OOMKilled #18681

Open
XiXiTan opened this issue Sep 4, 2024 · 1 comment
Labels
type-bug This issue is about a bug

Comments

XiXiTan commented Sep 4, 2024

Alluxio Version:
What version of Alluxio are you using?
2.9.0.1

Describe the bug
A clear and concise description of what the bug is.
Memory is provisioned with headroom, but the worker pod still gets OOMKilled.
Which part of the memory usage might be exceeding expectations? And why does the cache use more than its configured size?

Pod resource request: cpu: 4, memory: 16G
Memory budget: Xmx=4g, MaxDirectMemorySize=4g, alluxio.worker.ramdisk.size=6g, reserved memory=2g

Exact memory settings (worker JVM command line):
/usr/lib/jvm/java-1.8.0-openjdk/bin/java -cp /opt/alluxio-2.9.0.1-noHelm/conf/::/opt/alluxio/ranger-lib/*:/opt/alluxio-2.9.0.1-noHelm/assembly/alluxio-server-2.9.0.1.jar -Dalluxio.logger.type=Console,WORKER_LOGGER -Dsun.security.krb5.disableReferrals=true -Dalluxio.home=/opt/alluxio-2.9.0.1-noHelm -Dalluxio.conf.dir=/opt/alluxio-2.9.0.1-noHelm/conf -Dalluxio.logs.dir=/opt/alluxio-2.9.0.1-noHelm/logs -Dalluxio.user.logs.dir=/opt/alluxio-2.9.0.1-noHelm/logs/user -Dlog4j.configuration=file:/opt/alluxio-2.9.0.1-noHelm/conf/log4j.properties -Dorg.apache.jasper.compiler.disablejsr199=true -Djava.net.preferIPv4Stack=true -Dorg.apache.ratis.thirdparty.io.netty.allocator.useCacheForAllThreads=false -Dalluxio.worker.hostname=ip -Xmx4096M -XX:MaxDirectMemorySize=4096M alluxio.worker.AlluxioWorker

conf/alluxio-site.properties
alluxio.worker.ramdisk.size=6144M
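
For reference, here is my rough accounting of the 16G budget. This is only a sketch under two assumptions of mine: the ramdisk is a tmpfs (in-memory emptyDir) mount, so its pages are charged to the same pod memory cgroup as the JVM, and the 2g "reserved" figure is a guess for metaspace, thread stacks and other native JVM usage rather than a measured value.

```java
// Rough budget sketch (not Alluxio code). Assumes the ramdisk is tmpfs and is
// therefore charged to the pod's memory cgroup together with the JVM.
public class WorkerMemoryBudget {
  public static void main(String[] args) {
    long gib = 1024L * 1024 * 1024;
    long podLimit = 16 * gib;  // pod memory request/limit
    long heap = 4 * gib;       // -Xmx4096M
    long direct = 4 * gib;     // -XX:MaxDirectMemorySize=4096M
    long ramdisk = 6 * gib;    // alluxio.worker.ramdisk.size=6144M (tmpfs, cgroup-charged)
    // Metaspace, thread stacks, code cache, GC structures and other native
    // allocations are not capped by -Xmx/-XX:MaxDirectMemorySize; the 2 GiB
    // "reserved" figure is my guess, not a measured value.
    long otherJvmNative = 2 * gib;

    long total = heap + direct + ramdisk + otherJvmNative;
    System.out.printf("budget %.1f GiB vs pod limit %.1f GiB%n",
        total / (double) gib, podLimit / (double) gib);
    // Prints 16.0 vs 16.0: the budget is already exactly at the limit, so any
    // extra native usage or tmpfs spike would push the cgroup over and OOMKill the pod.
  }
}
```

If that accounting is roughly right, there is effectively no slack between the configured budget and the pod limit.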

Cache usage:

[screenshot: cache usage]

CPU and memory of the affected pod:

[screenshot: worker CPU]
[screenshot: worker memory]

To Reproduce
Steps to reproduce the behavior (as minimally and precisely as possible)

Expected behavior
A clear and concise description of what you expected to happen.
The worker pod should not get OOMKilled.

Urgency
Describe the impact and urgency of the bug.

Are you planning to fix it
Please indicate if you are already working on a PR.

Additional context
Add any other context about the problem here.

XiXiTan added the type-bug label on Sep 4, 2024
XiXiTan (author) commented Sep 4, 2024

Another minor question:
If the worker cache is set to 512M, it actually uses 1024M. That exceeds the configured 512M and does not match the expected worker cache usage.
|          | Total           | MEM       | HDD       |
| -------- | --------------- | --------- | --------- |
| capacity | 30.50GB         | 512.00MB  | 30.00GB   |
| used     | 4083.94MB (13%) | 1024.00MB | 3059.94MB |

In the source code I only see the default used when the cache size is not set: it takes 2/3 of total system memory, or falls back to 1GB. I don't see any logic that would pick a different value when the cache size is explicitly specified.

```java
public static final PropertyKey WORKER_RAMDISK_SIZE =
    dataSizeBuilder(Name.WORKER_RAMDISK_SIZE)
        .setAlias(Name.WORKER_MEMORY_SIZE)
        .setDefaultSupplier(() -> {
          try {
            OperatingSystemMXBean operatingSystemMXBean =
                (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
            return operatingSystemMXBean.getTotalPhysicalMemorySize() * 2 / 3;
          } catch (Throwable e) {
            // The package com.sun.management may not be available on every platform.
            // fallback to a reasonable size.
            return "1GB";
          }
        }, "2/3 of total system memory, or 1GB if system memory size cannot be determined")
        .setDescription("The allocated memory for each worker node's ramdisk(s). "
            + "It is recommended to set this value explicitly.")
        .setConsistencyCheckLevel(ConsistencyCheckLevel.WARN)
        .setScope(Scope.WORKER)
        .build();
```
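
To double-check what that supplier would actually compute, the same logic can be run standalone on the worker host (OpenJDK 8 here, so the com.sun.management bean should be available). This just reproduces the default calculation above; it is not Alluxio code and only matters when the size is left unset:

```java
import java.lang.management.ManagementFactory;
import com.sun.management.OperatingSystemMXBean;

// Standalone version of the default-supplier logic above: prints the value
// alluxio.worker.ramdisk.size would default to on this host.
public class RamdiskDefaultCheck {
  public static void main(String[] args) {
    OperatingSystemMXBean os =
        (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
    long twoThirds = os.getTotalPhysicalMemorySize() * 2 / 3;
    System.out.println("default alluxio.worker.ramdisk.size would be "
        + (twoThirds / (1024 * 1024)) + " MB");
  }
}
```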
