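System Logs: EDAC (Error Detection And Correction) Logs
EDAC is the Linux kernel subsystem that reports memory errors detected, and where possible corrected, by ECC hardware.
Hardware environment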
$ dmidecode -t system
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 2.7 present.
Handle 0x0100, DMI type 1, 27 bytes
System Information
Manufacturer: HP
Product Name: ProLiant DL380 G7
Version: Not Specified
Serial Number: SXXXXXXXXA
UUID: 39444835-7926-4753-1346-64344631364E
Wake-up Type: Power Switch
SKU Number: XXXXXX-B21
Family: ProLiant
Handle 0x2000, DMI type 32, 11 bytes
System Boot Information
Status: No errors detected
Operating system environment
$ cat /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)
$ getconf LONG_BIT
64
$ uname -r
3.10.0-1062.18.1.el7.x86_64
System log (/var/log/messages)
kernel: mce: [Hardware Error]: Machine check events logged
kernel: EDAC MC0: 1 CE error on CPU#0Channel#0_DIMM#0 (channel:0 slot:0 page:0x0 offset:0x0 grain:8 syndrome:0x0)
kernel: EDAC MC0: 1 CE error on CPU#0Channel#0_DIMM#0
- EDAC: error detection and correction
- MC: memory controller
- CE: correctable error
- DIMM: dual in-line memory module
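To get a quick per-DIMM tally from the log, the CE messages can be summed by label. The following is a minimal sketch, assuming the messages have the same shape as the excerpt above (`<count> CE error on <label>`); the function name `edac_ce_by_label` and its file argument are hypothetical helpers, not part of any EDAC tool. If the kernel writes both the detailed and the short form for one event, as the excerpt suggests, this tally counts that event twice, so treat it as a rough indicator only.

```shell
#!/bin/sh
# Sum correctable-error (CE) counts per DIMM label from EDAC kernel messages.
# Assumes lines shaped like:
#   kernel: EDAC MC0: 1 CE error on CPU#0Channel#0_DIMM#0 (...)
edac_ce_by_label() {
    # $1: log file to scan (hypothetical parameter; defaults to /var/log/messages)
    awk '
        / EDAC MC[0-9]+: [0-9]+ CE / {
            for (i = 2; i <= NF; i++) {
                # Layout: <count> CE error on <label>
                if ($i == "CE" && $(i + 2) == "on") {
                    count[$(i + 3)] += $(i - 1)
                }
            }
        }
        # Output order is unspecified when several labels are present.
        END { for (label in count) print label, count[label] }
    ' "${1:-/var/log/messages}"
}
```

Calling `edac_ce_by_label` with no argument scans /var/log/messages, which usually requires root. Messages with a different wording are skipped rather than guessed at.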
Checking the fault and locating the faulty memory slot
$ grep "[0-9]" /sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count
/sys/devices/system/edac/mc/mc0/csrow0/ch0_ce_count:4
/sys/devices/system/edac/mc/mc1/csrow0/ch0_ce_count:0
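The grep above prints every counter, including the zeros. As a sketch of a narrower check, the loop below prints only counters with a non-zero value. It assumes the same `mc*/csrow*/ch*_ce_count` layout shown in the output above; the function name `edac_nonzero_ce` and its optional root-directory argument are hypothetical, added so the logic can be tried against a copy of the tree.

```shell
#!/bin/sh
# Print only the EDAC counters whose correctable-error count is non-zero.
edac_nonzero_ce() {
    # $1: EDAC sysfs root (hypothetical parameter; defaults to the real path)
    root="${1:-/sys/devices/system/edac/mc}"
    for f in "$root"/mc*/csrow*/ch*_ce_count; do
        [ -r "$f" ] || continue            # glob matched nothing, or unreadable
        n=$(cat "$f")
        # Non-numeric content makes the test fail quietly and is skipped.
        [ "$n" -gt 0 ] 2>/dev/null && echo "${f#"$root"/}: $n"
    done
    return 0
}
```

With the counters shown above, only the mc0 line would be printed, which points at memory controller 0, csrow 0, channel 0.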
Locating the memory slot (dmidecode command)
$ dmidecode -t memory | grep -v "Size: No Module Installed" | grep -C 3 -i Size
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: DIMM
Set: 3
Locator: PROC 1 DIMM 3A
--
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: DIMM
Set: 12
Locator: PROC 2 DIMM 3A
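The grep pipeline shows three lines of context around each Size field. A more compact listing of populated slots can be sketched with awk, assuming each Memory Device record prints its `Size:` line before its `Locator:` line, as in the output above. The function name `populated_dimms` is hypothetical. The sketch only lists slots; it does not map an EDAC label such as CPU#0Channel#0_DIMM#0 to a board locator, which still has to be checked against the server's documentation.

```shell
#!/bin/sh
# Print "Locator<TAB>Size" for each populated Memory Device record.
# Usage (as root): dmidecode -t memory | populated_dimms
populated_dimms() {
    awk -F': ' '
        { sub(/^[ \t]+/, "") }             # drop dmidecode indentation
        $1 == "Size"    { size = $2 }
        $1 == "Locator" {
            if (size != "" && size != "No Module Installed")
                print $2 "\t" size
            size = ""                      # reset for the next record
        }
    '
}
```

For the two modules shown above this would list PROC 1 DIMM 3A and PROC 2 DIMM 3A, each with 8192 MB.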
Installing the EDAC utilities (edac-utils)
$ yum install -y libsysfs edac-utils
Running the edac-util command
$ edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow0: 0 Uncorrected Errors
mc0: csrow0: CPU#0Channel#0_DIMM#0: 4 Corrected Errors
mc1: 0 Uncorrected Errors with no DIMM info
mc1: 0 Corrected Errors with no DIMM info
mc1: csrow0: 0 Uncorrected Errors
mc1: csrow0: CPU#1Channel#0_DIMM#0: 0 Corrected Errors
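On a machine with many DIMMs, the verbose report is mostly zeros. The filter below is a sketch that keeps only the DIMM lines with at least one corrected error. It assumes the four-field `mcN: csrowN: <label>: <n> Corrected Errors` layout shown above; the function name `dimms_with_ce` is hypothetical.

```shell
#!/bin/sh
# Print "label count" for every DIMM line reporting at least one corrected error.
# Usage: edac-util -v | dimms_with_ce
dimms_with_ce() {
    awk -F': ' '
        # Only per-DIMM lines have four ": "-separated fields.
        NF == 4 && $4 ~ /^[0-9]+ Corrected Errors$/ {
            split($4, part, " ")
            if (part[1] + 0 > 0) print $3, part[1]
        }
    '
}
```

Fed the report above, it would leave only the CPU#0Channel#0_DIMM#0 entry with its count of 4.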
References
- https://www.kernel.org/doc/html/v5.0/admin-guide/ras.html