
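10. Monitoring and Troubleshooting

Running a cluster means routinely checking that its members work properly and, when something breaks, narrowing down the cause quickly. This chapter collects the tools for that job: the cphaprob family of commands for viewing state from the command line and the matching Gaia Clish show cluster commands, the logs in SmartConsole, SNMP Traps that send alerts to external monitoring, ways to trigger a manual failover, and finally the troublesome Critical Device routed problem and how to interpret ClusterXL error messages.

Monitoring Commands: The Big Picture

ClusterXL monitoring commands are used to check that the cluster and its members are healthy and to define Critical Devices. The term Critical Device comes up throughout; it is also called a Problem Notification, or pnote for short. A Critical Device is a special software device that runs on each member and monitors components that are essential to cluster operation. If a monitored component fails to report its state in time, or reports its state as "problem", the state of that member changes to Down immediately.

How you browse the commands depends on the shell.

In Gaia Clish, type show cluster and then press <ESC><ESC> to list every available command.
In Expert mode, run cphaprob on its own to list the available commands. The cphaprob commands also run as-is in Gaia Clish.

The syntax notation is the same as in the other chapters.

Notation	Meaning
Curly braces `{ }`	A list of candidates separated by the vertical bar `|`. Choose exactly one.
Angle brackets `< >`	A variable. You supply a supported value.
Square brackets `[ ]`	An optional command or parameter. You may include it or leave it out.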

Monitoring Commands at a Glance

The table below pairs each ClusterXL monitoring command in Gaia Clish with its cphaprob equivalent in Expert mode. Each command is described in detail in the later sections of this chapter.

Task	Gaia Clish	Expert mode
View member states and names	`show cluster state`	`cphaprob [-vs <VSID>] state`
View Critical Devices (pnotes) and their states	`show cluster members pnotes {all | problem}`	`cphaprob [-l] [-i[a]] [-e] list`
View cluster interfaces	`show cluster members interfaces {all | secured | virtual | vlans}`	`cphaprob [-vs all] [-a] [-m] if`
View Bond configuration	`show cluster bond {all | name <bond_name>}`	`cphaprob show_bond [<bond_name>]`
View Bond groups	N / A	`cphaprob show_bond_groups`
View (and reset) failover statistics	`show cluster failover [reset {count | history}]`	`cphaprob [-reset {-c | -h}] [-l <count>] show_failover`
View software versions and whether they match	`show cluster release`	`cphaprob release`
View Delta Sync statistics	`show cluster statistics sync [reset]`	`cphaprob [-reset] syncstat`
View Delta Sync statistics for the Connections table	`show cluster statistics transport [reset]`	`cphaprob [-reset] ldstat`
View the CCP mode	`show cluster members interfaces virtual`	`cphaprob [-vs all] -a if`
View IGMP membership	`show cluster members igmp`	`cphaprob igmp`
View the cluster unique IP table	`show cluster members ips` / `show cluster members monitored`	`cphaprob tablestat` / `cphaprob -m tablestat`
View the member ID mode used in local logs	`show cluster members idmode`	`cphaprob names`
View interfaces monitored by RouteD (OSPF)	`show ospf interfaces [detailed]`	`cphaprob routedifcs`
View the RouteD daemon roles	`show cluster roles`	`cphaprob roles`
View Cluster Correction statistics	N / A	`cphaprob [{-d | -f | -s}] corr`
View the CCP encryption settings	`show cluster members ccpenc`	`cphaprob ccp_encrypt`
View the Multi-Version Cluster state	`show cluster members mvc`	N / A
View Full Connectivity Upgrade statistics	N / A	`cphaprob fcustat`

The Gaia Clish show cluster command tree looks like this.

show cluster
      bond
            all
            name <Name of Bond>
      failover
      members
            ccpenc
            idmode
            igmp
            interfaces
                  all
                  secured
                  virtual
                  vlans
            ips
            monitored
            mvc
            pnotes
                  all
                  problem
      release
      roles
      state
      statistics
            sync [reset]
            transport [reset]

The Expert mode cphaprob commands are listed below.

cphaprob [-vs <VSID>] state
cphaprob [-reset {-c | -h}] [-l <count>] show_failover
cphaprob names
cphaprob [-reset] [-a] syncstat
cphaprob [-reset] ldstat
cphaprob [-l] [-i[a]] [-e] list
cphaprob [-vs all] [-a] [-m] if
cphaprob show_bond [<bond_name>]
cphaprob show_bond_groups
cphaprob igmp
cphaprob fcustat
cphaprob [-m] tablestat
cphaprob routedifcs
cphaprob roles
cphaprob release
cphaprob ccp_encrypt
cphaprob [{-d | -f | -s}] corr

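Viewing Cluster State (cphaprob state)

This is the first command to run after you configure a cluster, and the one you will use most often. It shows the state of the whole cluster at a glance.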

Gaia Clish:
1. set virtual-system <VSID>
2. show cluster state

Expert mode:
cphaprob [-vs <VSID>] state

Example output:

Member1> show cluster state
Cluster Mode: High Availability (Active Up) with IGMP Membership

ID    Unique Address    Assigned Load    State        Name
1 (local) 11.22.33.245       100%        ACTIVE(!)    Member1
2         11.22.33.246         0%        DOWN         Member2

Active PNOTEs: COREXL

Last member state change event:
   Event Code:            CLUS-116505
   State change:          INIT -> ACTIVE(!)
   Reason for state change: All other machines are dead (timeout), FULLSYNC PNOTE
   Event time:            Sun Sep  8 15:28:39 2019

Cluster failover count:
   Failover counter:      0
   Time of counter reset: Sun Sep  8 15:28:21 2019 (reboot)
Member1>

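The output fields have the following meanings.

Field	Description
Cluster Mode	The cluster mode. One of `Load Sharing (Multicast)`, `Load Sharing (Unicast)`, `High Availability (Primary Up)`, `High Availability (Active Up)`, `Virtual System Load Sharing`. Third-party clustering products show `Service` (see Clustering Definitions and Terms).
ID	In High Availability mode, the member priority configured in the cluster object in SmartConsole. In Load Sharing mode, the member ID.
Unique Address	Usually the IP address of the Sync interface. In some cases the IP address of another cluster interface appears.
Assigned Load	In HA mode, the Active member shows 100% and the Standby members show 0%. In Load Sharing mode (Unicast or Multicast), every Active member shows 100%.
State	In HA mode, a healthy cluster has exactly one ACTIVE member and the rest are STANDBY. In Load Sharing mode, all members are ACTIVE. In a third-party cluster, all members must be ACTIVE because this command reports only the status of the Full Sync process. (Possible states are listed in the table below.)
Name	The member object name configured in SmartConsole.
Active PNOTEs	The Critical Devices that report their state as "problem".
Last member state change event	Information about the last time this member changed its cluster state.
Event Code	The event code. See sk125152 for details.
State change	Previous state -> new state.
Reason for state change	Why the state changed.
Event time	Date and time of the state change.
Last cluster failover event	Information about the last failover.
Transition to new ACTIVE	Which member became the new Active member.
Reason	The reason for the last failover.
Event time	Date and time of the last failover.
Cluster failover count	Information about the number of failovers.
Failover counter	Number of failovers since the last counter reset. The value survives a reboot and is synchronized between members.
Time of counter reset	Date and time of the last counter reset, and what triggered it.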
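The member table in this output is regular enough to parse for scripted health checks. The following is a minimal sketch, not an official tool: it assumes the column layout shown in the example output above, and the names `SAMPLE`, `ROW`, and `parse_members` are hypothetical.

```python
import re

# Member rows copied from the example output above.
SAMPLE = """\
ID    Unique Address    Assigned Load    State        Name
1 (local) 11.22.33.245       100%        ACTIVE(!)    Member1
2         11.22.33.246         0%        DOWN         Member2
"""

# One member row: ID, optional "(local)", address, load, state, name.
ROW = re.compile(
    r"^\s*(\d+)\s*(\(local\))?\s+(\S+)\s+(\d+)%\s+(\S+)\s+(\S+)\s*$"
)

def parse_members(text):
    """Return one dict per member row; header and other lines are skipped."""
    members = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            members.append({
                "id": int(m.group(1)),
                "local": m.group(2) is not None,
                "address": m.group(3),
                "load": int(m.group(4)),
                "state": m.group(5),
                "name": m.group(6),
            })
    return members

members = parse_members(SAMPLE)
# Anything other than plain ACTIVE or STANDBY deserves a closer look.
unhealthy = [m["name"] for m in members if m["state"] not in ("ACTIVE", "STANDBY")]
print(unhealthy)
```

In practice the text would come from running `cphaprob state` on a member; the sketch deliberately works on a captured string so it can be tried anywhere.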

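Possible Member States

When you examine a member state, look at two things together: whether the member forwards packets, and whether a problem is preventing it from forwarding packets. Each state reflects the result of the Critical Device checks.

Cluster state	Description	Forwards packets?	Problem state?
ACTIVE	Everything is OK.	Yes	No
ACTIVE(!)	A problem was detected, but the member still forwards packets because it is the only member in the cluster or because there is no other Active member. In any other situation the member would be in the Down state.	Yes	Yes
ACTIVE(!F)	Same as above. The member is in the freeze state.	Yes	Yes
ACTIVE(!P)	Same as above. The member is the Pivot member in Load Sharing Unicast mode.	Yes	Yes
ACTIVE(!FP)	Same as above. The member is the Pivot member in Load Sharing Unicast mode and is in the freeze state.	Yes	Yes
DOWN	One of the Critical Devices reports its state as "problem".	No	Yes
LOST	The peer member lost connectivity to the local member (for example, while the peer member reboots).	No	Yes
READY	The member recognizes itself as part of the cluster and is ready to work, but by design something prevents it from becoming active. (See the explanation below.)	No	No
STANDBY	HA mode only. The member waits for the Active member to fail.	No	No
BACKUP	Only in VSX clusters in VSLS mode with three or more members. The state of a Virtual System on the third and subsequent members.	No	No
INIT	The phase from just after boot until Full Sync completes.	No	No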
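The two right-hand columns of the state table can be encoded as a lookup for alerting scripts. This is an illustrative sketch only: the table contents come from the text above, while the `triage` function and its labels are hypothetical.

```python
# (forwards packets, problem state) for each state in the table above.
STATE_TABLE = {
    "ACTIVE": (True, False),
    "ACTIVE(!)": (True, True),
    "ACTIVE(!F)": (True, True),
    "ACTIVE(!P)": (True, True),
    "ACTIVE(!FP)": (True, True),
    "DOWN": (False, True),
    "LOST": (False, True),
    "READY": (False, False),
    "STANDBY": (False, False),
    "BACKUP": (False, False),
    "INIT": (False, False),
}

def triage(state):
    """Return a short, hypothetical triage label for a member state."""
    forwards, problem = STATE_TABLE[state.strip().upper()]
    if problem and forwards:
        return "degraded: still forwarding, investigate pnotes"
    if problem:
        return "failed: not forwarding"
    return "ok"

for s in ("ACTIVE", "ACTIVE(!)", "DOWN", "STANDBY"):
    print(s, "->", triage(s))
```

Keeping "forwards" and "problem" as separate flags mirrors the advice above to read both columns together instead of treating every non-ACTIVE state as a failure.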

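A member enters the READY state for one of two reasons.

The required software components are not yet fully loaded and initialized, or the configuration steps are not yet complete. Before a member becomes Active, it asks the other members whether it may do so. In HA mode it checks whether an Active member already exists; in Load Sharing Unicast mode it checks whether a Pivot member already exists. The member stays in READY until it receives the replies and decides on its next state (Active, Standby, Pivot, or non-Pivot).
The software version installed on this member is higher than on all other members. For example, if member versions differ during a cluster upgrade, the member with the new version goes to READY and the members with the previous version go to Active/Active Attention. This applies only when the Multi-Version Cluster mechanism is disabled. For the solution, see sk42096.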

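Viewing Critical Devices (cphaprob list)

There are several built-in Critical Devices, and an administrator can define more. If even one Critical Device reports its state as "problem", the member is reported as "DOWN".

Built-in Critical Devices

Critical Device	Monitors	Meaning of the "OK" state	Meaning of the "problem" state
Problem Notification	All Critical Devices.	No Critical Device on this member reports a problem.	At least one Critical Device reports "problem".
Init	Whether the "HA module" initialized successfully (sk36372).	This member receives cluster state information from the other members.	-
Interface Active Check	The state of the cluster interfaces.	All cluster interfaces are UP (CCP packets are sent and received normally).	At least one interface is down (CCP packets are not sent or not received in time).
Load Balancing Configuration	Not currently used (sk36373).	-	-
Recovery Delay	The state of the Virtual Systems (sk92353).	The state of a Virtual System can be changed on this member.	The state of a Virtual System cannot be changed yet.
CoreXL Configuration	CoreXL configuration mismatches across all members.	The number of CoreXL Firewall instances on this member equals the number on every peer member.	The number of instances differs from the peer members. The member with more instances changes to DOWN.
Fullsync	Whether Full Sync succeeded on this member.	Full Sync succeeded.	Full Sync did not complete.
Policy	Whether a Security Policy is installed.	The policy was installed successfully.	No policy is currently installed.
fwd	The Security Gateway process `fwd`.	The `fwd` daemon reports its state in time.	The `fwd` daemon does not report its state in time.
cphad	The ClusterXL process `cphamcset`. See also the `$FWDIR/log/cphamcset.elg` file.	The `cphamcset` daemon reports its state in time.	The `cphamcset` daemon does not report its state in time.
routed	The Gaia process `routed`.	The `routed` daemon reports its state in time.	The `routed` daemon does not report its state in time.
cvpnd	The Mobile Access back-end process `cvpnd` (shown when the Mobile Access blade is enabled).	The `cvpnd` daemon reports its state in time.	The `cvpnd` daemon does not report its state in time.
ted	The Threat Emulation process `ted`.	The `ted` daemon reports its state in time.	The `ted` daemon does not report its state in time.
VSX	All Virtual Systems in a VSX cluster.	On VS0, not all Virtual Systems are Down. On other Virtual Systems, VS0 is alive.	The minimum blocking state across all Virtual Systems is not "active" (the problematic VSIDs appear on the `Problematic VSIDs:` line).
Instances	Appears in VSX HA mode (not VSLS) clusters.	The number of CoreXL instances in received CCP packets matches the number loaded on this member (or Virtual System).	The two numbers do not match (sk106912).
Hibernating	Appears in VSX VSLS mode clusters with three or more members. Shows whether this Virtual System is in the "Backup" (hibernated) state (sk114557).	-	This Virtual System is in the "Backup" (hibernated) state on this member.
admin_down	The Critical Device "admin_down".	-	The user ran the `clusterXL_admin down` command on this member.
host_monitor	The Critical Device "host_monitor". The user ran the `$FWDIR/bin/clusterXL_monitor_ips` script.	All monitored IP addresses reply to pings.	At least one monitored IP address failed to reply to at least one ping.
(name of a user space process)	User space processes other than `fwd`, `routed`, `cvpnd`, and `ted`. The administrator ran the `$FWDIR/bin/clusterXL_monitor_process` script.	All monitored processes are running.	At least one monitored process is not running.
Local Probing	The probing mechanism on the cluster interfaces (see Probing in the glossary).	CCP packets are received on all cluster interfaces.	At least one interface received no CCP packets for 5 seconds. Probing starts on the network that interface is connected to.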

Command Syntax and Options

Gaia Clish:
show cluster members pnotes {all | problem}

Expert mode:
cphaprob [-l] [-i[a]] [-e] list

Command	Description
`show cluster members pnotes all`	Shows the list of all Critical Devices.
`show cluster members pnotes problem`	Shows only the "Built-in Devices" and "Registered Devices" that report their state as "problem".
`cphaprob -l list`	Shows the list of all Critical Devices.
`cphaprob -i list`	If there are no problems, prints `There are no pnotes in problem state`. If there is a problem, prints only the Critical Devices that report "problem".
`cphaprob -ia list`	If there are no problems, prints the same message as above. If there is a problem, prints the Critical Device "Problem Notification" together with the Critical Devices that report "problem".
`cphaprob -e list`	If there are no problems, prints the same message as above. If there is a problem, prints only the Critical Devices that report "problem".

Reading the Output by Example

Example 1: the fwd process has died and reports "problem".

[Expert@Member1:0]# cphaprob -l list
Built-in Devices:
   Device Name: Interface Active Check
   Current state: OK
   Device Name: Recovery Delay
   Current state: OK
   Device Name: CoreXL Configuration
   Current state: OK
Registered Devices:
   Device Name: Fullsync
   Registration number: 0
   Timeout: none
   Current state: OK
   Time since last report: 1753.7 sec
   Device Name: Policy
   Registration number: 1
   Timeout: none
   Current state: OK
   Time since last report: 1753.7 sec
   Device Name: routed
   Registration number: 2
   Timeout: none
   Current state: OK
   Time since last report: 940.3 sec
   Device Name: fwd
   Registration number: 3
   Timeout: 30 sec
   Current state: problem
   Time since last report: 1782.9 sec
   Process Status: DOWN
   Device Name: cphad
   Registration number: 4
   Timeout: 30 sec
   Current state: OK
   Time since last report: 1778.3 sec
   Process Status: UP
   Device Name: VSX
   Registration number: 5
   Timeout: none
   Current state: OK
   Time since last report: 1773.3 sec
   Device Name: Init
   Registration number: 6
   Timeout: none
   Current state: OK
   Time since last report: 1773.3 sec
[Expert@Member1:0]#

Example 2: the number of CoreXL Firewall instances differs between members, so CoreXL Configuration reports a problem. In this case the state is shown as problem (non-blocking).

[Expert@Member1:0]# cphaprob -l list
Built-in Devices:
   Device Name: Interface Active Check
   Current state: OK
   Device Name: Recovery Delay
   Current state: OK
   Device Name: CoreXL Configuration
   Current state: problem (non-blocking)
Registered Devices:
   ...
   Device Name: fwd
   Current state: OK
   Process Status: UP
   ...
[Expert@Member1:0]#
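Both examples follow the same "Device Name" / "Current state" pattern, so the problem devices can be extracted with a few lines of code. The sketch below is an assumption-laden illustration: `SAMPLE` is a condensed, hypothetical mix of the two examples above, and `parse_pnotes` is not part of any Check Point tooling.

```python
# Condensed sample combining lines from Example 1 and Example 2 above.
SAMPLE = """\
Built-in Devices:
   Device Name: Interface Active Check
   Current state: OK
   Device Name: CoreXL Configuration
   Current state: problem (non-blocking)
Registered Devices:
   Device Name: fwd
   Registration number: 3
   Timeout: 30 sec
   Current state: problem
   Time since last report: 1782.9 sec
   Process Status: DOWN
"""

def parse_pnotes(text):
    """Map each device name to its 'Current state' value."""
    devices = {}
    name = None
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if not sep:
            continue  # section headers such as "Built-in Devices:"
        if key == "Device Name":
            name = value
        elif key == "Current state" and name is not None:
            devices[name] = value
    return devices

devices = parse_pnotes(SAMPLE)
problems = {n: s for n, s in devices.items() if s != "OK"}
blocking = [n for n, s in problems.items() if s == "problem"]
print(problems)
print(blocking)
```

Separating plain "problem" from "problem (non-blocking)" matters because only the former appears in Example 1 alongside a process that is actually down.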

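Viewing Cluster Interfaces (cphaprob if)

This command shows the state of the member interfaces and the virtual cluster interfaces. ClusterXL treats interfaces as Critical Devices and checks whether each interface can send and receive CCP packets.

ClusterXL also sets the required minimum number of healthy interfaces to the maximum number of healthy interfaces it has detected since the last reboot. If the number of healthy interfaces drops below this required number, ClusterXL declares the member failed and starts a failover. The same rule applies to synchronization interfaces, in which case only healthy synchronization interfaces are counted.

An interface that is DOWN cannot receive CCP packets, cannot send them, or both. An interface may be able to receive CCP packets but not send them. The time shown in the output is the number of seconds since the interface last sent or received a CCP packet.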
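The "required interfaces" rule described above amounts to a running maximum. The following sketch models only that rule, under the assumption that each observation is a simple count of healthy interfaces; the class and method names are hypothetical and this is not how ClusterXL is implemented internally.

```python
class InterfaceWatch:
    """Tracks the required number of healthy interfaces as a running maximum."""

    def __init__(self):
        # Highest healthy-interface count seen since the (simulated) reboot.
        self.required = 0

    def observe(self, healthy_count):
        """Record an observation; return False if the member should be declared failed."""
        self.required = max(self.required, healthy_count)
        return healthy_count >= self.required

watch = InterfaceWatch()
# Interfaces come up one by one, then one of them goes down.
results = [watch.observe(n) for n in (3, 4, 4, 3)]
print(results, watch.required)
```

Note that the first observation of 3 is healthy because 3 is the maximum seen so far; the same count later counts as a failure because the member has already seen 4 healthy interfaces.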

Gaia Clish:
1. set virtual-system <VSID>
2. show cluster members interfaces {all | secured | virtual | vlans}

Expert mode:
cphaprob [-vs all] [-a] [-m] if

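Command	What it shows
`show cluster members interfaces all`	The full list of all cluster interfaces, including the required number of interfaces, the Network Objective, and the VLAN monitoring mode (or the list of monitored VLANs).
`show cluster members interfaces secured`	Only the cluster and Sync interfaces and their states, without the Network Objective, the VLAN monitoring mode, or the monitored VLANs.
`show cluster members interfaces virtual`	The full list of virtual cluster interfaces and their states, including the required number and the Network Objective, without VLAN information.
`show cluster members interfaces vlans`	Only the monitored VLAN interfaces.
`cphaprob if`	Only the cluster and Sync interfaces and their states (same as secured).
`cphaprob -a if`	The full list of cluster interfaces and their states, including the required number and the Network Objective, without VLAN information.
`cphaprob -a -m if`	The full list of all cluster interfaces and their states, including the required number, the Network Objective, and the VLAN monitoring mode (or list).

Example output: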

[Expert@Member1:0]# cphaprob -a -m if
CCP mode: Manual (Unicast)
Required interfaces: 4
Required secured interfaces: 1
Interface Name:    Status:
eth0               UP
eth1 (S)           UP
eth2 (LM)          UP
bond1 (LS)         UP
S - sync, LM - link monitor, HA/LS - bond type
Virtual cluster interfaces: 3
eth0    192.168.3.247
eth2    44.55.66.247
bond1   77.88.99.247
No VLANs are monitored on the member
[Expert@Member1:0]#

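The output fields have the following meanings.

Field / marker	Description
CCP mode	The CCP mode. The default is Unicast.
Required interfaces	Total number of monitored cluster interfaces (including Sync), based on the configuration on the Network Management page.
Required secured interfaces	Total number of required Sync interfaces, based on the configuration on the Network Management page.
Non-Monitored	The member does not monitor the state of this interface. This applies when the Network Type is set to `Private` in SmartConsole.
UP	The member monitors this interface and its current state is UP (it can send and receive CCP packets). The Network Type is `Cluster`, `Sync`, or `Cluster + Sync`.
DOWN	The member monitors this interface and its current state is DOWN (it cannot send CCP packets, cannot receive them, or both). The Network Type is `Cluster`, `Sync`, or `Cluster + Sync`.
(S)	A Sync interface. The Network Type is `Sync` or `Cluster + Sync`.
(LM)	An interface configured in the `$FWDIR/conf/cpha_link_monitoring.conf` file. Only the link is monitored, not the sending and receiving of CCP packets.
(HA)	A Bond interface in High Availability mode.
(LS)	A Bond interface in Load Sharing mode.
Virtual cluster interfaces	Total number of configured virtual cluster interfaces.
No VLANs are monitored on the member	VLAN monitoring mode: there are no VLAN interfaces on the cluster interfaces.
Monitoring mode is Monitor all VLANs	VLANs exist and all VLAN IDs are monitored.
Monitoring mode is Monitor specific VLAN	VLANs exist and only specific VLAN IDs are monitored.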

Viewing Bond Interfaces (cphaprob show_bond)

This command shows the configuration of Bond interfaces and their subordinate interfaces.

Gaia Clish:
1. show cluster bond {all | name <bond_name>}
2. show bonding groups

Expert mode:
cphaprob show_bond [<bond_name>]
cphaprob show_bond_groups

Command	Description
`show cluster bond all` / `show bonding groups` / `cphaprob show_bond`	Configuration of all configured Bond interfaces.
`show cluster bond name <bond_name>` / `cphaprob show_bond <bond_name>`	Configuration of the specified Bond interface.
`cphaprob show_bond_groups`	The configured Bond groups and their settings.

Example 1: cphaprob show_bond.

[Expert@Member2:0]# cphaprob show_bond
                    |                   |      |Slaves     |Slaves  |Slaves
Bond name           |Mode               |State |configured |link up |required
-----------+-------------------+------+-----------+--------+--------
bond1               | High Availability | UP   | 2         | 2      | 1
Legend:
UP! - Bond interface state is UP, yet attention is required
Slaves configured - number of slave interfaces configured on the bond
Slaves link up    - number of operational slaves
Slaves required   - minimal number of operational slaves required for bond to be UP
[Expert@Member2:0]#

Member2> show bonding groups
Bonding Interface: 1
Bond Configuration
   xmit-hash-policy Not configured
   down-delay 200
   primary Not configured
   lacp-rate Not configured
   mode active-backup
   up-delay 200
   mii-interval 100
Bond Interfaces
   eth3
   eth4
Member2>

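The output fields of cphaprob show_bond and show cluster bond all are as follows.

Field	Description
Bond name	Name of the Gaia bonding group.
Mode	Bonding mode: `High Availability` or `Load Sharing`.
State	State of the bonding group: `UP` (fully operational), `UP!` (UP, but attention is required), or `DOWN` (failed).
Slaves configured	Total number of physical subordinate interfaces configured in this group.
Slaves link up	Number of operational physical subordinate interfaces.
Slaves required	Minimum number of operational subordinate interfaces required for the group to be UP.

Example 2: cphaprob show_bond <bond_name>.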

[Expert@Member2:0]# cphaprob show_bond bond1
Bond name: bond1
Bond mode: High Availability
Bond status: UP
Configured slave interfaces: 2
In use slave interfaces: 2
Required slave interfaces: 1
Slave name      | Status          | Link
----------------+-----------------+-------
eth4            | Active          | Yes
eth3            | Backup          | Yes
[Expert@Member2:0]#

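The output fields of cphaprob show_bond <bond_name> and show cluster bond name <bond_name> are as follows.

Field	Description
Bond name	Name of the bonding group.
Bond mode	`High Availability` or `Load Sharing`.
Bond status	`UP`, `UP!` (attention required), or `DOWN`.
Configured slave interfaces	Total number of configured physical subordinate interfaces.
In use slave interfaces	Number of operational physical subordinate interfaces.
Required slave interfaces	Minimum number of operational subordinate interfaces required to be UP.
Slave name	Names of the physical subordinate interfaces configured in the group.
Status	State of the subordinate interface. `Active` (in HA and LS modes, currently handling traffic), `Backup` (HA mode only, ready and able to support an internal failover), `Not Available` (the physical link is lost or the member is Down; an internal failover is not possible in this state).
Link	State of the physical link: `Yes` (link present) or `No` (link lost).

Example 3: cphaprob show_bond_groups.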

[Expert@Member2:0]# cphaprob show_bond_groups
                    |           | Required     | Bonds    | Bonds
Group of bonds name | State     | active bonds | in group | status
--------------------+-----------+--------------+----------+--------+
GoB0                | UP        | 1            | bond1    | UP
                    |           |              | bond2    | UP
Legend:
Bonds in group        - a list of the bonds in the bond group
Required active bonds  - number of required active bonds
[Expert@Member2:0]#

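The output fields of cphaprob show_bond_groups are as follows.

Field	Description
Group of bonds name	Name of the Bond group.
State	State of the group: `UP` (fully operational) or `DOWN` (failed).
Required active bonds	Number of active Bonds required in this group.
Bonds in group	Names of the Gaia Bond interfaces configured in this group.
Bonds status	State of each Bond interface: `UP` or `DOWN`.

Viewing Failover Statistics (cphaprob show_failover)

This command shows failover statistics on the member: the number of failovers that occurred, the reasons, and the time of the last failover.

Viewing the statistics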
Gaia Clish:  show cluster failover
Expert mode: cphaprob [-l <number>] show_failover

Resetting the statistics
Gaia Clish:  show cluster failover reset {count | history}
Expert mode: cphaprob -reset {-c | -h} show_failover

Parameter	Description
`-l <number>`	Number of most recent failover events to show (1 to 50).
`count` / `-c`	Resets the failover event counter.
`history` / `-h`	Resets the failover event history.

Example output:

[Expert@Member1:0]# cphaprob show_failover
Last cluster failover event:
   Transition to new ACTIVE: Member 2 -> Member 1
   Reason: ADMIN_DOWN PNOTE
   Event time: Sun Sep  8 18:21:44 2019
Cluster failover count:
   Failover counter: 1
   Time of counter reset: Sun Sep  8 16:08:34 2019 (reboot)
Cluster failover history (last 20 failovers since reboot/reset on Sun Sep 8 16:08:34 2019):
No. Time:                     Transition:           CPU: Reason:
-------------------------------------------------------------------
1   Sun Sep  8 18:21:44 2019  Member 2 -> Member 1  01   ADMIN_DOWN PNOTE
[Expert@Member1:0]#
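The history rows at the end of this output can be turned into structured records, for example to feed an external dashboard. This is a minimal sketch that assumes the row layout shown in the example above (and an English/C locale for the day and month names); `ROW` and `parse_history` are hypothetical names.

```python
import re
from datetime import datetime

# History section copied from the example output above.
SAMPLE = """\
No. Time:                     Transition:           CPU: Reason:
-------------------------------------------------------------------
1   Sun Sep  8 18:21:44 2019  Member 2 -> Member 1  01   ADMIN_DOWN PNOTE
"""

ROW = re.compile(
    r"^(\d+)\s+(\w{3} \w{3}\s+\d+ \d\d:\d\d:\d\d \d{4})\s+"
    r"(Member \d+) -> (Member \d+)\s+(\d+)\s+(.+)$"
)

def parse_history(text):
    """Return one dict per failover row; header and separator lines are skipped."""
    events = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue
        stamp = " ".join(m.group(2).split())  # collapse the space-padded day
        events.append({
            "no": int(m.group(1)),
            "time": datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"),
            "from": m.group(3),
            "to": m.group(4),
            "cpu": m.group(5),
            "reason": m.group(6),
        })
    return events

events = parse_history(SAMPLE)
print(events[0]["time"].isoformat(), events[0]["reason"])
```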

Viewing Software Versions (cphaprob release)

This command shows the software version of the local member (including private hotfixes) and whether it matches the versions on the other members.

Gaia Clish:  show cluster release
Expert mode: cphaprob release

[Expert@Member1:0]# cphaprob release
Release: R80.40 T136
Kernel build: 994000117
FW1 build: 994000116
FW1 private fixes: None
ID         SW release
1 (local)  R80.40 T136
2          R80.40 T136
[Expert@Member1:0]#

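Viewing Delta Sync Statistics (cphaprob syncstat)

Heavily loaded clusters and geographically distributed members pose special challenges. High connection rates and long distances can introduce delays that affect cluster operation, so in such environments you should monitor the State Synchronization mechanism. Troubleshoot in the following order.

Examine the Delta Sync statistics counters.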

Gaia Clish:  show cluster statistics sync
Expert mode: cphaprob syncstat

Change the values of the applicable synchronization global configuration parameters.
Reset the Delta Sync statistics counters.

Gaia Clish:  show cluster statistics sync reset
Expert mode: cphaprob -reset syncstat

Examine the statistics again to check whether the problem is resolved.
Resolve any problems you identified.

Example output:

Delta Sync Statistics
Sync status: OK
Drops:
   Lost updates................................. 0
   Lost bulk update events...................... 0
   Oversized updates not sent................... 0
Sync at risk:
   Sent reject notifications.................... 0
   Received reject notifications................ 0
Sent messages:
   Total generated sync messages................ 26079
   Sent retransmission requests................. 0
   Sent retransmission updates.................. 0
   Peak fragments per update.................... 1
Received messages:
   Total received updates....................... 3710
   Received retransmission requests............. 0
Sync Interface:
   Name......................................... eth1
   Link speed................................... 1000Mb/s
   Rate......................................... 46000 [Bps]
   Peak rate.................................... 46000 [Bps]
   Link usage................................... 0%
   Total........................................ 376827[KB]
Queue sizes (num of updates):
   Sending queue size........................... 512
   Receiving queue size......................... 256
   Fragments queue size......................... 50
Timers:
   Delta Sync interval (ms)..................... 100
Reset on Sun Sep  8 16:09:15 2019 (triggered by fullsync).

Each section is explained below.

Sync status section

The state of the Delta Sync mechanism. In normal operation it is Sync status: OK. The other possible states are:

- Sync status: Off - Full-sync failure
- Sync status: Off - Policy installation failure
- Sync status: Off - Cluster module not started
- Sync status: Off - SIC failure
- Sync status: Off - Full-sync checksum error
- Sync status: Off - Full-sync received queue is full
- Sync status: Off - Release version mismatch
- Sync status: Off - Connection to remote member timed-out
- Sync status: Off - Connection terminated by remote member
- Sync status: Off - Could not start a connection to remote member
- Sync status: Off - cpstart
- Sync status: Off - cpstop
- Sync status: Off - Manually disabled sync
- Sync status: Off - Was not able to start for more than X second
- Sync status: Off - Boot
- Sync status: Off - Connectivity Upgrade (CU)
- Sync status: Off - cphastop
- Sync status: Off - Policy unloaded
- Sync status: Off - Hibernation
- Sync status: Off - OSU deactivated
- Sync status: Off - Sync interface down
- Sync status: Fullsync in progress
- Sync status: Problem (Able to send sync packets, unable to receive sync packets)
- Sync status: Problem (Able to send sync packets, saving incoming sync packets)
- Sync status: Problem (Able to send sync packets, able to receive sync packets)
- Sync status: Problem (Unable to send sync packets, unable to receive sync packets)
- Sync status: Problem (Unable to send sync packets, saving incoming sync packets)
- Sync status: Problem (Unable to send sync packets, able to receive sync packets)

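Drops section

Drop statistics on the Delta Sync network.

Field	Description
Lost updates	Number of Delta Sync updates this member considers lost (based on the sequence numbers in CCP packets). A value greater than 0 means updates were lost. Action: increase the Sending Queue and Receiving Queue sizes. If `Received reject notification` is increasing, increase the Sending Queue; if it is not, increase the Receiving Queue.
Lost bulk update events	Number of times this member missed Delta Sync updates in bulk (a bulk update is twice the size of the local receiving queue). The counter increases when the member receives an update whose sequence number is much larger than expected, which usually points to a network problem that causes massive packet drops. Action: if the value is steady, a manual Full Sync can resolve it (sk37029). If it keeps increasing, there is a network problem; increase both the Receiving and Sending Queues.
Oversized updates not sent	Number of Delta Sync updates discarded before sending because they were too large. The counter increases when an update is larger than the local Fragments Queue size. Action: if the value is steady, increase the Sending Queue; if it keeps increasing, contact Check Point Support.

Sync at risk section

Statistics for cases where the Sending Queue is full and Delta Sync retransmission requests are rejected.

Field	Description
Sent reject notifications	Number of times this member rejected a retransmission request from a peer member (because it no longer holds the requested update).
Received reject notification	Number of reject notifications this member received from peer members.

Sent updates section (shown as Sent messages in the output)

Statistics for Delta Sync updates this member sent to peer members.

Field	Description
Total generated sync messages	Number of Delta Sync updates generated (including updates, retransmission requests, retransmission acknowledgments, and so on).
Sent retransmission requests	Number of times this member asked a peer to retransmit specific updates. Note: compare this with `Total generated sync messages` on the other member. If it is abnormally high (more than 30% of the peer member's value), contact Check Point Support with the full output and a description of the topology and configuration.
Sent retransmission updates	Number of times this member retransmitted specific updates at a peer's request.
Peak fragments per update	Peak number of fragments in this member's Fragments Queue (should usually be 1).

Received updates section (shown as Received messages in the output)

Statistics for Delta Sync updates this member received from peer members.

Field	Description
Total received updates	Total number of Delta Sync updates received (updates only; retransmission requests, acknowledgments, and so on are excluded).
Received retransmission requests	Number of retransmission requests received. If it is abnormally high (more than 30% of this member's `Total generated sync messages`), suspect a connectivity problem and contact Check Point Support.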
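The 30% rule for retransmission requests is easy to check automatically once the dotted counter lines are parsed. The following is a sketch under stated assumptions: the counter lines are copied from the example output earlier in this section, the helper names are hypothetical, and the threshold is simply the 30% figure quoted in the table.

```python
import re

# Counter lines copied from the example syncstat output.
SAMPLE = """\
Drops:
   Lost updates................................. 0
Sent messages:
   Total generated sync messages................ 26079
   Sent retransmission requests................. 0
Received messages:
   Total received updates....................... 3710
   Received retransmission requests............. 0
"""

# "Name....... 123": a label, a run of dots, then an integer value.
LINE = re.compile(r"^\s*(.+?)\.{2,}\s*(\d+)\s*$")

def parse_counters(text):
    """Return {counter name: integer value} for every dotted counter line."""
    matches = (LINE.match(line) for line in text.splitlines())
    return {m.group(1): int(m.group(2)) for m in matches if m}

def retransmission_alarm(received_requests, total_generated, threshold=0.30):
    """True when received requests exceed the threshold share of generated messages."""
    if total_generated == 0:
        return False
    return received_requests / total_generated > threshold

counters = parse_counters(SAMPLE)
alarm = retransmission_alarm(
    counters["Received retransmission requests"],
    counters["Total generated sync messages"],
)
print(counters["Total generated sync messages"], alarm)
```

For the "Sent retransmission requests" check, the same function applies, but the denominator has to come from the peer member's output rather than the local one.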

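Queue sizes section

The sizes of the Delta Sync queues.

Field	Description
Sending queue size	Size of the cyclic queue that buffers updates already sent until the peer acknowledges them. It is needed for retransmission, and there is one per member. Default 512 (also the minimum).
Receiving queue size	Size of the cyclic queue that buffers received updates. It holds the remaining updates in order until a missing update is retransmitted, and it is used to reassemble fragmented updates. There is one per member. Default 256 (also the minimum).
Fragments queue size	Size of the queue that prepares updates before they move to the Sending Queue. It must be smaller than the Sending Queue and significantly smaller than the Receiving Queue. Default 50 (also the minimum).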
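Before changing queue sizes, the constraints in the table above can be sanity-checked in code. This sketch is illustrative only: the minimums are the defaults quoted above, the function name is hypothetical, and because the text gives no number for "significantly smaller", the check only enforces "strictly smaller".

```python
# Minimum sizes as listed in the Queue sizes table.
MINIMUMS = {"sending": 512, "receiving": 256, "fragments": 50}

def check_queue_sizes(sending, receiving, fragments):
    """Return a list of human-readable problems; an empty list means the sizes look valid."""
    sizes = {"sending": sending, "receiving": receiving, "fragments": fragments}
    problems = [
        "%s queue %d is below the minimum %d" % (name, sizes[name], low)
        for name, low in MINIMUMS.items()
        if sizes[name] < low
    ]
    if fragments >= sending:
        problems.append("fragments queue must be smaller than the sending queue")
    if fragments >= receiving:
        problems.append("fragments queue must be smaller than the receiving queue")
    return problems

print(check_queue_sizes(512, 256, 50))    # the documented defaults
print(check_queue_sizes(512, 256, 600))   # fragments queue too large
```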

Timers section

Field	Description
Delta Sync interval (ms)	Interval at which this member sends Delta Sync updates from the Sending Queue. The base time unit is 100 ms (1 tick). Default 100 ms (also the minimum). See Increasing the Sync Timer.

The final line, Reset on XXX (triggered XXX), shows the date and time of the last statistics reset and how the reset happened ("manually" or "by fullsync").

Viewing IGMP State (cphaprob igmp)

Shows the IGMP membership state.

Gaia Clish:  show cluster members igmp
Expert mode: cphaprob igmp

[Expert@Member1:0]# cphaprob igmp
IGMP Membership: Enabled
Supported Version: 2
Report Interval [sec]: 60
IGMP queries are replied only by Operating System
Interface  Host Group       Multicast Address    Last ver.  Last Query[sec]
------------------------------------------------------------------------------
eth0       224.168.3.247    01:00:5e:28:03:f7    N/A        N/A
eth1       224.22.33.250    01:00:5e:16:21:fa    N/A        N/A
eth2       224.55.66.247    01:00:5e:37:42:f7    N/A        N/A
[Expert@Member1:0]#
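The Multicast Address column in this output is consistent with the standard IPv4 multicast-to-Ethernet mapping (the fixed prefix 01:00:5e followed by the low 23 bits of the group address, as defined in RFC 1112). The sketch below recomputes the three rows; the function name is hypothetical, and the mapping is the general standard, not something specific to ClusterXL.

```python
def multicast_mac(group_ip):
    """Map an IPv4 multicast group address to its Ethernet multicast MAC."""
    octets = [int(part) for part in group_ip.split(".")]
    if len(octets) != 4 or not 224 <= octets[0] <= 239:
        raise ValueError("not an IPv4 multicast address: %s" % group_ip)
    # Only the low 23 bits of the IP address are copied into the MAC,
    # so the top bit of the second octet is dropped.
    return "01:00:5e:%02x:%02x:%02x" % (octets[1] & 0x7F, octets[2], octets[3])

# Host Group values from the example output above.
for group in ("224.168.3.247", "224.22.33.250", "224.55.66.247"):
    print(group, "->", multicast_mac(group))
```

The first row shows the effect of the dropped bit: the second octet 168 (0xa8) becomes 0x28 in the MAC address.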

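Delta Sync Statistics for the Connections Table (cphaprob ldstat)

This command shows Delta Sync statistics for operations performed on the Connections kernel table (id 8158). The operations include creating a new connection (SET), refreshing a connection (REFRESH), and deleting a connection (DELETE).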

Gaia Clish:  show cluster statistics transport [reset]
Expert mode: cphaprob [-reset] ldstat

The reset flag resets the kernel statistics collected since the last reboot or reset.

[Expert@Member1:0]# cphaprob ldstat
Operand               Calls   Bytes   Average   Ratio %
----------------------------------------------------------
ERROR                 0       0       0         0
SET                   354     51404   145       33
RENAME                0       0       0         0
REFRESH               1359    70668   52        46
DELETE                290     10440   36        6
SLINK                 193     12352   64        8
UNLINK                0       0       0         0
MODIFYFIELDS          91      7280    80        4
RECORD DATA CONN      0       0       0         0
COMPLETE DATA CONN    0       0       0         0
Total bytes sent: 161292 (0 MB) in 1797 packets. Average 89
[Expert@Member1:0]#
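The text does not define the Average column, but in this example it matches Bytes divided by Calls, truncated to an integer, and the final "Average 89" matches total bytes divided by total packets. The sketch below checks that reading against the sample rows; treat the formula as an observation about this output, not a documented definition.

```python
# (calls, bytes, reported average) for the non-zero rows in the example above.
ROWS = {
    "SET": (354, 51404, 145),
    "REFRESH": (1359, 70668, 52),
    "DELETE": (290, 10440, 36),
    "SLINK": (193, 12352, 64),
    "MODIFYFIELDS": (91, 7280, 80),
}

def average(total_bytes, calls):
    """Integer average bytes per call; 0 when there were no calls."""
    return total_bytes // calls if calls else 0

mismatches = [
    op for op, (calls, size, reported) in ROWS.items()
    if average(size, calls) != reported
]
overall = average(161292, 1797)  # "Total bytes sent" and packet count from the last line
print(mismatches, overall)
```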

Viewing Cluster IP Addresses (cphaprob tablestat)

Shows the IP addresses and interfaces of the members.

All interfaces
Gaia Clish:  show cluster members ips
Expert mode: cphaprob tablestat

Monitored interfaces only
Gaia Clish:  show cluster members monitored
Expert mode: cphaprob -m tablestat

[Expert@Member1:0]# cphaprob tablestat
----
Unique IP's Table
----
Member     Interface   IP-Address
------------------------------------------
(Local)    0  1        192.168.3.245
0  2                   11.22.33.245
0  3                   44.55.66.245
1  1                   192.168.3.246
1  2                   11.22.33.246
1  3                   44.55.66.246
------------------------------------------
[Expert@Member1:0]#
[Expert@Member1:0]# fw ctl iflist
1 : eth0
2 : eth1
3 : eth2
[Expert@Member1:0]#
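The two outputs above can be combined so that the numeric Interface column is shown by name. This sketch assumes that the interface numbers in `cphaprob tablestat` correspond to the numbers printed by `fw ctl iflist`, which is what placing the two outputs side by side suggests; the helper names are hypothetical.

```python
import re

# Data rows copied from the tablestat example above.
TABLESTAT = """\
(Local)    0  1        192.168.3.245
0  2                   11.22.33.245
0  3                   44.55.66.245
1  1                   192.168.3.246
1  2                   11.22.33.246
1  3                   44.55.66.246
"""

IFLIST = """\
1 : eth0
2 : eth1
3 : eth2
"""

ROW = re.compile(r"(\d+)\s+(\d+)\s+(\d+\.\d+\.\d+\.\d+)\s*$")
IFACE = re.compile(r"^\s*(\d+)\s*:\s*(\S+)\s*$")

def interface_names(iflist_text):
    """Return {interface number: interface name} from `fw ctl iflist` output."""
    matches = (IFACE.match(line) for line in iflist_text.splitlines())
    return {int(m.group(1)): m.group(2) for m in matches if m}

def named_table(tablestat_text, iflist_text):
    """Return (member, interface name, IP) tuples for every table row."""
    names = interface_names(iflist_text)
    rows = []
    for line in tablestat_text.splitlines():
        m = ROW.search(line)
        if m:
            member, ifnum, ip = int(m.group(1)), int(m.group(2)), m.group(3)
            rows.append((member, names.get(ifnum, "if%d" % ifnum), ip))
    return rows

rows = named_table(TABLESTAT, IFLIST)
for row in rows:
    print(row)
```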

Member ID Mode in Local Logs (cphaprob names)

Shows whether local ClusterXL logs identify members by Member ID (the default) or by Member Name. For configuration, see Configuring the Cluster Member ID Mode in Local Logs.

Gaia Clish:  show cluster members idmode
Expert mode: cphaprob names

[Expert@Member1:0]# cphaprob names
Current member print mode in local logs is set to: ID
[Expert@Member1:0]#

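Interfaces Monitored by RouteD (cphaprob routedifcs)

Shows the interfaces that the RouteD daemon monitors on the member when OSPF is configured. The key idea is that, with OSPF configured, the member monitors these interfaces and does not bring itself up as Active until the RouteD daemon reports that it is safe to do so. This is used mainly in ClusterXL High Availability Primary Up configurations to prevent a premature failback.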

Gaia Clish:  show ospf interfaces [detailed]
Expert mode: cphaprob routedifcs

[Expert@Member1:0]# cphaprob routedifcs
No interfaces are registered.
[Expert@Member1:0]#

[Expert@Member1:0]# cphaprob routedifcs
Monitored interfaces registered by routed:
eth0
[Expert@Member1:0]#

Viewing the RouteD Daemon Roles (cphaprob roles)

Shows on which member the RouteD daemon runs as Master.

Gaia Clish:  show cluster roles
Expert mode: cphaprob roles

[Expert@Member1:0]# cphaprob roles
ID         Role
1 (local)  Master
2          Non-Master
[Expert@Member1:0]#

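Viewing Cluster Correction Statistics (cphaprob corr)

This command shows the Cluster Correction statistics on each member. The Cluster Correction Layer (CCL) is a mechanism that handles asymmetric connections: it "corrects" packets to the right member and so provides connection stickiness.

In most cases, the CCL makes the correction in the CoreXL SND.
In some cases, such as dynamic routing or VPN, the correction is made in the Firewall or in SecureXL.

When ClusterXL must send data along with a corrected packet (currently only for VPN), the output marks it as with metadata.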

Gaia Clish:  N / A
Expert mode: cphaprob [{-d | -f | -s}] corr

Command	Description
`cphaprob corr`	Cluster Correction statistics for all traffic.
`cphaprob -d corr`	CoreXL SND only.
`cphaprob -f corr`	CoreXL Firewall instances only.
`cphaprob -s corr`	SecureXL only.

Example 1: all traffic.

[Expert@Member1:0]# cphaprob corr
Getting stats for SXL device 0, may take a few seconds...
Cluster Correction Stats (All Traffic):
------------------------------------------------------
Sent packets:     156 (0 with metadata)
Sent bytes:       34,568
Received packets: 0 (0 with metadata)
Received bytes:   0
Send errors:      0
Receive errors:   0
Local asymmetric conns: 0
[Expert@Member1:0]#

Example 2: CoreXL SND only.

[Expert@Member1:0]# cphaprob -d corr
Cluster Correction Stats (Dispatcher Corrections only):
------------------------------------------------------
Sent packets:     0 (0 with metadata)
Sent bytes:       0
Received packets: 0 (0 with metadata)
Received bytes:   0
Send errors:      0
Receive errors:   0
[Expert@Member1:0]#

Example 3: CoreXL Firewall instances only.

[Expert@Member1:0]# cphaprob -f corr
Cluster Correction Stats (Firewall instances only):
------------------------------------------------------
Sent packets:     156 (0 with metadata)
Sent bytes:       34,568
Received packets: 0 (0 with metadata)
Received bytes:   0
Send errors:      0
Receive errors:   0
Local asymmetric conns: 0
[Expert@Member1:0]#

Example 4: SecureXL only.

[Expert@Member1:0]# cphaprob -s corr
Getting stats for SXL device 0, may take a few seconds...
Cluster Correction Stats (SXL Devices only):
------------------------------------------------------
Sent packets:     0 (0 with metadata)
Sent bytes:       0
Received packets: 0 (0 with metadata)
Received bytes:   0
Send errors:      0
Receive errors:   0
Local asymmetric conns: 0
[Expert@Member1:0]#

Viewing CCP Settings (cphaprob ccp_encrypt)

On a member you can view the CCP mode and the CCP encryption settings (on or off, and the encryption key). For configuration, see Configuring the Cluster Control Protocol (CCP) Settings.

Viewing the CCP mode
Gaia Clish:  show cluster members interfaces virtual
Expert mode: cphaprob -a if

Viewing CCP encryption
Gaia Clish:  show cluster members ccpenc
Expert mode: cphaprob ccp_encrypt
             cphaprob ccp_encrypt_key

Viewing the Multi-Version Cluster State (show cluster members mvc)

Shows whether the Multi-Version Cluster (MVC) mechanism is enabled (ON) or disabled (OFF). For configuration, see Configuring the Multi-Version Cluster Mechanism.

Gaia Clish:  show cluster members mvc
Expert mode: N / A

Member1> show cluster members mvc
ON
Member1>

Viewing Full Connectivity Upgrade Statistics (cphaprob fcustat)

Shows Full Connectivity Upgrade statistics during an upgrade between minor versions.

Gaia Clish:  N / A
Expert mode: cphaprob fcustat

[Expert@Member1:0]# cphaprob fcustat
During FCU....................... no
Connection module map............ none
Table id map (remote->local)..... none
Table handlers ..................
   8151 --> 0x0x7f97c421d860 (sip_state)
   8158 --> 0x0x7f97c43d8e30 (connections)
LD handlers......................
   ok - 0
   failed - 0
Global handlers ................. none
[Expert@Member1:0]#

Monitoring Cluster Status in SmartConsole

In addition to the command line, you can monitor the cluster through the logs in SmartConsole.

To see these logs, click Logs & Events > Logs in the left navigation panel.

To receive logs about changes in member state:

In the left navigation panel, click Gateways & Servers.
Open the cluster object.
In the left tree, click ClusterXL and VRRP.
In the Tracking field, select Log.
Click OK.
Install the Access Control Policy on the cluster object.

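Conventions for Reading ClusterXL Log Messages

The log messages in this section use the following conventions.

Square brackets [ ] mark a placeholder that is replaced with the relevant data in the actual log (for example, [NUMBER] is replaced with a number).
Angle brackets < > enclose alternatives separated by vertical bars; the actual log contains one of them (for example, <up|down> is either "up" or "down").
Commonly used placeholders:
- ID: A unique member identifier that starts from "1". It corresponds to the order in which members are sorted on the Cluster Members page of the cluster object.
- IP: A unique IP address that belongs to the member.
- MODE: The cluster mode (for example, New HA, LS Multicast).
- STATE: The member state (for example, active, down, standby).
- DEVICE: The name of a Critical Device (for example, Interface Active Check, fwd).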

General Logs

Log	Description
`Starting <ClusterXL|State Synchronization>.`	ClusterXL (or State Synchronization, for third-party clusters) started successfully. Usually appears after a member boots or after `cphastart` is called.
`Stopping <ClusterXL|State Synchronization>.`	ClusterXL (or State Synchronization) was deactivated. This member is not part of the cluster until ClusterXL is restarted.
`Unconfigured cluster Computers changed their MAC Addresses. Please reboot the cluster so that the changes take affect.`	Usually appears when a member shuts down or after `cphastop` is called.

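State Logs

Log	Description
`Mode inconsistency detected: member [ID] ([IP]) will change its mode to [MODE]. Please re-install the security policy on the cluster.`	Rare. Another member reported a cluster mode different from the one known to the local member. Usually means the Access Control Policy was not installed successfully on every member. To resolve, install the policy again.

Additional state-related logs:

Log	Description
`State change of member [ID] ([IP]) from [STATE] to [STATE] was cancelled, since all other members are down. Member remains [STATE].`	When a member has to change its state (for example, an Active member encounters a problem and has to change to Down), it first queries the state of the other members. If all other members are down, this member cannot change to a non-active state (the cluster would stop working), so it keeps running despite the problem and usually reports its state as "Active(!)".
`member [ID] ([IP]) <is active|is down|is stand-by|is initializing> ([REASON]).`	Issued whenever a member changes its state. The log text states the new state.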
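
To show how the placeholder conventions map onto real text, here is a minimal sketch that extracts the fields from a state-change message. It assumes the function receives only the message text (no timestamp or other SmartConsole columns) and that [ID] is numeric. The sample IP address and reason text are hypothetical, and `parse_state_log` is an illustrative name.

```python
import re

# Mirrors: member [ID] ([IP]) <is active|is down|is stand-by|is initializing> ([REASON]).
# Anchored at the start so Critical Device logs ("[DEVICE] on member ...") do not match.
STATE_LOG = re.compile(
    r"^member (?P<id>\d+) \((?P<ip>[^)]+)\) "
    r"is (?P<state>active|down|stand-by|initializing) "
    r"\((?P<reason>.*)\)\.?\s*$"
)


def parse_state_log(message):
    """Return the fields of a state-change log, or None if the text does not match."""
    match = STATE_LOG.match(message)
    if match is None:
        return None
    return {
        "member_id": int(match.group("id")),
        "ip": match.group("ip"),
        "state": match.group("state"),
        "reason": match.group("reason"),
    }


if __name__ == "__main__":
    # Hypothetical message; the IP address and reason are made up for illustration.
    print(parse_state_log("member 2 (192.0.2.12) is down (example reason)."))
```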

Critical Device Logs

Log	Description
`[DEVICE] on member [ID] ([IP]) status OK ([REASON])`	The Critical Device is working normally.
`[DEVICE] on member [ID] ([IP]) detected a problem ([REASON]).`	The Critical Device detected an error, or did not report its state for the number of seconds set by the timeout option.
`[DEVICE] on member [ID] ([IP]) is initializing ([REASON]).`	The Critical Device registered with the mechanism but has not yet determined its state.
`[DEVICE] on member [ID] ([IP]) is in an unknown state ([STATE ID]) ([REASON]).`	Should not normally appear. Contact Check Point Support.

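Interface Logs

Log	Description
`interface [INTERFACE NAME] of member [ID] ([IP]) is up.`	The interface is working: it can send and receive CCP packets on the expected subnets.
`interface [INTERFACE NAME] of member [ID] ([IP]) is down (receive <up|down>, transmit <up|down>).`	The interface has a problem receiving or sending CCP packets. It may be functioning from the OS point of view but cannot communicate with the other members.
`interface [INTERFACE NAME] of member [ID] ([IP]) was added.`	A new interface was registered on the member (CCP packets arrive on this interface). Usually the result of activating an interface (for example, `ifconfig up`). The interface is now included in ClusterXL reports. It can still be reported as "Disconnected" if it was configured that way for ClusterXL.
`interface [INTERFACE NAME] of member [ID] ([IP]) was removed.`	The interface was detached from the member and is no longer monitored by ClusterXL.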
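
The four interface messages share one prefix and differ only in the tail, so a single pattern can classify them. This is a sketch under the assumption that the message text follows the table above exactly; the interface name and IP address in the example are hypothetical, and `parse_interface_log` is an illustrative name.

```python
import re

# Covers the four interface messages listed in the table above.
INTERFACE_LOG = re.compile(
    r"^interface (?P<name>.+?) of member (?P<id>\d+) \((?P<ip>[^)]+)\) "
    r"(?:is (?P<up>up)"
    r"|is down \(receive (?P<rx>up|down), transmit (?P<tx>up|down)\)"
    r"|was (?P<change>added|removed))\.\s*$"
)


def parse_interface_log(message):
    """Classify an interface log as up, down, added, or removed."""
    match = INTERFACE_LOG.match(message)
    if match is None:
        return None
    if match.group("up"):
        event = "up"
    elif match.group("change"):
        event = match.group("change")
    else:
        event = "down"
    return {
        "interface": match.group("name"),
        "member_id": int(match.group("id")),
        "ip": match.group("ip"),
        "event": event,
        # Direction details exist only for the "down" message; otherwise None.
        "receive": match.group("rx"),
        "transmit": match.group("tx"),
    }


if __name__ == "__main__":
    # Hypothetical message for illustration.
    print(parse_interface_log(
        "interface eth1 of member 1 (192.0.2.11) is down (receive down, transmit up)."
    ))
```

For a "down" event, the `receive` and `transmit` fields show which direction of CCP traffic failed, which narrows the search to the local or the peer side of the link.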

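Reason Strings

The following texts can appear in Critical Device log messages as the reason for a reported problem.

Log	Description
`member [ID] ([IP]) reports more interfaces up.`	Another member reports more working interfaces than the local member. Usually means the local member has a faulty interface and the peer is in a better position to serve as the cluster member. The local member changes its state to "Down" and the peer handles the traffic.
`member [ID] ([IP]) has more interfaces - check your disconnected interfaces configuration in the <discntd.if file|registry>`	Appears when members of the same cluster have a different number of interfaces. The member with fewer interfaces (the reporting member) may lack an interface required for a cluster IP address or the synchronization network and may not work properly. If some of the other member's interfaces are redundant (do not need monitoring), define them explicitly as "Non-Monitored" (see Defining Non-Monitored Interfaces).
`[NUMBER] interfaces required, only [NUMBER] up.`	A problem was detected on one or more monitored interfaces. This does not necessarily mean the member changes to "Down" (other members may have even fewer working interfaces). In that case, the member with the most working interfaces stays up and the others go down.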
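
The comparison described in the last row can be illustrated with a few lines of code. This sketch only models the stated rule (the member with the most working interfaces stays up); it is not how ClusterXL is implemented, and returning every tied member is an assumption, because the text does not describe tie-breaking. All names are illustrative.

```python
def members_staying_up(interfaces_up):
    """Return the member IDs that report the highest number of working interfaces.

    interfaces_up maps a member ID to its count of monitored interfaces that are up.
    Ties are returned together (an assumption; the source does not cover ties).
    """
    if not interfaces_up:
        return []
    best = max(interfaces_up.values())
    return sorted(member for member, count in interfaces_up.items() if count == best)


def reason_string(required, up):
    """Build the reason text in the form shown in the table."""
    return f"{required} interfaces required, only {up} up."


if __name__ == "__main__":
    # Member 1 lost one of its three monitored interfaces; member 2 has all three.
    print(reason_string(3, 2))
    print(members_staying_up({1: 2, 2: 3}))
```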

Working with SNMP Traps

You can configure and view SNMP traps for ClusterXL High Availability. To configure SNMP traps:

Connect to the command line on the Management Server.
Log in to Expert mode.
On a Multi-Domain Server, switch to the context of the applicable Domain Management Server:

mdsenv <IP Address or Name of Domain Management Server>

Run:

threshold_config

From the Threshold Engine Configuration Options menu, select (9) Configure Thresholds.
From the Threshold Categories menu, select (2) High Availability.
Select the applicable traps.
For each selected trap, select and configure these actions:
- Enable/Disable Threshold
- Set Severity
- Set Repetitions
- Configure Alert Destinations
From the Threshold Engine Configuration Options menu, select (7) Configure alert destinations.
Configure the alert destinations.
From the Threshold Engine Configuration Options menu, select (3) Save policy. You can also save the policy to a file.
In SmartConsole, install the Access Control Policy on this cluster object.

Initiating a Manual Cluster Failover

You can initiate a failover manually for testing or maintenance (for details, see sk55081).

Method 1 (recommended)

Expert Mode command	Gaia Clish command	Change in member state	Notes
`clusterXL_admin down`	`set cluster member admin down`	Changes the member state to DOWN	Does not disable Delta Sync
`clusterXL_admin up`	`set cluster member admin up`	Changes the member state to UP	Does not initiate Full Sync

Method 2 (not recommended)

To change the member state to DOWN, run these commands in Expert mode, in this order:

1. cphaconf set_pnote -d <Name of Critical Device> -t 0 -s ok register
2. cphaconf set_pnote -d <Name of Critical Device> -s problem report

To change the state back to UP, run these commands in this order:

1. cphaconf set_pnote -d <Name of Critical Device> -s ok report
2. cphaconf set_pnote -d <Name of Critical Device> unregister

Changing the state to DOWN does not disable Delta Sync, and changing it back to UP does not initiate Full Sync. For the related tasks, see Registering a Critical Device, Reporting the State of a Critical Device, and Unregistering a Critical Device.
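
Because Method 2 depends on running two commands in the right order with the same device name, a small helper that prints the exact command lines can reduce typing mistakes. This sketch only builds the strings from the commands listed above and does not run anything; the device name `failover_test` and the function names are hypothetical.

```python
import shlex


def pnote_down_commands(device):
    """Commands that register a Critical Device and report it as 'problem'."""
    name = shlex.quote(device)  # protect names that contain spaces or shell characters
    return [
        f"cphaconf set_pnote -d {name} -t 0 -s ok register",
        f"cphaconf set_pnote -d {name} -s problem report",
    ]


def pnote_up_commands(device):
    """Commands that report the device as 'ok' and then unregister it."""
    name = shlex.quote(device)
    return [
        f"cphaconf set_pnote -d {name} -s ok report",
        f"cphaconf set_pnote -d {name} unregister",
    ]


if __name__ == "__main__":
    # "failover_test" is a made-up device name for illustration.
    for command in pnote_down_commands("failover_test") + pnote_up_commands("failover_test"):
        print(command)
```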

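Troubleshooting the Critical Device "routed"

The Critical Device routed monitors the state of Dynamic Routing on the member. It makes sure that traffic is not assigned to a member before that member is ready to handle it. The RouteD daemon in Gaia OS handles all routing operations (static and dynamic).

A Dynamic Routing problem shows up as one or more of these symptoms:

Connectivity problems with the cluster IP addresses
Unexpected cluster failovers
The Critical Device routed reports its state as problem

For example: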

Device Name: routed
Registration number: 2
Timeout: none
Current state: problem
Time since last report: 10 sec
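
A monitoring script can look for this condition by reading the `Key: value` lines of the Critical Device listing. The following sketch assumes each device block starts with a `Device Name:` line, as in the example above; `parse_pnotes` and `problem_devices` are illustrative names.

```python
def parse_pnotes(text):
    """Split Critical Device output into one dict per device."""
    devices = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        key, value = key.strip(), value.strip()
        if key == "Device Name":
            current = {}  # a new device block begins here
            devices.append(current)
        if current is not None:
            current[key] = value
    return devices


def problem_devices(devices):
    """Names of the devices whose current state is 'problem'."""
    return [d["Device Name"] for d in devices if d.get("Current state") == "problem"]


# Output copied from the example above.
SAMPLE_PNOTE = """\
Device Name: routed
Registration number: 2
Timeout: none
Current state: problem
Time since last report: 10 sec
"""

if __name__ == "__main__":
    print(problem_devices(parse_pnotes(SAMPLE_PNOTE)))
```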

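Common causes:

Cluster misconfiguration
Traffic on TCP port 2010 between members is blocked
The RouteD daemon did not receive all the routes
The RouteD daemon did not start correctly

Normal Behavior of the routed Critical Device

routed normally reports its state as problem for a short time in these situations:

When a member fails over
When a member reboots
When the Dynamic Routing configuration is inconsistent between members

It reports its state as OK in these situations:

When the ClusterXL member tells the RouteD daemon that it is the Master
When the RouteD daemon receives the full routing state from the Master

Basic Troubleshooting Steps

Check that the cluster interfaces are configured correctly (see Viewing Cluster Interfaces).
In ClusterXL HA mode, make sure the RouteD daemon is running on the Active member.
Make sure traffic on TCP port 2010 between members is not blocked.
Generate RouteD cluster messages. In Expert mode on the member, run: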

dbset routed:instance:default:traceoptions:traceoptions:Cluster

Then examine the /var/log/routed/log file.

In ClusterXL HA mode with OSPF configured, make sure the OSPF interface is up on the Standby member.
Look for a router-id mismatch in the OSPF configuration.

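ClusterXL Error Messages

Every important cluster event that affects a member has a unique code that appears in the /var/log/messages file and in dmesg. Each event message starts with the prefix CLUS-XXXXXX-Y.

The parts of the prefix have these meanings:

Part	Description
CLUS-	Always the same fixed string.
XXXXXX	A six-digit error code that identifies the event: related to a Critical Device, member state, cluster synchronization, or policy installation. The first digit shows which member generated the event (`1` = local member, `2` = remote member). The second digit shows the type of cluster event; the meaning of the remaining digits depends on that type.
Y	The ID or NAME of the local member that generated this log (see Configuring the Cluster Member ID Mode in Local Logs).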
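
The prefix structure lends itself to a short decoder. This is a minimal sketch that reads only what the table states: the first digit (origin) and the second digit (event type, left undecoded). The code `123456` and the message text in the example are made up, and the assumption that the prefix is followed by a colon or whitespace is not confirmed by the text above.

```python
import re

# CLUS-XXXXXX-Y, where XXXXXX is six digits and Y is a member ID or NAME.
# Assumption: Y ends at whitespace, a colon, or a semicolon.
CLUS_PREFIX = re.compile(r"CLUS-(?P<code>\d{6})-(?P<member>[^\s:;]+)")

ORIGIN = {"1": "local member", "2": "remote member"}


def parse_clus_prefix(line):
    """Decode the CLUS-XXXXXX-Y prefix found anywhere in a log line."""
    match = CLUS_PREFIX.search(line)
    if match is None:
        return None
    code = match.group("code")
    return {
        "code": code,
        "origin": ORIGIN.get(code[0], "unknown"),
        "event_type_digit": code[1],  # meaning depends on the event type table
        "logging_member": match.group("member"),
    }


if __name__ == "__main__":
    # Hypothetical line; the code and text do not correspond to a real event.
    print(parse_clus_prefix("CLUS-123456-1: example message text"))
```

A filter built on this decoder can, for example, keep only events with origin "local member" when the goal is to find what the logging member itself detected.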