Importance of a Checklist

Why a checklist matters:
1) It makes troubleshooting easier: you can trace exactly where a problem occurs.
2) It keeps you from chasing unrelated issues.
3) It is the professional approach: rather than diving straight into hands-on work, following a well-prepared document is the right way to handle a problem.

Sample Checklist
Template of diagnostic steps that can be used (checklists)
For quick reference, this section includes a few example checklists for basic verification, checking, and diagnostics. Before you diagnose your problem, we suggest that you go through these checklists.
Management nodes
Use the following checklist for management nodes:
  1. There are at least two management nodes for redundancy (optional but suggested).
  2. There is enough available disk space for service nodes, compute nodes, and utility nodes.
  3. The clock and time zone settings are correct, and there is a utility to maintain them. A management node can act as a Network Time Protocol (NTP) server.
  4. The hardware connections are functional in an xCAT environment. Use the lshwconn -l
    command to verify this.
  5. There are redundant Ethernet connections.
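
The disk space item above can be automated in a small way. The following is a minimal sketch, assuming Python 3 is available on the management node; the path "/" and the 10 GiB threshold are illustrative assumptions, not values from any xCAT documentation, so substitute the file systems and sizes that apply to your cluster.

```python
import shutil


def free_space_ok(path: str, min_free_gib: float) -> bool:
    """Return True if the file system holding `path` has at least `min_free_gib` GiB free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free / 2**30 >= min_free_gib


# Hypothetical threshold; replace the path and size with your own values.
print("/", "OK" if free_space_ok("/", 10.0) else "LOW ON SPACE")
```

Keeping the check in a function makes it easy to call once for each file system that holds node images.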

Service nodes (if any)
Use the following checklist for service nodes:
  1. There are at least two service nodes for redundancy.
  2. The service nodes can be accessed from the management node using Ethernet.
  3. There is enough available disk space for compute nodes.
  4. The clock and time zone settings are correct, and an NTP client is synchronized to an NTP server.
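
The clock item above can be spot-checked by comparing the time each node reports against a reference. This is only a sketch: it assumes you have already collected one epoch timestamp per node (for example by running a date command on each node), and the node names, readings, and one-second tolerance are hypothetical. It does not replace checking the NTP client itself.

```python
from __future__ import annotations


def nodes_out_of_sync(
    node_epochs: dict[str, float],
    reference_epoch: float,
    tolerance_s: float = 1.0,
) -> list[str]:
    """Return the nodes whose reported time differs from the reference by more than tolerance_s."""
    return sorted(
        node
        for node, epoch in node_epochs.items()
        if abs(epoch - reference_epoch) > tolerance_s
    )


# Hypothetical readings: sn02 is five seconds ahead of the reference.
readings = {"sn01": 1000.2, "sn02": 1005.0}
print(nodes_out_of_sync(readings, reference_epoch=1000.0))  # ['sn02']
```

Timestamps gathered one node at a time include the collection delay, so the tolerance should be generous enough to absorb it.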
Ethernet switch
Use the following checklist for an Ethernet switch:
  1. All of the Ethernet switch LEDs of ports with a cable on them are flashing.
  2. All switches have remote execution configured from xCAT (you can run xdsh to them).
  3. All switches have the correct clock and time zone setting, and an NTP client is synchronized to an NTP server.
InfiniBand switch (if any)
Use the following checklist for an InfiniBand switch:
  1. All of the LEDs of ports with a cable on them are flashing.
  2. None of the service LEDs indicate a problem (shown in the switch hardware
    documentation).
  3. All switches are pingable and Secure Shell (SSH)-able from the management server.
  4. The naming convention for each switch is based on its physical location.
  5. All switches of the same type are running at the same firmware and software levels.
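
The firmware item above lends itself to a simple consistency check. The sketch below assumes you keep a list of (switch name, switch type, firmware level) entries; how that list is gathered is left out, and every name and level shown is hypothetical.

```python
from collections import defaultdict


def mixed_firmware(switches):
    """Map each switch type that runs more than one firmware level to the levels found."""
    levels = defaultdict(set)
    for _name, model, firmware in switches:
        levels[model].add(firmware)
    return {model: sorted(found) for model, found in levels.items() if len(found) > 1}


# Hypothetical inventory: the two model-a switches disagree.
inventory = [
    ("ib-rack1-sw1", "model-a", "1.0"),
    ("ib-rack1-sw2", "model-a", "1.1"),
    ("ib-rack2-sw1", "model-b", "2.0"),
]
print(mixed_firmware(inventory))  # {'model-a': ['1.0', '1.1']}
```

An empty result means every switch type is on a single level.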
InfiniBand Unified Fabric Manager (if any)
Use the following checklist for InfiniBand Unified Fabric Manager (UFM):
  1. No errors are displayed on the UFM dashboard.
  2. The UFM health check function returns no errors.
  3. The UFM dashboard traffic monitor shows average congestion is less than maximum congestion.
Compute nodes
Use the following checklist for compute nodes:
  1. The naming convention for each node is based on its physical location.
  2. All of the compute nodes have the same hardware configuration.
  3. All of the compute nodes run the same operating system level.
  4. The software packages on different compute nodes are the same.
  5. The clock and time zone settings are correct, and an NTP client is synchronized to an NTP server.
  6. The version levels of all of the software packages on different compute nodes are the same.
  7. Each of the compute nodes can ping every other compute node successfully through
    Ethernet.
  8. Each of the compute nodes can ping every other compute node successfully through
    InfiniBand.
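
Two of the compute node checks above are mechanical enough to sketch in code: comparing package versions across nodes, and listing every node pair that a full ping test has to cover. This is an illustrative sketch only. It assumes the package data has already been collected into a dictionary per node, and all node names, package names, and versions are hypothetical.

```python
from itertools import permutations


def package_mismatches(inventory):
    """Return {package: {node: version or None}} for packages that differ between nodes.

    A version of None means the package is missing on that node.
    """
    all_packages = set()
    for packages in inventory.values():
        all_packages.update(packages)

    mismatches = {}
    for package in sorted(all_packages):
        versions = {node: pkgs.get(package) for node, pkgs in inventory.items()}
        if len(set(versions.values())) > 1:
            mismatches[package] = versions
    return mismatches


def ping_pairs(nodes):
    """Return every ordered (source, target) pair, so each node pings every other node."""
    return list(permutations(nodes, 2))


# Hypothetical data: pkg-b differs on c02 and is missing on c03.
inventory = {
    "c01": {"pkg-a": "1.0", "pkg-b": "2.0"},
    "c02": {"pkg-a": "1.0", "pkg-b": "2.1"},
    "c03": {"pkg-a": "1.0"},
}
print(package_mismatches(inventory))
print(len(ping_pairs(list(inventory))))  # 6
```

The pair list grows as n * (n - 1), so on large clusters it is usually worth running the pings in parallel or sampling.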

Utility nodes (if any)
Use the following checklist for utility nodes:
  1. There are at least two utility nodes for redundancy.
  2. The clock and time zone settings are correct, and an NTP client is synchronized to an NTP server.
Login node (if any)
Use the following checklist for login nodes:
  1. There are at least two login nodes for redundancy.
Hardware Management Console (if any)
Use the following checklist for Hardware Management Consoles (HMCs):
  1. There are at least two HMCs for redundancy.
  2. All physical machines can be accessed from the HMC.
  3. Each physical machine has at least one logical partition (LPAR) on it.
  4. There is no alert from the HMC.
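
Several of the checklists above ask for at least two of a given component. A minimal sketch of that redundancy check follows, assuming you maintain a simple mapping from role to member names. The role and host names are hypothetical, and roles with no members are skipped because many of them are marked "if any".

```python
def lacking_redundancy(roles, minimum=2):
    """Return the roles that are in use but have fewer than `minimum` members."""
    return sorted(
        role for role, members in roles.items() if 0 < len(members) < minimum
    )


# Hypothetical cluster: only one service node, and no login nodes at all.
cluster = {
    "management": ["mn01", "mn02"],
    "service": ["sn01"],
    "login": [],
    "hmc": ["hmc01", "hmc02"],
}
print(lacking_redundancy(cluster))  # ['service']
```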
     
Sample Checklist Format - Software checklist
