The following checklist is a shortened and, hopefully, better organized rendition of HP's full crash procedure document, OZBEKBRC00000611. HP's document goes into detail about versions, purposes, and rationale; this checklist assumes version 11.X and a patched version of q4.
When HP-UX crashes, it saves a snapshot of RAM in disk-based swap space or dedicated dump space, reboots the system, and copies the resulting "dump" into /var/adm/crash.
A utility called q4, normally loaded on the system, is available to make text files for fast analysis. A patched version of q4 must be loaded to interpret dumps resulting from a "hanging" operating system.
To preprocess the dump, follow these steps and email the resulting files to the HP Response Center for analysis. Steps vary depending on the version of the O/S and the version of q4.
Please note, all email generated from this procedure should be sent to the dump team email address email@example.com using the CALL ID as the SUBJECT.
DO NOT send this information to the engineer's personal address.
After emailing the data, please log a callback against the call to let the engineer know that you have emailed your data.
A recent core.N (10.X) or crash.N (11.X) directory should be listed. (NOTE: N is the next available dump index, which increments with each successive dump.)
The INDEX file in /var/adm/crash/c* and /etc/shutdownlog both contain the "panic" statement.
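As a rough sketch, a single field can be pulled out of an INDEX file with a small sh-posix function. The helper name index_field is hypothetical, and the layout it expects (field name, whitespace, then the value) is an assumption about the INDEX format rather than something this checklist specifies.

```sh
# Hypothetical helper: print the value of one field from a crash INDEX file.
# Assumes each line looks like "<name><whitespace><value>".
index_field() {
    # $1: path to the INDEX file
    # $2: field name, for example panic or dumptime
    sed -n "s/^$2[[:space:]][[:space:]]*//p" "$1"
}
```

For example: index_field /var/adm/crash/crash.0/INDEX panic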
grep _DIR /etc/rc.config.d/save*
The value pointed to by SAVECORE_DIR=(10.X) or SAVECRASH_DIR=(11.X) is where the system places dump files.
A return message "invalid dump header" means the dump is non-existent.
NOTE: If the current dump directory fills up while a dump is being saved, point the directory variable at a file system with more space, and create the new directory so that future dumps can be captured.
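The lookup of the dump directory can be sketched as a small sh-posix function. The name dump_dir_from_config is hypothetical; the sketch assumes the rc file holds plain NAME=value lines, ignores commented-out lines, and falls back to /var/adm/crash when no value is set.

```sh
# Hypothetical helper: print the dump directory configured in an rc file
# such as those under /etc/rc.config.d/save*.
# Assumes uncommented lines of the form SAVECORE_DIR=... or SAVECRASH_DIR=...
dump_dir_from_config() {
    # $1: path to the rc file
    # Keep the last uncommented *_DIR assignment and drop any double quotes.
    dir=$(sed -n 's/^[[:space:]]*SAVEC[A-Z]*_DIR=//p' "$1" | tail -n 1 | tr -d '"')
    # Fall back to the default location named in this checklist.
    echo "${dir:-/var/adm/crash}"
}
```

The printed directory can then be checked for free space before the variable is changed.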
swlist -l fileset | grep -i Q4
The following are unpatched versions supplied with the OS:
|OS-Core.Q4||B.10.20||HP-UX Crash Dump Debugger for PA-RISC systems|
|OS-Core.Q4||B.11.00||HP-UX Crash Dump Debugger for PA-RISC systems|
|OS-Core.Q4||B.11.11||HP-UX Crash Dump Debugger for PA-RISC systems|
Download the appropriate version from this site:
For the 10.10 or 10.20 version:
NOTE: the patch number may be superseded over time.
NOTE 2: Those links have zero chance of working, as 10.X and 11.0 are no longer supported. Additionally, HP has gone to a pay-to-play model for patches: you need a software service agreement to get them.
Mount the INSTALL media and verify a matching version of Q4 is available:
swlist -l fileset -s / | grep Q4
OS-Core.Q4 B.10.10 HP-UX Crash Dump Debugger for PA-RISC systems
(the version shown, B.10.10 in this example, must match the O/S)
Use swinstall to install it:
# swinstall -vs /
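The version check before installing can be sketched as follows. The function name q4_version_matches is hypothetical. It assumes swlist prints the fileset name in the first column and the version in the second, as in the listings above, and that the release string passed in has the same B.nn.nn form (for example, the output of uname -r).

```sh
# Hypothetical helper: succeed when the OS-Core.Q4 fileset in a swlist
# listing has the same version as the given OS release.
# stdin: output of "swlist -l fileset" (from the system or the media)
# $1:    OS release string, for example B.11.11
q4_version_matches() {
    awk -v rel="$1" '
        $1 == "OS-Core.Q4" && $2 == rel { ok = 1 }
        END { exit !ok }
    '
}
```

For example: swlist -l fileset | q4_version_matches "$(uname -r)" && echo "Q4 matches the O/S"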
NOTE: csh (c-shell) will cause errors with q4. Use sh-posix.
cd (dump directory) eg: cd /var/adm/crash/core.0 OR /var/adm/crash/crash.0
/usr/contrib/bin/gunzip vmunix.gz (uncompresses the kernel file)
For 10.20 through 11.11, type this command and then skip to 4.2:
/usr/contrib/bin/q4prep -p
For 11.20 and beyond, type this command and then skip to 4.2:
/usr/contrib/Q4/bin/q4prep -p
If at 10.10, type the following commands:
uncompress /usr/contrib/lib/Q4Lib.tar.Z (ignore the error if this was done previously)
tar -xf /usr/contrib/lib/Q4Lib.tar (output goes into the current directory)
cp q4lib/sample.q4rc.pl ~/.q4rc.pl (note the use of a tilde and the letter "l", not the digit 1)
/usr/contrib/bin/q4pxdb vmunix (this may complain if vmunix is already preprocessed)
q4 -p . (note the "dot" at the end of the command)
q4> trace event 0 > trace.out
q4> include analyze.pl (NOTE: letter "l", not digit 1)
q4> run Analyze AU >> ana.out (NOTE: ctrl-c will interrupt q4)
q4> exit
Skip to STEP 6
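The release-dependent choice of preparation command described above can be sketched as a case statement. The function name q4prep_for_release is hypothetical, as are the words "manual" and "unknown" it prints. It assumes release strings in the form reported by uname -r, such as B.11.11.

```sh
# Hypothetical helper: print which q4prep to run for a given OS release.
# $1: release string, for example B.10.20 or B.11.23
q4prep_for_release() {
    case "$1" in
        # 10.10 has no q4prep step in this checklist: use the manual commands.
        B.10.10)
            echo "manual" ;;
        # 10.20 through 11.11
        B.10.[2-9]*|B.11.0*|B.11.1*)
            echo "/usr/contrib/bin/q4prep" ;;
        # 11.20 and beyond
        B.11.[2-9]*)
            echo "/usr/contrib/Q4/bin/q4prep" ;;
        *)
            echo "unknown"
            return 1 ;;
    esac
}
```

For example: q4prep_for_release "$(uname -r)"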
. /usr/contrib/Q4/bin/set_env (note the "dot" at the beginning of the command)
/usr/contrib/Q4/bin/q4pxdb vmunix (disregard the "unnecessary" message)
/usr/contrib/Q4/bin/q4 -p . (note the "dot" at the end of the command)
q4> run Analyze AU > ana.out
q4> run WhatHappened -HANG > what.out
NOTE: ctrl-c can interrupt these two commands, which may take several minutes to process.
grep HPMC ana.out trace.out
If either of these lines appears, open a hardware repair request with the hardware support organization for this system.
Also, send the /var/tombstones/ts* file (if that directory exists) matching the "dumptime" listed in the INDEX file. It may well have the hardware fault codes that can aid in isolating the hardware cause.
If an HPMC did not occur, proceed to 6.2.
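The HPMC decision above can be sketched as a function that wraps the grep. The name check_hpmc and the messages it prints are hypothetical; files that do not exist (for example, trace.out when only ana.out was produced) are silently ignored.

```sh
# Hypothetical helper: report whether any of the given q4 output files
# mentions an HPMC. Returns 0 when one is found, 1 otherwise.
check_hpmc() {
    # "$@": files to search, for example ana.out trace.out
    if grep -q HPMC "$@" 2>/dev/null; then
        echo "HPMC found: open a hardware repair request"
        return 0
    fi
    echo "no HPMC found: proceed to 6.2"
    return 1
}
```

For example: check_hpmc ana.out trace.out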
"MC/ServiceGuard: Unable to maintain contact with cmcld daemon. Performing TOC to ensure data integrity."
If so, type:
cmgetconf |grep E_T /etc/cmcluster/* (check the cluster for a NODE_TIMEOUT of 2000000)
If NODE_TIMEOUT is set to 2 seconds, the crash is probably due to this extremely low setting.
To correct the problem: Increase the value to 5-8 seconds in the cluster configuration file and perform a "cmapplyconf" with the cluster down.
Also, read article UXSGLVKBAN00000010 in the http://ITRC.HP.COM technical database for more details on dealing with ServiceGuard-induced crashes.
If NODE_TIMEOUT was set to 2 seconds and the value was corrected, stop here.
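The NODE_TIMEOUT check can be sketched as follows. The function name node_timeout_too_low is hypothetical. It assumes the cluster configuration file holds an uncommented line of the form "NODE_TIMEOUT value", with the value in microseconds (2000000 for the 2 seconds mentioned above).

```sh
# Hypothetical helper: succeed when the cluster configuration file sets
# NODE_TIMEOUT to 2000000 (2 seconds) or less.
# $1: cluster configuration file
node_timeout_too_low() {
    awk '
        # Commented-out lines do not match because their first field is not the keyword.
        $1 == "NODE_TIMEOUT" { t = $2 }
        END { exit !(t != "" && t + 0 <= 2000000) }
    ' "$1"
}
```

For example: node_timeout_too_low cluster.ascii && echo "raise NODE_TIMEOUT to 5-8 seconds" (cluster.ascii stands for whatever file holds the cluster configuration).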
/usr/sbin/swlist -l product | grep PH > patches.out