Customer Viewable
blh@atl.hp.com
Last Revised: 8/95
--------------------
+---------------------------------------------------+
| Help! My Filesystem Is Full And I Can't Get Up! |
+---------------------------------------------------+
filesystem is full
filesystem is full
filesystem is full
filesystem is full
...
Oh, oh! It seems there are only two types of HP-UX system administrators:
Those that have seen this message,
and
those that are going to!
The effect of this message depends primarily on what filesystem is reporting
the dreaded error, with the root filesystem (/) having the worst effect.
When root fills up, things start failing; processes are killed or core dump;
programs that depend on files on the root directories begin to have problems;
and the list goes on.
The first thing to do is to gracefully shutdown the system, unless you know
where the problem is hiding. There are several common reasons for a
filesystem to become full without warning, while other reasons require some
stealth to locate.
System Panics
-------------
One of the biggest files to suddenly appear in the root filesystem can be
found in the directory /tmp/syscore. Normally, this directory does not exist
so core dumps due to a system crash (panic in HP-UX parlance) will not be
saved.
That's good news and bad news: the good news is that a panic will not
suddenly fill your filesystem with a core dump from 16 to 256 megs (or more)
of data. The bad news is that there is little chance to determine the reason
for the system panic without this file.
Is /tmp/syscore the only location for a panic core dump? No. The directory
is specified in the system startup file /etc/rc and is set as the only
parameter given to a program called savecore. When savecore runs, the two
files: hp-core.# and hp-ux.# are created, along with a small file called
bounds. The # sign is a number starting from 0 and incrementing with every
panic that occurs, and bounds keeps track of the next number to use.
When the system reboots after a panic, the /etc/rc file checks to see if
savecore exists and if the directory specified exists...if both are true,
savecore checks the dump area (typically the primary swap area) for a valid
HP-UX memory dump. Finding a properly stored memory dump, the savecore
program announces the date/time that the panic occurred and creates the file
hp-core.0 (if this is the first core dump in the directory). The process
continues until all of physical memory (RAM) has been written to the disk, or
until (oops) the filesystem is full. Without a properly written dump in the
primary swap area, savecore does nothing and displays nothing.
Then, savecore writes a copy of the current /hp-ux file as hp-ux.0 to match
the dump file. If the filesystem is already full, this file is created as a
zero-length file. To be useful, the core dump must also have a copy of
/hp-ux (the kernel file) at the time of the dump.
So what is the best technique to prevent a root filesystem from filling up
due to a system panic? Simply pick another filesystem to store the dump, one
that typically has a lot of space, or a filesystem that always has at least
as much space as the size of RAM plus about 2-3 megs (for hp-ux.#). How do
you locate the size of RAM? You can type the command: dmesg and look at the
amount of real memory that is available. To pick another filesystem (for
example, /disk7), change the two lines in /etc/rc from:
if [ -x /etc/savecore ] && [ -d /tmp/syscore ]
then
        /etc/savecore /tmp/syscore
fi
to
if [ -x /etc/savecore ] && [ -d /disk7/syscore ]
then
        /etc/savecore /disk7/syscore
fi
If you need the space back after a panic, simply store the contents of the
core dump directory on tape...the simplest command to use is:
cd /my_core_dump_directory
tar cvf /dev/my_favorite_tape_devicefile *
Don't forget to check the list that tar makes; it should show the two files
hp-ux and hp-core and possibly the bounds file. Then, you can remove all the
files from the core dump directory, at which point, you can contact HP to
have the core dump analyzed for the possible reason(s) for the panic.
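Before removing anything, it's worth listing the archive back to be sure the copy is good. Here is a sketch of that check, staged with made-up filenames in a scratch directory, and with an ordinary file standing in for the tape devicefile:

```shell
# Stage a fake core-dump directory (names are illustrative only)
workdir=$(mktemp -d)
mkdir "$workdir/syscore"
echo "pretend memory dump" > "$workdir/syscore/hp-core.0"
echo "pretend kernel"      > "$workdir/syscore/hp-ux.0"
cd "$workdir/syscore"
# Archive it; on a real system the tar cvf target is your tape devicefile
tar cf "$workdir/dump.tar" .
# List the archive back: both hp-core.0 and hp-ux.0 should appear
# before you rm anything from the core dump directory
tar tf "$workdir/dump.tar"
```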
Filesystem minfree
------------------
Every administrator has probably looked at the bdf command after mounting a
brand new disk and asked: Where did some of that empty space go? The answer
is that approximately 6% to 8% of a disk's space is occupied by inode tables
and superblocks, which contain the pointers to the various blocks of files
that are on the disk. In addition, the default newfs command will reserve
10% minfree or 10% of what's left before files are stored on the disk, to
enhance the filesystem performance.
This buffer allows system administrators to fix problems with the space on a
given disk (once the filesystem is marked full) and still have some room (the
10% minfree area) to work. Although the minfree area can be reduced to zero,
this is not recommended for the root disk since a file system full message
might not allow even the system administrator to log onto the ailing system.
Other disks might be allowed to use 0% minfree, as long as the space is
monitored, or the space usage is essentially fixed. Note also that the HFS
method of disk space management in HP-UX relies heavily on 10% minfree to
keep the performance in allocating and deallocating filespace at a high
level.
Another filesystem tuning option is to increase the bytes-per-inode value when
initializing the filesystem with newfs. By changing the number of bytes of
disk space managed by an inode, the overhead can be reduced by as much as
50%, at the expense of the total number of files that may be stored with
these larger inodes. This parameter is tricky since it may prevent easy
interchange of data with other Unix systems that cannot handle a wide range
of bytes-per-inode values.
In general, changing this parameter from 2048 bytes to 64K bytes will only
return about 3% or so of the disk space, with a corresponding reduction in
the total number of files that can be stored, but this may be ideal for a
small collection of large files. Be sure to choose a value that will be
compatible with your operating system revision. Large inode sizes are often
not portable to other systems or revs.
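The arithmetic behind the trade-off can be sketched quickly. Assuming a 128-byte on-disk inode (an assumption; check your filesystem's actual inode size) and a hypothetical 1 GB filesystem:

```shell
# Back-of-the-envelope inode table overhead at two bytes-per-inode values
fs_bytes=1073741824                     # hypothetical 1 GB filesystem
for bpi in 2048 65536; do
    awk -v fs="$fs_bytes" -v bpi="$bpi" 'BEGIN {
        inodes = fs / bpi               # one inode per bpi bytes of disk
        printf "bytes-per-inode %-5d -> %7d inodes, %6.1f MB of inode table\n",
               bpi, inodes, inodes * 128 / (1024 * 1024)
    }'
done
```

With these (assumed) numbers, raising bytes-per-inode from 2048 to 64K shrinks the inode tables from tens of megabytes to a couple, but leaves only 16384 possible files on the disk.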
Files that don't belong in /dev
-------------------------------
Another very common problem is a file system full message just after doing a
backup. How can this be? HP-UX is quite friendly in that it will allow
spelling errors, but it will often do something not entirely expected. For
instance, if the user were to misspell the name of the tape drive as in:
tar cvf /dev/rmt/om /
instead of
tar cvf /dev/rmt/0m /
then, rather than displaying an error message such as:
tape not found
or,
devicefile does not exist,
the HP-UX operating system simply creates an ordinary file with the name
given in the tar command (or cpio or fbackup, etc) and all the data to be
backed up begins filling the /dev/rmt/om file until the entire system has
backed itself up onto the root disk. This process eventually fails with a
file system full message.
To find these inadvertent spelling errors in /dev, use the following command:
find /dev -type f -print
This list will contain files that should never appear in /dev, that is,
ordinary files. Occasionally, a core file or other unexpected files may also
be discovered in the /dev directory.
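The trap is easy to reproduce safely. Here is a sketch staged in a scratch directory standing in for /dev (no real device nodes are involved):

```shell
# Stage a fake /dev with the classic misspelling
fakedev=$(mktemp -d)
mkdir "$fakedev/rmt"
echo "backup data" > "$fakedev/rmt/om"   # the typo: an ordinary file, not a devicefile
# -type f reports only ordinary files; directories and device nodes are skipped
find "$fakedev" -type f -print
```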
Managing /tmp, /usr/preserve and /usr/tmp
-----------------------------------------
/tmp is one of those directories where everyone has access but few seem to
treat it with respect. /tmp is defined as a temporary storage area and is
not to be considered permanent. Processes like email or the vi editor use
the /tmp directory for files, but normal operations will cleanup afterwards
and not leave files in the /tmp directory. Some HP programs such as update
will leave their logfile in /tmp, but this is considered correct practice in
that the logfile should be reviewed for errors, and then removed or archived.
One way to enforce cleanup of the /tmp directory is to use a command such as:
find /tmp -type f -atime +14 -exec rm {} \;
which will remove any files in /tmp (or files in directories below /tmp)
that have not been accessed in more than 14 days. The other temporary
storage areas, /usr/preserve and /usr/tmp, are less often abused by users
since they overlook their existence. Again, some processes will create
temporary files in /usr/tmp and should (if they terminate correctly)
remove their files and editors like vi use /usr/preserve. This command
will clear up /usr/tmp of files not accessed in more than 7 days:
find /usr/tmp -atime +7 -exec rm {} \;
System administrators need to decide if /tmp should allow regularly accessed
files to stay in /tmp. A user might bypass the above tests by using the
touch command on the files. In this case, change the -atime option to
-mtime, which tests when the file was last modified instead.
Once in a while, you may need to check for old directories which are not
removed by the above command. The contents will be cleared but after a
while, /tmp may get cluttered with empty directories.
Here's a possibility for files in directories. This combination purges
files that are older than 7 days, followed by a removal of the directory
if it hasn't been updated for 7 days. However, a simple rmdir is used
so the command will fail if the directory isn't empty. Thus, until all
files have been removed, the directory will stay.
find /tmp -type f -atime +7 -print -exec rm -f {} \;
find /tmp -type d -atime +7 -print -exec rmdir {} \;
find /usr/tmp -type f -atime +7 -print -exec rm -f {} \;
find /usr/tmp -type d -atime +7 -print -exec rmdir {} \;
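The aging behavior can be tried out on a scratch directory first. This sketch uses -mtime and GNU-style touch -d so the file ages can be faked for the demonstration; on a live system the -atime forms above are the ones to schedule:

```shell
# Scratch-directory demonstration of the aging removal
scratch=$(mktemp -d)
touch "$scratch/fresh.txt" "$scratch/stale.txt"
touch -d "10 days ago" "$scratch/stale.txt"   # backdate one file (GNU touch)
find "$scratch" -type f -mtime +7 -exec rm -f {} \;
ls "$scratch"                                 # only fresh.txt remains
```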
Another common practice is to cleanup /usr/tmp after every full backup.
Managing /usr
-------------
/usr is the largest directory in a standard HP-UX system. It not only
contains most of the executable files, library files and even font files
for X/windows, it is the location where the lp spooler sends all spooled
files. This can use disk space very quickly, especially if a printer
goes down. The directory to check is: /usr/spool/lp/request. There
will be a directory for every printer and large files in those directories
may indicate a spooler problem if they are more than a few minutes old.
Sometimes, the spooler will become confused (there are patches for 7.x, 8.x
and 9.x versions) and leave the printers disabled. Also, administrators may
change and forget to remove test printers which don't really exist. Verify
that the report from lpstat -v and the directories in /usr/spool/lp/request
are the same. The lpstat -v listing shows the printers known to the spooler
so if there are other directories (or files) in /usr/spool/lp/request then
they don't belong there.
Also check the log files in /usr/spool/lp...they are optional but when
started, they will grow without bounds. The files are log and lpd.log.
lpsched -v is used to start lp logging, although HP's JetDirect software
always logs information from the interface scripts.
If you are using DTC's, there are problems that can cause the DTC software
to log large amounts (dozens of megabytes) of errors. Check /usr/adm/ for a
DTC manager
logfile, and also /usr/dtcmgr for a similar logfile. These can be reduced
by:
tail name_of_big_logfile > /tmp/dtctail.log
mv /tmp/dtctail.log name_of_big_logfile
Do this for both log files and the files will be reduced to the last 10
entries which should be enough for debugging the DTC problem. It is not
unusual for the DTC software to create 25-50 meg files.
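The effect of the tail-and-replace trim is easy to see on a synthetic logfile (the 500-entry file here is made up for the illustration):

```shell
# Trim a synthetic logfile down to its last entries
log=$(mktemp)
seq 1 500 > "$log"         # pretend: a 500-entry logfile
tail "$log" > "$log.trim"  # tail keeps the last 10 lines by default
mv "$log.trim" "$log"
wc -l < "$log"             # 10 lines remain, ending with entry 500
```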
Another area to check is uucp's directories. Like the lp spooler, this
directory can be very dynamic, holding traffic for other nodes or simply the
repository for various files sent or received using uucp. The directory is:
/usr/spool/uucp
and places to look are: .Admin, typically the audit file will grow depending
on traffic; .Log where all the logfiles are kept and then the directories
that are the names of remote machines allowed access to this computer.
Finally, /usr/mail is a place where unexpected bursts of growth can occur.
Very large files can be easily (too easily) emailed to users on the system
and this directory can grow quite rapidly. Sending a big file to everyone
in a distribution list will cause multiple copies of the same file to be
placed in everyone's mailfile, thus growing /usr/mail rapidly.
Managing /system
----------------
This is a directory that keeps track of all the filesets that were installed.
Some of the filesets contain just the Product Description Files (PDF..see man
pdfck) and others, like patches, will contain the new files, a copy of the old
files and customization scripts. This can run into 10 or 20 megs after a
while. However, you do have two choices:
..Remove /system and all the files in it. The downside is that you won't
be able to run pdfck (a very useful tool) nor can you use rmfn, the
fileset removal tool. However the opsystem will still run fine.
..Move the /system directory to another filesystem using a symbolic link.
That is, assume that /extra has lots of extra space. Here are the steps:
cd /system
find . | cpio -pdmuvxl /extra/system
(check now that /extra/system has all the /system files)
cd /
rm -r /system
ln -s /extra/system /system
Now, all the files have been moved from the root directory but all the
commands (pdfck and rmfn) still work OK. There is one consideration
though: if you perform updates or run rmfn in single user mode, be sure
the /extra disk is mounted.
Check /users/ftp
----------------
Although /users is one of those directories that can grow unexpectedly (from
individual users creating lots of files), there may be an ftp directory,
also known as anonymous ftp. This directory allows users from the network
to send/receive files without having specific logins to the system and this
can lead to the appearance of large files unexpectedly. To check on it,
use:
du /users/ftp
Big numbers (more than 10000) might mean that someone on the net is storing
large files...this can be prevented by changing the permissions on the
/usr/pub directory from 777 to 755. The rest of the (standard) ftp
directories are set to 755 already. Anonymous ftp can be setup using SAM
although finding the option is tricky...for 8.0x systems:
select:
Networks/Communications ->
LAN Hardware and Software (Cards and Services) ->
ARPA Services Configuration ->
Create Public Account for File Transfers ...
For 9.0x systems:
select:
Networking/Communications->
Services: Enable/Disable
Anonymous FTP Disabled Public account file transfer capability
By pressing Return, Anonymous FTP will be highlighted and you can then select
the Action menu by:
pressing f2 (label=Alt) and then the letter a
or
pressing f4 (label=Menubar) and moving the menu to the right using
the arrow keys. Select Enable or Disable as appropriate.
Where to remove filesets
------------------------
Starting with HP-UX version 8.0 and higher, the ability to remove unneeded
filesets or applications has been provided through the program rmfn. This is
based on the update program and indeed, looks almost identical. When rmfn is
run without any parameters, rmfn displays a screen of all the filesets on the
system (information on filesets is stored in /system with indexes, customize
scripts and related information). rmfn also reports the size of the fileset.
Note that files and applications that are added to the system without the use
of the update program will not be seen by the rmfn program.
Why can't I just remove the program? Well, in the good old days of
computers, a simple program was just one item or at most, one directory and
therefore easy to remove. But that was then and this is now; today, programs
are stored in various pieces all over the filesystem. Things like rc files
for local configs, X/Window resources in an app-defaults file, man pages for
documentation, commands needed only for the administrator and other commands
for general use, all part of a single application program.
To track all these items, the update & rmfn programs make use of indexes kept
in the /system directory. Additionally, dependencies between files in
different filesets are tracked, which prevents loading or removing partial
file groupings that would not be fully functional as a whole. While many third
party suppliers of software use the update program, many do not so you will
have to refer to your supplier's documentation for space management.
Filesets that might be removed?
===============================
man pages
---------
The documentation pages (man pages) can occupy from 6 to 20 megabytes. Once
the man pages are removed, the man command will no longer find any online
help files but this can save a lot of disk space, especially in a system
where little or no program development takes place.
Another alternative is to remove just the directories in /usr/man that start
with the letters 'cat'. These directories, when they exist, provide a
location for formatted help pages. When running the man program, you may see
the message:
formatting...please wait
which comes from the man program as it turns the help pages into a readable
format.
If the /usr/man/cat* directories exist, the finished pages are saved,
thereby avoiding the delay when the same page is requested in the future. A
fully formatted set of man pages may be about 20 megabytes larger than the
unformatted pages. If users are not annoyed with this delay, removing the
/usr/man/cat* directories can save an average of 10 megabytes.
Here's a tip: most users need only the section 1 or section 1m commands for
day to day operations. As the system administrator, you may find that this
space (approximately 3-4 megs) is well worth the time saved, as long as it
doesn't grow any larger. There is a command called catman that can format
complete sections (1 and 1m are the basic HP-UX commands for all users and
system administrators, respectively) and by removing all cat* directories
except:
/usr/man/cat1
/usr/man/cat1m
then just the pages for these commands can be formatted at one time (I would
suggest doing this overnight) by using the command: catman 11m. Now, all the
man page requests for sections 1 and 1m pop up immediately, yet the disk
space will not grow as man pages from other sections are referenced (they
will still be formatted on each occurrence).
Another technique is to remove the pages that have not been accessed in the
last n days where n might be 15 or 30, whatever value fits your site. A
weekly cron job can be started that searches for formatted pages in the
/usr/man/cat* directories, finding the files that are older than the
specified time. Check the find command for time stamp options and use the
-exec option to do an rm of the file(s).
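Such a cron job might look like the sketch below, run here on a scratch stand-in for a /usr/man/cat* directory. The 30-day cutoff is an example, and -mtime with GNU-style touch -d is used only so the ages can be faked for the demonstration (on a real system, -atime matches the "not accessed" intent):

```shell
# Scratch stand-in for a formatted man page directory
catdir=$(mktemp -d)
touch "$catdir/old.1" "$catdir/new.1"
touch -d "45 days ago" "$catdir/old.1"        # backdate one page
find "$catdir" -type f -mtime +30 -exec rm -f {} \;
ls "$catdir"                                  # only new.1 survives
```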
Still another technique is to make the man pages a remote (NFS) directory on
another system. By making /usr/man reside on a single system, dozens of
megabytes of duplicate pages can be eliminated.
Discless
--------
Discless is a feature found in all the 9000 platforms at revision 8.x but not
everyone needs these features. DISCLESS is a fileset that is fairly large
and can be removed if support for discless workstations is not needed.
DISCLESS provides several kernels for various features that discless clients
might need and just these 5 or 6 kernels will occupy 6 to 7 megabytes in
/etc/conf.
NLS files
---------
Native language support is another area that can be trimmed from systems that
do not require language support other than English. Some files can be quite
large for Far Eastern languages where a complex character set (ie, Kana or
Korean) might be needed.
HP Diagnostics
--------------
These are tricky. Removing them can save a lot of space, especially on the
700 and 800 systems. On the other hand, they do provide service people with
detailed information on problems that may occur on the system, along with
detailed logfiles on specific errors. You may wish to discuss the pros and
cons of removing the diagnostics with your local support people.
As with all HP filesets, they can be reinstalled simply by running the update
program and selecting the required fileset(s). Another alternative is to
copy all the files from /usr/diag into another filesystem and then create a
symbolic link. You can use a series of commands like:
..mkdir /mnt1/diag
..cd /usr/diag
..find . | cpio -pdmuxvl /mnt1/diag
..rm -rf /usr/diag <---spell carefully
..ln -s /mnt1/diag /usr/diag
This will save from 8 megs (300/400 systems) to 25 megs (800 series) on the
/usr filesystem while still retaining all the functionality of diagnostics.
A similar alternative is to use an NFS mountpoint for the diagnostics, but
with the caveat that a failure requiring diagnostics may also cause the
network to fail, thus rendering the diagnostics unusable.
lost+found directory
--------------------
During an abnormal powerfail or panic of the system, the filesystem will not
be shutdown cleanly and this may require manual intervention with fsck, the
filesystem fixit program. If fsck is unable to repair files or directories,
rather than delete them, fsck will ask if you wish to fix the problem. If
you answer yes, then the inode (a pointer to files or directories) may be
moved to the lost+found directory and given a name that is actually the inode
number.
The space represented by these entries in lost+found might have been
temporary files that were deleted but their free space was not recorded on
the filesystem, or just ordinary files and/or directories that have lost
their names or their connection with the rest of the directories. In this
case, the system administrator must look at the contents of each file or
directory to determine what, if any, data is to be saved. Otherwise, these
files will occupy space but serve no purpose, and might lead to creeping file
system growth.
Unmounted disks
---------------
HP-UX connects separate disks into a single filesystem called root
(represented by the / symbol) by making directories do double duty. A
directory can be changed into a mount point (a logical connection to another
disk's filesystem) by using the mount command. The file /etc/checklist can
also accomplish this indirectly in that the mount command reads checklist for
guidance on where to mount disks.
The curious thing is that unmounting a disk returns the mount point directory
to a local status and files stored in that directory will return. These
files became dormant after the mount command changed the use of the directory
into a mount point; that is, the files on the root disk exist, but cannot be
seen because the mounted disk 'overlays' the mount point directory.
If a mounted disk is unmounted, the files in the root directory again become
visible, and this can lead to a common error:
1. Someone notices the files are missing and starts to load the files
back from tape.
2. The files are reloaded (or a portion of the files are loaded) and
someone notices that the root filesystem is full or close to full.
3. Someone types the bdf command and discovers that the second disk is
not mounted...and mounts it. Now the files are back to what they
were, but the root filesystem is still almost full.
What happened is that the directory was not in use as a mount point but there
are no red flags showing this condition. That's why the bdf command is so
important: the mounted filesystems are shown to the right of the listing.
Here's a tip: Bring the system down into single user mode by typing shutdown
0 and once the shell prompt shows up (you won't have to login), find all your
mount points by examining the /etc/checklist file. The second parameter is
the mountpoint directory.
Now, check each directory to see that it is empty. If not, you will need to
clean up the directory as needed, since no other disks are mounted in
single user mode except the root filesystem. Now issue the following command
for every mountpoint:
touch /mount_point/IamNOTmounted
This will create a zero-length file in the mountpoint directory to serve as a
reminder that this is a mountpoint and not a general use directory. When
disks are mounted, this file disappears from view; when a disk is unmounted,
the file comes back as a reminder not to run for the backup tapes.
Locations of other big files
============================
Data collection files
---------------------
HP's LaserRX/UX and PerfView collect information about system performance and
tasks. These files will be found in /usr/bin/rxux, /usr/laserrx or
/usr/perf. Use the du command to quickly check the size of these
directories. Data collection files can grow rapidly in size if collection
limits are not set.
core files:
-----------
Let's start with the files that seem to show up everywhere: core, a.out and
*.o files. A core file is produced in the current working directory whenever
a program is terminated abnormally, typically through some sort of error
condition not anticipated by the program, or to a lesser degree, by receiving
certain signals.
While these core files might be useful to a programmer that is designing or
supporting a particular program, the files are generally wasted space and can
be removed. core files can be a few thousand bytes or up to many megabytes
in size. Here is a command that might be added to a cron entry to remove
core files on a regular basis:
find / -name "core" -exec rm {} \;
Is there a way to prevent core files from being created? Only indirectly.
The core file creation process is part of the kernel and it simply takes
everything in the program's memory and writes it as a file named 'core' in
the current directory. To prevent this from happening, do the following:
cd <to_someplace_where_core_files_shouldn't_be>
touch core
chmod 0 core
chown root core
Now, core files can't be created because the file has no permissions for
any user (except the superuser) and the file can't be changed by the user
since it's owned by root. It is a zero-length file so it occupies no space.
And to keep from having cron messages about not being able to remove a
directory called core, change the above find command to:
find / -name "core" -type f -exec rm {} \;
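The value of -type f is easy to demonstrate on a scratch tree: it removes an ordinary core file but spares a directory that happens to be named core (the paths below are made up for the illustration):

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/src" "$scratch/core"
touch "$scratch/src/core"        # an unwanted core dump file
touch "$scratch/core/keep.c"     # real data inside a directory named core
# -type f restricts the removal to ordinary files only
find "$scratch" -name "core" -type f -exec rm -f {} \;
```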
a.out and *.o files
-------------------
Other files that are commonly left over from programming efforts are a.out
and files ending with .o (which are compiled but unlinked files) and these
are often left in various places by busy programmers. A polite way to notify
users about these files is to send an email message to everyone with a list
of the files:
find / -name "a.out" -print > aout.list
find / -name "*.o" -print > o.list
Then mail these lists to the less than tidy programmers to clean up their
disk space. If this effort is unsuccessful, the previous corefile-remover
command can be modified from the name "core" to the names "a.out" and "*.o",
although you may wish to add an aging option to the find command such that
only files more than 30 days old are removed.
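The combined command with an aging option might be sketched as below, run on a scratch directory rather than /. Note that -mtime and GNU-style touch -d are used here so the ages can be faked; the access-time (-atime) test is the natural choice on a live system:

```shell
scratch=$(mktemp -d)
touch "$scratch/a.out" "$scratch/mod.o" "$scratch/keep.c"
touch -d "45 days ago" "$scratch/a.out" "$scratch/mod.o"   # backdate leftovers
# Remove a.out and *.o files more than 30 days old, ordinary files only
find "$scratch" \( -name "a.out" -o -name "*.o" \) -type f -mtime +30 \
    -exec rm -f {} \;
ls "$scratch"                    # only keep.c remains
```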
Other interesting files
-----------------------
There is an interesting directory called /etc/newconfig and it contains some
very useful files, namely, the unedited (known to work) custom files such as
rc, gettydefs, inittab, passwd and so on. If one of these critical files in
the /etc directory becomes corrupt (ie, inittab), a known-to-work version can
be copied from /etc/newconfig which will then allow the system to get back
online.
Within the /etc/filesets directory is a complete set of files stored using
the update program. Each fileset is a collection of related software and
these files are a quick reference to where things are stored. These files
could be deleted but there's only 1 meg or so stored there.
Another location is /system. This directory contains a lot of information
about filesets and patches. For patches, some will actually save the old
files so the patch can be removed. As in /etc/filesets, these files are
not used very often except during updates and system recovery, and typically
they occupy less than 3 megs of space.
For 300/400/700 systems, always make a recovery tape (see the command mkrs)
so your system can be booted and repair to your root disk can be done.
/users directory
----------------
So what about users that are abusing their filespace with test files or other
unnecessary data? First, where are these files? The simplest answer is to
check the /users directories, and a good way to do this is with the du
command. Here is a sample:
du /users
2 /users/root/test
32 /users/root
12 /users/rst
480 /users/wpw
2 /users/jes
3308 /users/djs/nova-files
10442 /users/djs
2 /users/mda
6 /users/jws
2 /users/gfm
2 /users/gedu
12 /users/jam
12 /users/blh
11016 /users
The numbers on the left are in blocks, or 512 byte units of measure. These
values are not as meaningful as Kbytes or Mbytes so I simply divide the
number in half and now I can see the usage in Kbytes. The du command shows
the diskspace usage in directories, not individual files and this is the
first step towards tracking down disk space problems.
Now, you'll notice that some directories are not really very interesting such
as /users/root/test (2 blocks or 1 Kbyte), so how can we limit the list to
interesting numbers, such as directories larger than 5 megabytes?
Well, the du command produces left-justified numbers, so the standard sort
command alone won't order them correctly; du piped to sort -r -n gives a
listing of the largest directories first. The grep command can then trim
that listing further; here's an example:
du /users | sort -nr | grep ^....[0-9]
The sort command says: sort numbers (-n) in reverse order (-r). The grep
pattern says: starting at column 1 (^), skip 4 columns (dots are don't-care
positions), then match only when the 5th column contains a numeric character
([0-9]). So, applied to the example above, the command produces:
du /users | sort -r -n | grep ^....[0-9]
11016 /users
10442 /users/djs
which is certainly easier to read. Now it is obvious that /users/djs is the
largest (5,221 Kbytes or approximately 5 Mbytes) user in /users. Another
option is to create some simple scripts to show directories in megs:
#!/bin/sh
# Show usage in directories measured in megabytes (directories under
# 1 megabyte are not shown)
#
# Measurement is in 1024*1024 bytes. Fraction 0.488281 is 512 (bytes
# per block given by du), divided by 1024**2 and then multiplied
# by 1000 to allow the int() function to truncate digits beyond
# 3 decimal digits.
#
du $* | awk '{print int($1*0.488281)/1000, " ", $2}' \
     | grep -v ^0 \
     | sort -nr
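The grep threshold shown earlier can be sanity-checked against canned du output (the figures below are from the sample listing): only counts of five digits or more, roughly 5 megabytes and up, get through.

```shell
# Feed sample du output (block counts, left-justified) through the pipeline
printf '%s\n' \
    "2 /users/root/test" \
    "480 /users/wpw" \
    "10442 /users/djs" \
    "11016 /users" |
    sort -r -n | grep "^....[0-9]"
```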
What about looking for big files rather than directories? The find command
has an option that will search for the size of a file in blocks or
characters. For instance, to locate all files that are greater than 1 Mbyte,
the following commands will work:
find / -size +2000 | pg
find / -size +1000000c | pg
where the first form specifies 2000 blocks (2000 x 512 bytes = 1 Mbyte apx)
and the second form will find files that are greater than 1,000,000 bytes.
In the man page for find, the use of + to mean greater than is documented
at the beginning of the section. You may wish to change the
output from a pipe into pg (or the more command) to redirect into a file as
in:
find / -size +2000 > /tmp/bigfiles
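The size test can be tried on scratch files first: +2000 means more than 2000 512-byte blocks, i.e. just over 1 Mbyte.

```shell
scratch=$(mktemp -d)
dd if=/dev/zero of="$scratch/big"   bs=1024 count=1500 2>/dev/null  # ~1.5 MB
dd if=/dev/zero of="$scratch/small" bs=1024 count=100  2>/dev/null  # ~100 KB
# Only the ~1.5 MB file (3000 blocks) exceeds the 2000-block threshold
find "$scratch" -size +2000 -print
```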
Some files need to stay, for example, /hp-ux and /SYSBCKUP are usually larger
than 1 Mbyte but don't remove them! The system will have a very difficult
time rebooting when these files are removed (as some new system managers or
well-intentioned users may have already discovered). These are the most
important files needed for the system to boot.
Here's a script called lls (long listing sorted) which sorts the files by
size:
#!/bin/sh
# Long listing sorted
/bin/ll -aHF $* | sort -nr -k 5 | more
As an example:
lls /tmp
-rw-r--r-- 1 root other 2109440 May 17 15:46 blh.tar
-rw-r--r-- 1 root other 316916 Jun 2 00:36 foo2
-rw-rw-rw- 1 root other 260619 Mar 9 05:03 catalog.hp
-rw-r--r-- 1 root other 242044 Sep 24 1994 shoe1.tif
-rw-r--r-- 1 root other 190009 Jan 21 1995 cop_man.ps
-rw-r--r-- 1 root sys 124891 Jan 29 1994 update.log2
-rw-r--r-- 1 root sys 79228 Jan 29 1994 update.log1
-rw-r--r-- 1 root other 48998 May 24 18:11 .newsrc.orig2
-rw-r--r-- 1 root other 48998 May 23 23:33 .newsrc.orig
-rw-r--r-- 1 root other 46514 Aug 28 16:51 .oldnewsrc
-rw-r--r-- 1 root other 46514 Aug 21 15:10 .newsrc
-rw-r--r-- 1 root other 39525 Jan 21 1995 cop-user.sam
-rw-r--r-- 1 root other 22088 Sep 11 1994 .tif
-rwxr-xr-x 1 root other 20480 Nov 22 1994 set_disp*
-rw-r----- 1 root other 20131 Mar 19 13:49 gtest
-rw------- 1 root other 17246 Aug 21 17:08 .gpm
-rw-rw-rw- 1 root other 15185 Jul 24 12:02 stm.log
-rw------- 1 root other 12949 Aug 23 15:39 .netscape-history
Logfiles - information and lots of space!
-----------------------------------------
Many logs are kept in HP-UX systems and the majority grow without bounds,
which can generate the infamous "file system full" message. The root
filesystem is by far the most critical in that many HP-UX processes depend on
having some space available, including space for logfiles. Many of these
logfiles are optional and are not created in a default system, but there are
several that do exist and should be monitored.
Some very common ones that can grow quickly are:
/usr/adm/syslog (network and system logs)
/usr/adm/diag/LOG* (diagnostic logs)
/etc/wtmp (login/logout, etc)
/usr/spool/mqueue/syslog (mail log)
/usr/spool/lp/log (lp log)
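A simple check of these files, run by hand or from cron, can flag growth
before the filesystem fills. A sketch (the logcheck name and the threshold
are our own illustrative choices):

```shell
# logcheck: report any of the named files that have grown beyond a
# byte limit.  Usage: logcheck LIMIT file ...
# Files that do not exist are silently skipped.
logcheck() {
    limit=$1; shift
    for f in "$@"
    do
        if [ -f "$f" ]
        then
            size=`wc -c < "$f" | tr -d ' '`
            if [ "$size" -gt "$limit" ]
            then
                echo "$f: $size bytes"
            fi
        fi
    done
}
```

For example: logcheck 1048576 /usr/adm/syslog /etc/wtmp /usr/spool/mqueue/syslog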
For example, the script /etc/netbsdsrc starts rwhod by default at HP-UX
revision 7.0. rwhod makes entries in /usr/spool/rwho for every machine it
discovers. These files may be deleted, and rwhod may be removed from
/etc/netbsdsrc as a startup daemon. rwhod also generates a lot of LAN
traffic which may not be desirable on a large network. At 8.0 and higher,
rwhod is not started by default and you may wish to leave it disabled.
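Clearing rwhod's data files by hand is safe, since a running rwhod rebuilds
them as it hears broadcasts. A sketch, with the directory parameterized for
illustration (on a live system it is /usr/spool/rwho):

```shell
# clean_rwho: remove rwhod's per-machine data files (whod.*) from
# the given spool directory.  Nothing is lost -- a running rwhod
# recreates an entry the next time that machine broadcasts.
clean_rwho() {
    rm -f "$1"/whod.*
}
```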
In general, most system logs are kept in /usr/adm but, as with all HP-UX
commands, there are exceptions. The update and rmfn programs keep logs in
/tmp (i.e., update.log and rmfn.log), as does the new JetDirect network
printer software, which keeps logs in /tmp with each logfile starting with
hpnp.
One of the big logfile makers is the optional system monitor program
LaserRX/UX which logs computer activity. Depending on the settings used to
quantify 'interesting' processes, the rxlog may grow very rapidly, especially
if the CPU and disk parameters are set to zero (i.e., every activity is
logged regardless of importance). Recommended settings for LaserRX/UX are
CPU = disk = 5, which is more than adequate to track important activities.
In the following table, commands that have logfile options are listed,
followed by the location and name of the logfile (user-defined means that the
path/filename is defined at run time), a key (K) to define conditions by
which the file is created, and a short description of the contents.
Command Location K Contents
---------- -------------------------- - ------------------------------
<acctg> /usr/adm/pacct P system accounting
<acctg> /usr/adm/acct/* P system accounting
backup /etc/backuplog A history of /etc/backup script
cron /usr/lib/cron/log A history of cron activities
dmesg /usr/adm/messages (typ) U dmesg log (see /etc/newconfig/crontab.root)
dmesg /usr/adm/msgbuf (typ) U incremental dmesg text
eisa_config /etc/eisa/config.err A eisa_config error log
eisa_config /etc/eisa/config.log A eisa_config activity log
gated user-defined P gateway routing, changes, etc
getx25 /usr/spool/uucp/.Log/* A x.25 PAD caller numbers
getx25 /usr/spool/uucp/X25LOG A x.25 activity log
glbd /etc/ncs/glb_log A GLB diag information
hpnpcfg /tmp/hpnpcfg.log A JetDirect configuration log
hpnpinstall /tmp/hpnpinstall.log A JetDirect installation log
hpnptyd user-defined P JetDirect printer pty logging
hpterm user-defined P hpterm log file
kermit ./debug.log P kermit log file
login /etc/utmp A built on every bootup
login /etc/wtmp R history of logins, state change
login /etc/btmp E list of bad logins
lp /usr/spool/lp/log P lp activities
lp /usr/spool/lp/lpd.log P rlpdaemon logging
lockd user-defined P RPC lock request errors
mountd user-defined P NFS mount errors
netdistd /usr/adm/netdist.log P log of netdist activities
nettl /usr/adm/nettl.LOG0... P network tracing/logging
notes /usr/contrib/lib/notes/*log P notes activity log
ns /usr/adm/nettrlog P NS network errors
opx25 /usr/spool/uucp/.Log/* P x.25 (HALGOL) logging
ppl /usr/spool/ppl/log E x.25 ppl log
ppl /usr/spool/ppl/bill E x.25 ppl log for billing
ptydaemon /etc/ptydaemonlog P log of ptydaemon activities
rexd user-defined P RPC activities
reboot /usr/adm/shutdownlog E log of shutdown activities
rlpdaemon /usr/spool/lp/lpd.log P rlp activities
rmfn /tmp/rmfn.log A log of rmfn activities
rstatd user-defined P RPC performance statistics
rusersd user-defined P RPC users error log
rwalld user-defined P RPC rwall command errors
rwhod /usr/spool/rwho A list of machines on the LAN
rbootd /usr/adm/rbootd.log P rbootd activities
sam /usr/sam/log/samlog.. A log of all SAM activities
sam /usr/sam/backup/logfile A log of SAM backup activities
sam /tmp/cluster.log A log of SAM cluster activities
savecore /usr/adm/shutdown.log E log of coredump saves
scopeux /usr/bin/rxux/rxlog... A LaserRX/UX collection files
sendmail /usr/spool/mqueue/syslog A sendmail history
shutdown /usr/adm/shutdownlog E log of shutdown activities
spell /usr/lib/spell/spellhist P history of spell activities
syslogd /usr/adm/syslog A syslogd's log file
su /usr/adm/sulog A history of su command usage
update /tmp/update.log A history of update activities
updist /tmp/update.log A history of updist activities
uucp.. /usr/spool/uucp/diallog A uucp dialing log
uucp.. /usr/spool/uucp/errlog A uucp error log
uulog /usr/spool/uucp/logfile A uucp general log
uucp.. /usr/spool/uucp/syslog A uucp system log
uucp.. /usr/spool/uucp/culog A uucp call log
uucico /usr/spool/uucp/.Log/... A uucico transactions log
uusub /usr/spool/uucp/L_sub A uucp connection statistics
uusub /usr/spool/uucp/R_sub A uucp traffic statistics
vtdaemon /etc/vtdaemonlog A log of vtdaemon activities
vtdaemon /usr/contrib/lib/*.log A log of vtdaemon diagnostics
Xserver /usr/adm/X*msgs P X/Windows error log file
x29server /usr/adm/x29/x29server/* A x29server logging
ypbindm user-defined P Yellow Pages bind log
ypserv /usr/etc/yp/ypserv.log E Yellow Pages server log
ypxfr /usr/etc/yp/ypxfr.log E YP database transfers
---------- -------------------------- - ------------------------------
Notes: The K column refers to special conditions for the file's existence:
A = automatically created or appended
E = not created automatically; log is kept only if it exists already
P = program creates logfile by runtime option only
R = required; deleting this file may cause a problem
U = user sets up logging through cron or other means
Files in the R category (required logs) can be zeroed out with the cat
/dev/null command, as in:
cat /dev/null > /etc/wtmp
or
>/etc/wtmp
Rather than using the commands rm, touch, chmod, chown, and chgrp to
create an empty file, the cat /dev/null technique retains all the
characteristics of the old file. Note that zeroing /etc/wtmp on a running
system may cause errors to be reported by the who command, since who can
no longer find the users currently logged in. The best way to trim
/etc/wtmp is to do it in single-user mode. Do not zero
/etc/utmp...this is done automatically at every bootup.
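For text logs that should keep some recent history, a compromise is to
trim rather than zero them. Copying the tail back with > truncates the
file in place, so the owner, group, and permissions survive just as with
the cat /dev/null technique. A sketch (the trimlog name and the 100-line
figure are our own; as with /etc/wtmp, trim only when nothing is actively
writing the file):

```shell
# trimlog: keep only the last 100 lines of a text logfile.  The
# tail is copied back over the original with >, which truncates in
# place and so keeps the old file's owner, group, mode, and inode.
trimlog() {
    tail -100 "$1" > /tmp/trim$$ &&
    cat /tmp/trim$$ > "$1"
    rm -f /tmp/trim$$
}
```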
As a final note, HP's RemoteWatch now offers many features to monitor log
files, the /dev directory, and all the mounted disks automatically, and
will notify the root user by email when a problem begins to occur.
RemoteWatch information can be obtained through your local HP sales office or
the HP Customer Information Center (800) 752-0900.
Also, SAM has been enhanced in the HP-UX revision 9.0 release to perform
many of these big-file searches and offers other disk space management
tools.
------------------------------
Bill Hassell
Atlanta Response Center
email: blh@atl.hp.com