Customer Viewable
blh@atl.hp.com
Last Revised: 8/95
--------------------
+---------------------------------------------------+
| Help! My Filesystem Is Full And I Can't Get Up! |
+---------------------------------------------------+
filesystem is full
filesystem is full
filesystem is full
filesystem is full
...
Oh, oh! It seems there are only two types of HP-UX system administrators:
Those that have seen this message,
and
those that are going to!
The effect of this message depends primarily on what filesystem is reporting
the dreaded error, with the root filesystem (/) having the worst effect.
When root fills up, things start failing; processes are killed or core dump;
programs that depend on files on the root directories begin to have problems;
and the list goes on.
The first thing to do is to gracefully shutdown the system, unless you know
where the problem is hiding. There are several common reasons for a
filesystem to become full without warning, while other reasons require some
stealth to locate.
System Panics
-------------
One of the biggest files to suddenly appear in the root filesystem can be
found in the directory /tmp/syscore. Normally, this directory does not exist
so core dumps due to a system crash (panic in HP-UX parlance) will not be
saved.
That's good news and bad news: the good news is that a panic will not
suddenly fill your filesystem with a core dump from 16 to 256 megs (or more)
of data. The bad news is that there is little chance to determine the reason
for the system panic without this file.
Is /tmp/syscore the only location for a panic core dump? No. The directory
is specified in the system startup file /etc/rc and is set as the only
parameter given to a program called savecore. When savecore runs, the two
files: hp-core.# and hp-ux.# are created, along with a small file called
bounds. The # sign is a number starting from 0 and incrementing with every
panic that occurs, and bounds keeps track of the next number to use.
When the system reboots after a panic, the /etc/rc file checks to see if
savecore exists and if the directory specified exists...if both are true,
savecore checks the dump area (typically the primary swap area) for a valid
HP-UX memory dump. Finding a properly stored memory dump, the savecore
program announces the date/time that the panic occurred and creates the file
hp-core.0 (if this is the first core dump in the directory). The process
continues until all of physical memory (RAM) has been written to the disk, or
until (oops) the filesystem is full. Without a properly written dump in the
primary swap area, savecore does nothing and displays nothing.
Then, savecore writes a copy of the current /hp-ux file as hp-ux.0 to match
the dump file. If the filesystem is already full, this file is created as a
zero-length file. To be useful, the core dump must also have a copy of
/hp-ux (the kernel file) at the time of the dump.
So what is the best technique to prevent a root filesystem from filling up
due to a system panic? Simply pick another filesystem to store the dump, one
that typically has a lot of space, or a filesystem that always has at least
as much space as the size of RAM plus about 2-3 megs (for hp-ux.#). How do
you locate the size of RAM? You can type the command: dmesg and look at the
amount of real memory that is available. To pick another filesystem (for
example, /disk7), change the two lines in /etc/rc from:
if [ -x /etc/savecore ] && [ -d /tmp/syscore ]
then
        /etc/savecore /tmp/syscore
fi
to
if [ -x /etc/savecore ] && [ -d /disk7/syscore ]
then
        /etc/savecore /disk7/syscore
fi
If you need the space back after a panic, simply store the contents of the
core dump directory on tape...the simplest command to use is:
cd /my_core_dump_directory
tar cvf /dev/my_favorite_tape_devicefile *
Don't forget to check the list that tar makes; it should show the two files
hp-ux and hp-core and possibly the bounds file. Then, you can remove all the
files from the core dump directory, at which point, you can contact HP to
have the core dump analyzed for the possible reason(s) for the panic.
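Before removing anything, it's worth listing the archive back to be sure the copy is good. Here is a sketch of that check, staged with made-up filenames in a scratch directory, and with an ordinary file standing in for the tape devicefile:

```shell
# Stage a fake core-dump directory (names are illustrative only)
workdir=$(mktemp -d)
mkdir "$workdir/syscore"
echo "pretend memory dump" > "$workdir/syscore/hp-core.0"
echo "pretend kernel"      > "$workdir/syscore/hp-ux.0"
cd "$workdir/syscore"
# Archive it; on a real system the tar cvf target is your tape devicefile
tar cf "$workdir/dump.tar" .
# List the archive back: both hp-core.0 and hp-ux.0 should appear
# before you rm anything from the core dump directory
tar tf "$workdir/dump.tar"
```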
Filesystem minfree
------------------
Every administrator has probably looked at the bdf command after mounting a
brand new disk and asked: Where did some of that empty space go? The answer
is that approximately 6% to 8% of a disk's space is occupied by inode tables
and superblocks, which contain the pointers to the various blocks of files
that are on the disk. In addition, the default newfs command will reserve
10% minfree or 10% of what's left before files are stored on the disk, to
enhance the filesystem performance.
This buffer allows system administrators to fix problems with the space on a
given disk (once the filesystem is marked full) and still have some room (the
10% minfree area) to work. Although the minfree area can be reduced to zero,
this is not recommended for the root disk since a file system full message
might not allow even the system administrator to log onto the ailing system.
Other disks might be allowed to use 0% minfree, as long as the space is
monitored, or the space usage is essentially fixed. Note also that the HFS
method of disk space management in HP-UX relies heavily on 10% minfree to
keep the performance in allocating and deallocating filespace at a high
level.
Another filesystem tuning option is to increase the bytes-per-inode value when
initializing the filesystem with newfs. By changing the number of bytes of
disk space managed by an inode, the overhead can be reduced by as much as
50%, at the expense of the total number of files that may be stored with
these larger inodes. This parameter is tricky since it may prevent easy
interchange of data with other Unix systems that cannot handle a wide range
of bytes-per-inode values.
In general, changing this parameter from 2048 bytes to 64K bytes will only
return about 3% or so of the disk space, with a corresponding reduction in
the total number of files that can be stored, but this may be ideal for a
small collection of large files. Be sure to choose a value that will be
compatible with your operating system revision. Large inode sizes are often
not portable to other systems or revs.
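The arithmetic behind the trade-off can be sketched quickly. Assuming a 128-byte on-disk inode (an assumption; check your filesystem's actual inode size) and a hypothetical 1 GB filesystem:

```shell
# Back-of-the-envelope inode table overhead at two bytes-per-inode values
fs_bytes=1073741824                     # hypothetical 1 GB filesystem
for bpi in 2048 65536; do
    awk -v fs="$fs_bytes" -v bpi="$bpi" 'BEGIN {
        inodes = fs / bpi               # one inode per bpi bytes of disk
        printf "bytes-per-inode %-5d -> %7d inodes, %6.1f MB of inode table\n",
               bpi, inodes, inodes * 128 / (1024 * 1024)
    }'
done
```

With these (assumed) numbers, raising bytes-per-inode from 2048 to 64K shrinks the inode tables from tens of megabytes to a couple, but leaves only 16384 possible files on the disk.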
Files that don't belong in /dev
-------------------------------
Another very common problem is a file system full message just after doing a
backup. How can this be? HP-UX is quite friendly in that it will allow
spelling errors, but it will often do something not entirely expected. For
instance, if the user were to misspell the name of the tape drive as in:
tar cvf /dev/rmt/om /
instead of
tar cvf /dev/rmt/0m /
then, rather than displaying an error message such as:
tape not found
or,
devicefile does not exist,
the HP-UX operating system simply creates an ordinary file with the name
given in the tar command (or cpio or fbackup, etc) and all the data to be
backed up begins filling the /dev/rmt/om file until the entire system has
backed itself up onto the root disk. This process eventually fails with a
file system full message.
To find these inadvertent spelling errors in /dev, use the following command:
find /dev -type f -print
This list will contain files that should never appear in /dev, that is,
ordinary files. Occasionally, a core file or other unexpected files may also
be discovered in the /dev directory.
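The trap is easy to reproduce safely. Here is a sketch staged in a scratch directory standing in for /dev (no real device nodes are involved):

```shell
# Stage a fake /dev with the classic misspelling
fakedev=$(mktemp -d)
mkdir "$fakedev/rmt"
echo "backup data" > "$fakedev/rmt/om"   # the typo: an ordinary file, not a devicefile
# -type f reports only ordinary files; directories and device nodes are skipped
find "$fakedev" -type f -print
```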
Managing /tmp, /usr/preserve and /usr/tmp
-----------------------------------------
/tmp is one of those directories where everyone has access but few seem to
treat it with respect. /tmp is defined as a temporary storage area and is
not to be considered permanent. Processes like email or the vi editor use
the /tmp directory for files, but normal operations will cleanup afterwards
and not leave files in the /tmp directory. Some HP programs such as update
will leave their logfile in /tmp, but this is considered correct practice in
that the logfile should be reviewed for errors, and then removed or archived.
One way to enforce cleanup of the /tmp directory is to use a command such as:
find /tmp -type f -atime +14 -exec rm {} \;
which will remove any files in /tmp (or files in directories below /tmp)
that have not been accessed in more than 14 days. The other temporary
storage areas, /usr/preserve and /usr/tmp, are less often abused by users
since they overlook their existence. Again, some processes will create
temporary files in /usr/tmp and should (if they terminate correctly)
remove their files and editors like vi use /usr/preserve. This command
will clear up /usr/tmp of files not accessed in more than 7 days:
find /usr/tmp -atime +7 -exec rm {} \;
System administrators need to decide if /tmp should allow regularly accessed
files to stay in /tmp. A user might bypass the above tests by using the
touch command on the files. In this case, change the -atime option to
-mtime, which tests when the file was last modified instead.
Once in a while, you may need to check for old directories which are not
removed by the above command. The contents will be cleared but after a
while, /tmp may get cluttered with empty directories.
Here's a possibility for files in directories. This combination purges
files that are older than 7 days, followed by a removal of the directory
if it hasn't been updated for 7 days. However, a simple rmdir is used
so the command will fail if the directory isn't empty. Thus, until all
files have been removed, the directory will stay.
find /tmp -type f -atime +7 -print -exec rm -f {} \;
find /tmp -type d -atime +7 -print -exec rmdir {} \;
find /usr/tmp -type f -atime +7 -print -exec rm -f {} \;
find /usr/tmp -type d -atime +7 -print -exec rmdir {} \;
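The aging behavior can be tried out on a scratch directory first. This sketch uses -mtime and GNU-style touch -d so the file ages can be faked for the demonstration; on a live system the -atime forms above are the ones to schedule:

```shell
# Scratch-directory demonstration of the aging removal
scratch=$(mktemp -d)
touch "$scratch/fresh.txt" "$scratch/stale.txt"
touch -d "10 days ago" "$scratch/stale.txt"   # backdate one file (GNU touch)
find "$scratch" -type f -mtime +7 -exec rm -f {} \;
ls "$scratch"                                 # only fresh.txt remains
```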
Another common practice is to cleanup /usr/tmp after every full backup.
Managing /usr
-------------
/usr is the largest directory in a standard HP-UX system. It not only
contains most of the executable files, library files and even font files
for X/windows, it is the location where the lp spooler sends all spooled
files. This can use disk space very quickly, especially if a printer
goes down. The directory to check is: /usr/spool/lp/request. There
will be a directory for every printer and large files in those directories
may indicate a spooler problem if they are more than a few minutes old.
Sometimes, the spooler will become confused (there are patches for 7.x, 8.x
and 9.x versions) and leave the printers disabled. Also, administrators may
change and forget to remove test printers which don't really exist. Verify
that the report from lpstat -v and the directories in /usr/spool/lp/request
are the same. The lpstat -v listing shows the printers known to the spooler
so if there are other directories (or files) in /usr/spool/lp/request then
they don't belong there.
Also check the log files in /usr/spool/lp...they are optional but when
started, they will grow without bounds. The files are log and lpd.log.
lpsched -v is used to start lp logging, although HP's JetDirect software
always logs information from the interface scripts.
If you are using DTC's, there are problems that can cause the DTC software
to log large amounts (dozens of megabytes) of errors. Check /usr/adm/ for a
DTC manager
logfile, and also /usr/dtcmgr for a similar logfile. These can be reduced
by:
tail name_of_big_logfile > /tmp/dtctail.log
mv /tmp/dtctail.log name_of_big_logfile
Do this for both log files and the files will be reduced to the last 10
entries which should be enough for debugging the DTC problem. It is not
unusual for the DTC software to create 25-50 meg files.
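The effect of the tail-and-replace trim is easy to see on a synthetic logfile (the 500-entry file here is made up for the illustration):

```shell
# Trim a synthetic logfile down to its last entries
log=$(mktemp)
seq 1 500 > "$log"         # pretend: a 500-entry logfile
tail "$log" > "$log.trim"  # tail keeps the last 10 lines by default
mv "$log.trim" "$log"
wc -l < "$log"             # 10 lines remain, ending with entry 500
```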
Another area to check is uucp's directories. Like the lp spooler, this
directory can be very dynamic, holding traffic for other nodes or simply the
repository for various files sent or received using uucp. The directory is:
/usr/spool/uucp
and places to look are: .Admin, typically the audit file will grow depending
on traffic; .Log where all the logfiles are kept and then the directories
that are the names of remote machines allowed access to this computer.
Finally, /usr/mail is a place where unexpected bursts of growth can occur.
Very large files can be easily (too easily) emailed to users on the system
and this directory can grow quite rapidly. Sending a big file to everyone
in a distribution list will cause multiple copies of the same file to be
placed in everyone's mailfile, thus growing /usr/mail rapidly.
Managing /system
----------------
This is a directory that keeps track of all the filesets that were installed.
Some of the filesets contain just the Product Description Files (PDF..see man
pdfck) and others, like patches, will contain the new files, a copy of the old
files and customization scripts. This can run into 10 or 20 megs after a
while. However, you do have two choices:
..Remove /system and all the files in it. The downside is that you won't
be able to run pdfck (a very useful tool) nor can you use rmfn, the
fileset removal tool. However the opsystem will still run fine.
..Move the /system directory to another filesystem using a symbolic link.
That is, assume that /extra has lots of extra space. Here are the steps:
cd /system
find . | cpio -pdmuvxl /extra/system
(check now that /extra/system has all the /system files)
cd /
rm -r /system
ln -s /extra/system /system
Now, all the files have been moved from the root directory but all the
commands (pdfck and rmfn) still work OK. There is one consideration
though: if you perform updates or run rmfn in single user mode, be sure
the /extra disk is mounted.
Check /users/ftp
----------------
Although /users is one of those directories that can grow unexpectedly (from
individual users creating lots of files), there may be an ftp directory,
also known as anonymous ftp. This directory allows users from the network
to send/receive files without having specific logins to the system and this
can lead to the appearance of large files unexpectedly. To check on it,
use:
du /users/ftp
Big numbers (more than 10000) might mean that someone on the net is storing
large files...this can be prevented by changing the permissions on the
/usr/pub directory from 777 to 755. The rest of the (standard) ftp
directories are set to 755 already. Anonymous ftp can be setup using SAM
although finding the option is tricky...for 8.0x systems:
select:
Networks/Communications ->
LAN Hardware and Software (Cards and Services) ->
ARPA Services Configuration ->
Create Public Account for File Transfers ...
For 9.0x systems:
select:
Networking/Communications->
Services: Enable/Disable
Anonymous FTP Disabled Public account file transfer capability
By pressing Return, Anonymous FTP will be highlighted and you can then select
the Action menu by:
pressing f2 (label=Alt) and then the letter a
or
pressing f4 (label=Menubar) and moving the menu to the right using
the arrow keys. Select Enable or Disable as appropriate.
Where to remove filesets
------------------------
Starting with HP-UX version 8.0 and higher, the ability to remove unneeded
filesets or applications has been provided through the program rmfn. This is
based on the update program and indeed, looks almost identical. When rmfn is
run without any parameters, rmfn displays a screen of all the filesets on the
system (information on filesets is stored in /system with indexes, customize
scripts and related information). rmfn also reports the size of the fileset.
Note that files and applications that are added to the system without the use
of the update program will not be seen by the rmfn program.
Why can't I just remove the program? Well, in the good old days of
computers, a simple program was just one item or at most, one directory and
therefore easy to remove. But that was then and this is now; today, programs
are stored in various pieces all over the filesystem. Things like rc files
for local configs, X/Window resources in an app-defaults file, man pages for
documentation, commands needed only for the administrator and other commands
for general use, all part of a single application program.
To track all these items, the update & rmfn programs make use of indexes kept
in the /system directory. Additionally, dependencies between files in
different filesets are tracked, which prevents loading or removing partial
file groupings that would not be fully functional as a whole. While many third
party suppliers of software use the update program, many do not so you will
have to refer to your supplier's documentation for space management.
Filesets that might be removed?
===============================
man pages
---------
The documentation pages (man pages) can occupy from 6 to 20 megabytes. Once
the man pages are removed, the man command will no longer find any online
help files but this can save a lot of disk space, especially in a system
where little or no program development takes place.
Another alternative is to remove just the directories in /usr/man that start
with the letters 'cat'. These directories, when they exist, provide a
location for formatted help pages. When running the man program, you may see
the message:
formatting...please wait
which comes from the man program as it turns the help pages into a readable
format.
If the /usr/man/cat* directories exist, the finished pages are saved,
thereby avoiding the delay when the same page is requested in the future. A
fully formatted set of man pages may be about 20 megabytes larger than the
unformatted pages. If users are not annoyed with this delay, removing the
/usr/man/cat* directories can save an average of 10 megabytes.
Here's a tip: most users need only the section 1 or section 1m commands for
day to day operations. As the system administrator, you may find that this
space (approximately 3-4 megs) is well worth the time saved, as long as it
doesn't grow any larger. There is a command called catman that can format
complete sections (1 and 1m are the basic HP-UX commands for all users and
system administrators, respectively) and by removing all cat* directories
except:
/usr/man/cat1
/usr/man/cat1m
then just the pages for these commands can be formatted at one time (I would
suggest doing this overnight) by using the command: catman 11m. Now, all the
man page requests for sections 1 and 1m pop up immediately, yet the disk
space will not grow as man pages from other sections are referenced (they
will still be formatted on each occurrence).
Another technique is to remove the pages that have not been accessed in the
last n days where n might be 15 or 30, whatever value fits your site. A
weekly cron job can be started that searches for formatted pages in the
/usr/man/cat* directories, finding the files that are older than the
specified time. Check the find command for time stamp options and use the
-exec option to do an rm of the file(s).
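Such a cron job might look like the sketch below, run here on a scratch stand-in for a /usr/man/cat* directory. The 30-day cutoff is an example, and -mtime with GNU-style touch -d is used only so the ages can be faked for the demonstration (on a real system, -atime matches the "not accessed" intent):

```shell
# Scratch stand-in for a formatted man page directory
catdir=$(mktemp -d)
touch "$catdir/old.1" "$catdir/new.1"
touch -d "45 days ago" "$catdir/old.1"        # backdate one page
find "$catdir" -type f -mtime +30 -exec rm -f {} \;
ls "$catdir"                                  # only new.1 survives
```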
Still another technique is to make the man pages a remote (NFS) directory on
another system. By making /usr/man reside on a single system, dozens of
megabytes of duplicate pages can be eliminated.
Discless
--------
Discless is a feature found in all the 9000 platforms at revision 8.x but not
everyone needs these features. DISCLESS is a fileset that is fairly large
and can be removed if support for discless workstations is not needed.
DISCLESS provides several kernels for various features that discless clients
might need and just these 5 or 6 kernels will occupy 6 to 7 megabytes in
/etc/conf.
NLS files
---------
Native language support is another area that can be trimmed from systems that
do not require language support other than English. Some files can be quite
large for Far Eastern languages where a complex character set (ie, Kana or
Korean) might be needed.
HP Diagnostics
--------------
These are tricky. Removing them can save a lot of space, especially on the
700 and 800 systems. On the other hand, they do provide service people with
detailed information on problems that may occur on the system, along with
detailed logfiles on specific errors. You may wish to discuss the pros and
cons of removing the diagnostics with your local support people.
As with all HP filesets, they can be reinstalled simply by running the update
program and selecting the required fileset(s). Another alternative is to
copy all the files from /usr/diag into another filesystem and then create a
symbolic link. You can use a series of commands like:
..mkdir /mnt1/diag
..cd /usr/diag
..find . | cpio -pdmuxvl /mnt1/diag
..rm -rf /usr/diag <---spell carefully
..ln -s /mnt1/diag /usr/diag
This will save from 8 megs (300/400 systems) to 25 megs (800 series) on the
/usr filesystem while still retaining all the functionality of diagnostics.
A similar alternative is to use an NFS mountpoint for the diagnostics, but
with the caveat that a failure requiring diagnostics may also cause the
network to fail, thus rendering the diagnostics unusable.
lost+found directory
--------------------
During an abnormal powerfail or panic of the system, the filesystem will not
be shutdown cleanly and this may require manual intervention with fsck, the
filesystem fixit program. If fsck is unable to repair files or directories,
rather than delete them, fsck will ask if you wish to fix the problem. If
you answer yes, then the inode (a pointer to files or directories) may be
moved to the lost+found directory and given a name that is actually the inode
number.
The space represented by these entries in lost+found might have been
temporary files that were deleted but their free space was not recorded on
the filesystem, or just ordinary files and/or directories that have lost
their names or their connection with the rest of the directories. In this
case, the system administrator must look at the contents of each file or
directory to determine what, if any, data is to be saved. Otherwise, these
files will occupy space but serve no purpose, and might lead to creeping file
system growth.
Unmounted disks
---------------
HP-UX connects separate disks into a single filesystem called root
(represented by the / symbol) by making directories do double duty. A
directory can be changed into a mount point (a logical connection to another
disk's filesystem) by using the mount command. The file /etc/checklist can
also accomplish this indirectly in that the mount command reads checklist for
guidance on where to mount disks.
The curious thing is that unmounting a disk returns the mount point directory
to a local status and files stored in that directory will return. These
files became dormant after the mount command changed the use of the directory
into a mount point; that is, the files on the root disk exist, but cannot be
seen because the mounted disk 'overlays' the mount point directory.
If a mounted disk is unmounted, the files in the root directory again become
visible, and this can lead to a common error:
1. Someone notices the files are missing and starts to load the files
back from tape.
2. The files are reloaded (or a portion of the files are loaded) and
someone notices that the root filesystem is full or close to full.
3. Someone types the bdf command and discovers that the second disk is
not mounted...and mounts it. Now the files are back to what they
were, but the root filesystem is still almost full.
What happened is that the directory was not in use as a mount point but there
are no red flags showing this condition. That's why the bdf command is so
important: the mounted filesystems are shown to the right of the listing.
Here's a tip: Bring the system down into single user mode by typing shutdown
0 and once the shell prompt shows up (you won't have to login), find all your
mount points by examining the /etc/checklist file. The second parameter is
the mountpoint directory.
Now, check each directory to see that it is empty. If not, you will need to
clean up the directory as needed, since no other disks are mounted in
single user mode except the root filesystem. Now issue the following command
for every mountpoint:
touch /mount_point/IamNOTmounted
This will create a zero-length file in the mountpoint directory to serve as a
reminder that this is a mountpoint and not a general use directory. When
disks are mounted, this file disappears from view; when a disk is unmounted,
the file comes back as a reminder not to run for the backup tapes.
Locations of other big files
============================
Data collection files
---------------------
HP's LaserRX/UX and PerfView collect information about system performance and
tasks. These files will be found in /usr/bin/rxux, /usr/laserrx or
/usr/perf. Use the du command to quickly check the size of these
directories. Data collection files can grow rapidly in size if collection
limits are not set.
core files:
-----------
Let's start with the files that seem to show up everywhere: core, a.out and
*.o files. A core file is produced in the current working directory whenever
a program is terminated abnormally, typically through some sort of error
condition not anticipated by the program, or to a lesser degree, by receiving
certain signals.
While these core files might be useful to a programmer that is designing or
supporting a particular program, the files are generally wasted space and can
be removed. core files can be a few thousand bytes or up to many megabytes
in size. Here is a command that might be added to a cron entry to remove
core files on a regular basis:
find / -name "core" -exec rm {} \;
Is there a way to prevent core files from being created? Only indirectly.
The core file creation process is part of the kernel and it simply takes
everything in the program's memory and writes it as a file named 'core' in
the current directory. To prevent this from happening, do the following:
cd <to_someplace_where_core_files_shouldn't_be>
touch core
chmod 0 core
chown root core
Now, core files can't be created because the file has no permissions for
any user (except the superuser) and the file can't be changed by the user
since it's owned by root. It is a zero-length file so it occupies no space.
And to keep from having cron messages about not being able to remove a
directory called core, change the above find command to:
find / -name "core" -type f -exec rm {} \;
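The value of -type f is easy to demonstrate on a scratch tree: it removes an ordinary core file but spares a directory that happens to be named core (the paths below are made up for the illustration):

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/src" "$scratch/core"
touch "$scratch/src/core"        # an unwanted core dump file
touch "$scratch/core/keep.c"     # real data inside a directory named core
# -type f restricts the removal to ordinary files only
find "$scratch" -name "core" -type f -exec rm -f {} \;
```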
a.out and *.o files
-------------------
Other files that are commonly left over from programming efforts are a.out
and files ending with .o (which are compiled but unlinked files) and these
are often left in various places by busy programmers. A polite way to notify
users about these files is to send an email message to everyone with a list
of the files:
find / -name "a.out" -print > aout.list
find / -name "*.o" -print > o.list
Then mail these lists to the less than tidy programmers to clean up their
disk space. If this effort is unsuccessful, the previous corefile-remover
command can be modified from the name "core" to the names "a.out" and "*.o",
although you may wish to add an aging option to the find command such that
only files more than 30 days old are removed.
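The combined command with an aging option might be sketched as below, run on a scratch directory rather than /. Note that -mtime and GNU-style touch -d are used here so the ages can be faked; the access-time (-atime) test is the natural choice on a live system:

```shell
scratch=$(mktemp -d)
touch "$scratch/a.out" "$scratch/mod.o" "$scratch/keep.c"
touch -d "45 days ago" "$scratch/a.out" "$scratch/mod.o"   # backdate leftovers
# Remove a.out and *.o files more than 30 days old, ordinary files only
find "$scratch" \( -name "a.out" -o -name "*.o" \) -type f -mtime +30 \
    -exec rm -f {} \;
ls "$scratch"                    # only keep.c remains
```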
Other interesting files
-----------------------
There is an interesting directory called /etc/newconfig and it contains some
very useful files, namely, the unedited (known to work) custom files such as
rc, gettydefs, inittab, passwd and so on. If one of these critical files in
the /etc directory becomes corrupt (ie, inittab), a known-to-work version can
be copied from /etc/newconfig which will then allow the system to get back
online.
Within the /etc/filesets directory is a complete set of files stored using
the update program. Each fileset is a collection of related software and
these files are a quick reference to where things are stored. These files
could be deleted but there's only 1 meg or so stored there.
Another location is /system. This directory contains a lot of information
about filesets and patches. For patches, some will actually save the old
files so the patch can be removed. As in /etc/filesets, these files are
not used very often except during updates and system recovery, and typically
they occupy less than 3 megs of space.
For 300/400/700 systems, always make a recovery tape (see the command mkrs)
so your system can be booted and repair to your root disk can be done.
/users directory
----------------
So what about users that are abusing their filespace with test files or other
unnecessary data? First, where are these files? The simplest answer is to
check the /users directories, and a good way to do this is with the du
command. Here is a sample:
du /users
2 /users/root/test
32 /users/root
12 /users/rst
480 /users/wpw
2 /users/jes
3308 /users/djs/nova-files
10442 /users/djs
2 /users/mda
6 /users/jws
2 /users/gfm
2 /users/gedu
12 /users/jam
12 /users/blh
11016 /users
The numbers on the left are in blocks, or 512 byte units of measure. These
values are not as meaningful as Kbytes or Mbytes so I simply divide the
number in half and now I can see the usage in Kbytes. The du command shows
the diskspace usage in directories, not individual files and this is the
first step towards tracking down disk space problems.
Now, you'll notice that some directories are not really very interesting such
as /users/root/test (2 blocks or 1 Kbyte), so how can we limit the list to
interesting numbers, such as directories larger than 5 megabytes?
Well, the du command produces left-justified numbers, so the standard sort
command alone won't order them correctly; du piped to sort -r -n gives a
listing of the largest directories first. The grep command can then trim
that listing further; here's an example:
du /users | sort -nr | grep ^....[0-9]
The sort command says: sort numbers (-n) in reverse order (-r). The grep
pattern says: starting at column 1 (^), skip 4 columns (dots are don't-care
positions), then match only when the 5th column contains a numeric character
([0-9]). So, applied to the example above, the command produces:
du /users | sort -r -n | grep ^....[0-9]
11016 /users
10442 /users/djs
which is certainly easier to read. Now it is obvious that /users/djs is the
largest (5,221 Kbytes or approximately 5 Mbytes) user in /users. Another
option is to create some simple scripts to show directories in megs:
#!/bin/sh
# Show usage in directories measured in megabytes (directories under
# 1 megabyte are not shown)
#
# Measurement is in 1024*1024 bytes. Fraction 0.488281 is 512 (bytes
# per block given by du), divided by 1024**2 and then multiplied
# by 1000 to allow the int() function to truncate digits beyond
# 3 decimal digits.
#
du $* | awk '{print int($1*0.488281)/1000, " ", $2}' \
     | grep -v ^0 \
     | sort -nr
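The grep threshold shown earlier can be sanity-checked against canned du output (the figures below are from the sample listing): only counts of five digits or more, roughly 5 megabytes and up, get through.

```shell
# Feed sample du output (block counts, left-justified) through the pipeline
printf '%s\n' \
    "2 /users/root/test" \
    "480 /users/wpw" \
    "10442 /users/djs" \
    "11016 /users" |
    sort -r -n | grep "^....[0-9]"
```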
What about looking for big files rather than directories? The find command
has an option that will search for the size of a file in blocks or
characters. For instance, to locate all files that are greater than 1 Mbyte,
the following commands will work:
find / -size +2000 | pg
find / -size +1000000c | pg
where the first form specifies 2000 blocks (2000 x 512 bytes = 1 Mbyte apx)
and the second form will find files that are greater than 1,000,000 bytes.
In the man page for find, the use of + to mean greater than is documented
at the beginning of the section. You may wish to change the
output from a pipe into pg (or the more command) to redirect into a file as
in:
find / -size +2000 > /tmp/bigfiles
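The size test can be tried on scratch files first: +2000 means more than 2000 512-byte blocks, i.e. just over 1 Mbyte.

```shell
scratch=$(mktemp -d)
dd if=/dev/zero of="$scratch/big"   bs=1024 count=1500 2>/dev/null  # ~1.5 MB
dd if=/dev/zero of="$scratch/small" bs=1024 count=100  2>/dev/null  # ~100 KB
# Only the ~1.5 MB file (3000 blocks) exceeds the 2000-block threshold
find "$scratch" -size +2000 -print
```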
Some files need to stay, for example, /hp-ux and /SYSBCKUP are usually larger
than 1 Mbyte but don't remove them! The system will have a very difficult
time rebooting when these files are removed (as some new system managers or
well-intentioned users may have already discovered). These are the most
important files needed for the system to boot.
Here's a script called lls (long listing sorted) which sorts the files by
size:
#!/bin/sh
# Long listing sorted
/bin/ll -aHF $* | sort -nr -k 5 | more
As an example:
lls /tmp
-rw-r--r-- 1 root other 2109440 May 17 15:46 blh.tar
-rw-r--r-- 1 root other 316916 Jun 2 00:36 foo2
-rw-rw-rw- 1 root other 260619 Mar 9 05:03 catalog.hp
-rw-r--r-- 1 root other 242044 Sep 24 1994 shoe1.tif
-rw-r--r-- 1 root other 190009 Jan 21 1995 cop_man.ps
-rw-r--r-- 1 root sys 124891 Jan 29 1994 update.log2
-rw-r--r-- 1 root sys 79228 Jan 29 1994 update.log1
-rw-r--r-- 1 root other 48998 May 24 18:11 .newsrc.orig2
-rw-r--r-- 1 root other 48998 May 23 23:33 .newsrc.orig
-rw-r--r-- 1 root other 46514 Aug 28 16:51 .oldnewsrc
-rw-r--r-- 1 root other 46514 Aug 21 15:10 .newsrc
-rw-r--r-- 1 root other 39525 Jan 21 1995 cop-user.sam
-rw-r--r-- 1 root other 22088 Sep 11 1994 .tif
-rwxr-xr-x 1 root other 20480 Nov 22 1994 set_disp*
-rw-r----- 1 root other 20131 Mar 19 13:49 gtest
-rw------- 1 root other 17246 Aug 21 17:08 .gpm
-rw-rw-rw- 1 root other 15185 Jul 24 12:02 stm.log
-rw------- 1 root other 12949 Aug 23 15:39 .netscape-history
Logfiles - information and lots of space!
-----------------------------------------
Many logs are kept in HP-UX systems and the majority grow without bounds,
which can generate the infamous "file system full" message. The root
filesystem is by far the most critical in that many HP-UX processes depend on
having some space available, including space for logfiles. Many of these
logfiles are optional and are not created in a default system, but there are
several that do exist and should be monitored.
Some very common ones that can grow quickly are:
/usr/adm/syslog (network and system logs)
/usr/adm/diag/LOG* (diagnostic logs)
/etc/wtmp (login/logout, etc)
/usr/spool/mqueue/syslog (mail log)
/usr/spool/lp/log (lp log)
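A simple check of these files, run by hand or from cron, can flag growth
before the filesystem fills. A sketch (the logcheck name and the threshold
are our own illustrative choices):

```shell
# logcheck: report any of the named files that have grown beyond a
# byte limit.  Usage: logcheck LIMIT file ...
# Files that do not exist are silently skipped.
logcheck() {
    limit=$1; shift
    for f in "$@"
    do
        if [ -f "$f" ]
        then
            size=`wc -c < "$f" | tr -d ' '`
            if [ "$size" -gt "$limit" ]
            then
                echo "$f: $size bytes"
            fi
        fi
    done
}
```

For example: logcheck 1048576 /usr/adm/syslog /etc/wtmp /usr/spool/mqueue/syslog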
For example, the script /etc/netbsdsrc starts rwhod by default at HP-UX
revision 7.0. rwhod makes entries in /usr/spool/rwho for every machine it
discovers. These files may be deleted, and rwhod may be removed from
/etc/netbsdsrc as a startup daemon. rwhod also generates a lot of LAN
traffic which may not be desirable on a large network. At 8.0 and higher,
rwhod is not started by default and you may wish to leave it disabled.
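Clearing rwhod's data files by hand is safe, since a running rwhod rebuilds
them as it hears broadcasts. A sketch, with the directory parameterized for
illustration (on a live system it is /usr/spool/rwho):

```shell
# clean_rwho: remove rwhod's per-machine data files (whod.*) from
# the given spool directory.  Nothing is lost -- a running rwhod
# recreates an entry the next time that machine broadcasts.
clean_rwho() {
    rm -f "$1"/whod.*
}
```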
In general, most system logs are kept in /usr/adm but, as with all HP-UX
commands, there are exceptions. The update and rmfn programs keep logs in
/tmp (i.e., update.log and rmfn.log), as does the new JetDirect network
printer software, which keeps logs in /tmp with each logfile starting with
hpnp.
One of the big logfile makers is the optional system monitor program
LaserRX/UX which logs computer activity. Depending on the settings used to
quantify 'interesting' processes, the rxlog may grow very rapidly, especially
if the CPU and disk parameters are set to zero (i.e., every activity is
logged regardless of importance). Recommended settings for LaserRX/UX are
CPU = disk = 5, which is more than adequate to track important activities.
In the following table, commands that have logfile options are listed,
followed by the location and name of the logfile (user-defined means that the
path/filename is defined at run time), a key (K) to define conditions by
which the file is created, and a short description of the contents.
Command Location K Contents
---------- -------------------------- - ------------------------------
<acctg> /usr/adm/pacct P system accounting
<acctg> /usr/adm/acct/* P system accounting
backup /etc/backuplog A history of /etc/backup script
cron /usr/lib/cron/log A history of cron activities
dmesg /usr/adm/messages (typ) U dmesg log (see /etc/newconfig/crontab.root)
dmesg /usr/adm/msgbuf (typ) U incremental dmesg text
eisa_config /etc/eisa/config.err A eisa_config error log
eisa_config /etc/eisa/config.log A eisa_config activity log
gated user-defined P gateway routing, changes, etc
getx25 /usr/spool/uucp/.Log/* A x.25 PAD caller numbers
getx25 /usr/spool/uucp/X25LOG A x.25 activity log
glbd /etc/ncs/glb_log A GLB diag information
hpnpcfg /tmp/hpnpcfg.log A JetDirect configuration log
hpnpinstall /tmp/hpnpinstall.log A JetDirect installation log
hpnptyd user-defined P JetDirect printer pty logging
hpterm user-defined P hpterm log file
kermit ./debug.log P kermit log file
login /etc/utmp A built on every bootup
login /etc/wtmp R history of logins, state change
login /etc/btmp E list of bad logins
lp /usr/spool/lp/log P lp activities
lp /usr/spool/lp/lpd.log P rlpdaemon logging
lockd user-defined P RPC lock request errors
mountd user-defined P NFS mount errors
netdistd /usr/adm/netdist.log P log of netdist activities
nettl /usr/adm/nettl.LOG0... P network tracing/logging
notes /usr/contrib/lib/notes/*log P notes activity log
ns /usr/adm/nettrlog P NS network errors
opx25 /usr/spool/uucp/.Log/* P x.25 (HALGOL) logging
ppl /usr/spool/ppl/log E x.25 ppl log
ppl /usr/spool/ppl/bill E x.25 ppl log for billing
ptydaemon /etc/ptydaemonlog P log of ptydaemon activities
rexd user-defined P RPC activities
reboot /usr/adm/shutdownlog E log of shutdown activities
rlpdaemon /usr/spool/lp/lpd.log P rlp activities
rmfn /tmp/rmfn.log A log of rmfn activities
rstatd user-defined P RPC performance statistics
rusersd user-defined P RPC users error log
rwalld user-defined P RPC rwall command errors
rwhod /usr/spool/rwho A list of machines on the LAN
rbootd /usr/adm/rbootd.log P rbootd activities
sam /usr/sam/log/samlog.. A log of all SAM activities
sam /usr/sam/backup/logfile A log of SAM backup activities
sam /tmp/cluster.log A log of SAM cluster activities
savecore /usr/adm/shutdown.log E log of coredump saves
scopeux /usr/bin/rxux/rxlog... A LaserRX/UX collection files
sendmail /usr/spool/mqueue/syslog A sendmail history
shutdown /usr/adm/shutdownlog E log of shutdown activities
spell /usr/lib/spell/spellhist P history of spell activities
syslogd /usr/adm/syslog A syslogd's log file
su /usr/adm/sulog A history of su command usage
update /tmp/update.log A history of update activities
updist /tmp/update.log A history of updist activities
uucp.. /usr/spool/uucp/diallog A uucp dialing log
uucp.. /usr/spool/uucp/errlog A uucp error log
uulog /usr/spool/uucp/logfile A uucp general log
uucp.. /usr/spool/uucp/syslog A uucp system log
uucp.. /usr/spool/uucp/culog A uucp call log
uucico /usr/spool/uucp/.Log/... A uucico transactions log
uusub /usr/spool/uucp/L_sub A uucp connection statistics
uusub /usr/spool/uucp/R_sub A uucp traffic statistics
vtdaemon /etc/vtdaemonlog A log of vtdaemon activities
vtdaemon /usr/contrib/lib/*.log A log of vtdaemon diagnostics
Xserver /usr/adm/X*msgs P X/Windows error log file
x29server /usr/adm/x29/x29server/* A x29server logging
ypbindm user-defined P Yellow Pages bind log
ypserv /usr/etc/yp/ypserv.log E Yellow Pages server log
ypxfr /usr/etc/yp/ypxfr.log E YP database transfers
---------- -------------------------- - ------------------------------
Notes: The K column refers to special conditions for the file's existence:
A = automatically created or appended
E = not created automatically; log is kept only if it exists already
P = program creates logfile by runtime option only
R = required; deleting this file may cause a problem
U = user sets up logging through cron or other means
Files in the R category (required logs) can be zeroed out with the cat
/dev/null command, as in:
cat /dev/null > /etc/wtmp
or
>/etc/wtmp
Rather than using the commands rm, touch, chmod, chown, and chgrp to
create an empty file, the cat /dev/null technique retains all the
characteristics of the old file. Note that zeroing /etc/wtmp on a running
system may cause errors to be reported by the who command, since who can
no longer find the users currently logged in. The best way to trim
/etc/wtmp is to do it in single-user mode. Do not zero
/etc/utmp...this is done automatically at every bootup.
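For text logs that should keep some recent history, a compromise is to
trim rather than zero them. Copying the tail back with > truncates the
file in place, so the owner, group, and permissions survive just as with
the cat /dev/null technique. A sketch (the trimlog name and the 100-line
figure are our own; as with /etc/wtmp, trim only when nothing is actively
writing the file):

```shell
# trimlog: keep only the last 100 lines of a text logfile.  The
# tail is copied back over the original with >, which truncates in
# place and so keeps the old file's owner, group, mode, and inode.
trimlog() {
    tail -100 "$1" > /tmp/trim$$ &&
    cat /tmp/trim$$ > "$1"
    rm -f /tmp/trim$$
}
```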
As a final note, HP's RemoteWatch now offers many features to monitor log
files, the /dev directory, and all the mounted disks automatically, and
will notify the root user by email when a problem begins to occur.
RemoteWatch information can be obtained through your local HP sales office or
the HP Customer Information Center (800) 752-0900.
Also, SAM has been enhanced in the HP-UX revision 9.0 release to perform
many of these big-file searches and offers other disk space management
tools.
------------------------------
Bill Hassell
Atlanta Response Center
email: blh@atl.hp.com