Linux Kernelogic

Thursday, May 27, 2010

How Many Open Files?

We knew the answer to this question once, back when the world was young and full of truth. Without hesitation, we'd have spouted "Just take the output of lsof | wc -l!" And it's true enough, in a general sort of way. But if you asked me the same question now, my answer would be: "Why do you want to know?"

Are you, for instance, attempting to determine whether the number of open files is exceeding the limit set in the kernel? Then the output of lsof will be practically useless.

On *nix systems, there is a limit set in the kernel on how many open file descriptors are allowed on the system. This may be compiled in, or it may be tunable on the fly. In Linux, the value of this parameter can be read from and written to the proc filesystem.
[root@srv-4 proc]# cat /proc/sys/fs/file-max
52427

On this system 52,427 open file descriptors are permitted. We are unlikely to run out. If we wanted to change it, we'd do something like this:
[root@srv-4 proc]# echo "104854" > /proc/sys/fs/file-max
[root@srv-4 proc]# cat /proc/sys/fs/file-max
104854

(Warning, change kernel parameters on the fly only with great caution and good cause) Can also be achieved by setting the vm.file-max paramater in /etc/sysctl.conf

But how do we know how many file descriptors are being used?
[root@srv-4 proc]# cat /proc/sys/fs/file-nr
3391 969 52427
| | |
| | |
| | maximum open file descriptors
| total free allocated file descriptors
total allocated file descriptors
(the number of file descriptors allocated since boot)

The number of open file descriptors is column 1 - column 2; 2325 in this case. (Note: we have read contradictory definitions of the second column in newsgroups. Some people say it is the number of used allocated file descriptors - just the opposite of what we've stated here. Luckily, we can prove that the second number is free descriptors. Just launch an xterm or any new process and watch the second number go down.)

What about the method we mentioned earlier, running lsof to get the number of open files?
[root@srv-4 proc]# lsof | wc -l
4140

Oh my. There is quite a discrepancy here. The lsof utility reports 4140 open files, while /proc/sys/fs/file-nr says, if our math is correct, 2325. So how many open files are there?

What is an open file?

Is an open file a file that is being used, or is it an open file descriptor? A file descriptor is a data structure used by a program to get a handle on a file, the most well know being 0,1,2 for standard in, standard out, and standard error. The file-max kernel parameter refers to open file descriptors, and file-nr gives us the current number of open file descriptors. But lsof lists all open files, including files which are not using file descriptors - such as current working directories, memory mapped library files, and executable text files. To illustrate, let's examine the difference between the output of lsof for a given pid and the file descriptors listed for that pid in /proc.

Pick a PID, any PID

Let's look at this process:
usr-4 2381 0.0 0.5 5168 2748 pts/14 S 14:42 0:01 vim openfiles.html

root@srv-4 usr-4]# lsof | grep 2381
vim 2381 usr-4 cwd DIR 3,8 4096 2621517 /n
vim 2381 usr-4 rtd DIR 3,5 4096 2 /
vim 2381 usr-4 txt REG 3,2 2160520 34239 /usr/bin/vim
vim 2381 usr-4 mem REG 3,5 85420 144496 /lib/ld-2.2.5.so
vim 2381 usr-4 mem REG 3,2 371 20974 /usr/lib/locale/LC_IDENTIFICATION
vim 2381 usr-4 mem REG 3,2 20666 192622 /usr/lib/gconv/gconv-modules.cache
vim 2381 usr-4 mem REG 3,2 29 20975 /usr/lib/locale/LC_MEASUREMENT
vim 2381 usr-4 mem REG 3,2 65 20979 /usr/lib/locale/LC_TELEPHONE
vim 2381 usr-4 mem REG 3,2 161 19742 /usr/lib/locale/LC_ADDRESS
vim 2381 usr-4 mem REG 3,2 83 20977 /usr/lib/locale/LC_NAME
vim 2381 usr-4 mem REG 3,2 40 20978 /usr/lib/locale/LC_PAPER
vim 2381 usr-4 mem REG 3,2 58 51851 /usr/lib/locale/LC_MESSAGES/SYSL
vim 2381 usr-4 mem REG 3,2 292 20976 /usr/lib/locale/LC_MONETARY
vim 2381 usr-4 mem REG 3,2 22592 99819 /usr/lib/locale/LC_COLLATE
vim 2381 usr-4 mem REG 3,2 2457 20980 /usr/lib/locale/LC_TIME
vim 2381 usr-4 mem REG 3,2 60 35062 /usr/lib/locale/LC_NUMERIC
vim 2381 usr-4 mem REG 3,2 290511 64237 /usr/lib/libncurses.so.5.2
vim 2381 usr-4 mem REG 3,2 24565 64273 /usr/lib/libgpm.so.1.18.0
vim 2381 usr-4 mem REG 3,5 11728 144511 /lib/libdl-2.2.5.so
vim 2381 usr-4 mem REG 3,5 22645 144299 /lib/libcrypt-2.2.5.so
vim 2381 usr-4 mem REG 3,5 10982 144339 /lib/libutil-2.2.5.so
vim 2381 usr-4 mem REG 3,5 105945 144516 /lib/libpthread-0.9.so
vim 2381 usr-4 mem REG 3,5 169581 144512 /lib/libm-2.2.5.so
vim 2381 usr-4 mem REG 3,5 1344152 144297 /lib/libc-2.2.5.so
vim 2381 usr-4 mem REG 3,2 173680 112269 /usr/lib/locale/LC_CTYPE
vim 2381 usr-4 mem REG 3,5 42897 144321 /lib/libnss_files-2.2.5.so
vim 2381 usr-4 0u CHR 136,14 16 /dev/pts/14
vim 2381 usr-4 1u CHR 136,14 16 /dev/pts/14
vim 2381 usr-4 2u CHR 136,14 16 /dev/pts/14
vim 2381 usr-4 4u REG 3,8 12288 2621444 /n/.openfiles.html.swp
[root@srv-4 usr-4]# lsof | grep 2381 | wc -l
30
[root@srv-4 fd]# ls -l /proc/2381/fd/
total 0
lrwx------ 1 usr-4 usr-4 64 Jul 30 15:16 0 -> /dev/pts/14
lrwx------ 1 usr-4 usr-4 64 Jul 30 15:16 1 -> /dev/pts/14
lrwx------ 1 usr-4 usr-4 64 Jul 30 15:16 2 -> /dev/pts/14
lrwx------ 1 usr-4 usr-4 64 Jul 30 15:16 4 -> /n/.openfiles.html.swp

Quite a difference. This process has only four open file descriptors, but there are thirty open files associated with it. Some of the open files which are not using file descriptors: library files, the program itself (executable text), and so on as listed above. These files are accounted for elsewhere in the kernel data structures (cat /proc/PID/maps to see the libraries, for instance), but they are not using file descriptors and therefore do not exhaust the kernel's file descriptor maximum.

Linux Tune Network Stack (Buffers Size) To Increase Networking Performance

I have two servers located in two different data center. Both server deals with a lot of concurrent large file transfers. But network performance is very poor for large files and performance degradation take place with a large files. How do I tune TCP under Linux to solve this problem?

By default the Linux network stack is not configured for high speed large file transfer across WAN links. This is done to save memory resources. You can easily tune Linux network stack by increasing network buffers size for high-speed networks that connect server systems to handle more network packets.

The default maximum Linux TCP buffer sizes are way too small. TCP memory is calculated automatically based on system memory; you can find the actual values by typing the following commands:
$ cat /proc/sys/net/ipv4/tcp_mem

The default and maximum amount for the receive socket memory:
$ cat /proc/sys/net/core/rmem_default
$ cat /proc/sys/net/core/rmem_max

The default and maximum amount for the send socket memory:
$ cat /proc/sys/net/core/wmem_default
$ cat /proc/sys/net/core/wmem_max

The maximum amount of option memory buffers:
$ cat /proc/sys/net/core/optmem_max

Tune values

Set the max OS send buffer size (wmem) and receive buffer size (rmem) to 12 MB for queues on all protocols. In other words set the amount of memory that is allocated for each TCP socket when it is opened or created while transferring files:
WARNING! The default value of rmem_max and wmem_max is about 128 KB in most Linux distributions, which may be enough for a low-latency general purpose network
environment or for apps such as DNS / Web server. However, if the latency is large, the default size might be too small. Please note that the following
settings going to increase memory usage on your server.
# echo 'net.core.wmem_max=12582912' >> /etc/sysctl.conf
# echo 'net.core.rmem_max=12582912' >> /etc/sysctl.conf

You also need to set minimum size, initial size, and maximum size in bytes:
# echo 'net.ipv4.tcp_rmem= 10240 87380 12582912' >> /etc/sysctl.conf
# echo 'net.ipv4.tcp_wmem= 10240 87380 12582912' >> /etc/sysctl.conf

Turn on window scaling which can be an option to enlarge the transfer window:
# echo 'net.ipv4.tcp_window_scaling = 1' >> /etc/sysctl.conf

Enable timestamps as defined in RFC1323:
# echo 'sysctl net.ipv4.tcp_sack = 1' >> /etc/sysctl.conf

By default, TCP saves various connection metrics in the route cache when the connection closes, so that connections established in the near future can use these to set initial conditions. Usually, this increases overall performance, but may sometimes cause performance degradation. If set, TCP will not cache metrics on closing connections.
# echo 'net.ipv4.tcp_no_metrics_save = 1' >> /etc/sysctl.conf

Set maximum number of packets, queued on the INPUT side, when the interface receives packets faster than kernel can process them.
# echo 'net.core.netdev_max_backlog = 5000' >> /etc/sysctl.conf

Now reload the changes:
# sysctl -p

Use tcpdump to view changes for eth0:
# tcpdump -ni eth0

How do I set up remote logging?

Introduction

Remote logging allows you to send your logs from one machine to another. This enables you to keep a backup of your logs. You could also have all of your machines log to the same place. This would then give you one place to check your logs at.

In this lab, we will set up a machine as the logging server, and then we will direct our client to log to the log server.

1) For the log server, (we’ll use 192.168.0.1 as the example), we need to enable remote logging. Log on to the log server and edit /etc/sysconﬁg/syslog. Make sure that SYSLOGD_OPTIONS looks like:
SYSLOGD_OPTIONS="-r -m 0"

2) Once you have made the above change, restart syslogd on the log server:
# service syslog restart

3) On the client, edit /etc/syslog.conf. For any messages that you want to log to the log server, you will need to change the entries. For example, I have:
*.info;mail.none;authpriv.none;cron.none /var/log/messages

If I want these messages to log to the log server, I would change this line to:
*.info;mail.none;authpriv.none;cron.none @192.168.0.1

Replacing 192.168.0.1 with whatever your log server’s ip is.

4) Let’s also add a line to log user.* to our log server:
user.* @192.168.0.1

Restart syslogd on the clinet:
# service syslog restart

5) If you would like to test this setup, you can do the following:
logger -i -t user1 "Testing Logging"

At this point, you should see messages from your client in your server’s /var/log/messages.

How do I set up a Cron Job?

Introduction

In these two examples, we will set up cron jobs to output text to ﬁles.

Procedure 1

In this procedure, we will set up a cron job for the user joe that runs at 20 minutes past the hour every Wednesday in January.

1) To edit the crontab for joe, we ﬁrst need to be logged in as Joe. Once we are logged in as joe, we need to type the following:
# crontab -e

2) Now we need to type in the following cron job (Notice our use of full paths – this is -highly recommended with cron jobs):
20 * * 1 3 /bin/echo $(/bin/date) >> /home/joe/crondates

3) Save the ﬁle and exit the editor and that will set the cron job as active.

Procedure 2

In this procedure, we will set up a cron job that runs as root every hour and sends the date to /var/log/messages. This job will be set up in the /etc/cron.* structure as opposed to being setup through crontab.

1) As root, create the script /etc/cron.hourly/printdate:
#!/bin/bash
# A cron job to send the date to /var/log/messages
/bin/echo $(/bin/date) >> /var/log/messages

2) Set the script as executable:
chmod +x /etc/cron.hourly/printdate

3) At this point, the cron job is set.

Basic Command-line Date Tools

The date command can be used as follows to display the time and date:
$ date
Fri Mar 28 16:01:50 CST 2003

To see UTC/GMT, you can do this:
$ date --utc
Fri Mar 28 08:04:32 UTC 2003

The date command also can be used to set the time and date. To set the time manually, do this:
# date -s "16:15:00"
Fri Mar 28 16:15:00 CST 2003

If you also need to adjust the date, and not just the time, you can do it like this:
# date -s "16:55:30 July 7, 1986"
Mon Jul 7 16:55:30 PDT 1986

There is also another way to set the date and time, which is not very pretty:
# date 033121422003.55
Mon Mar 31 21:42:55 PST 2003

The above command does not use the -s option, and the fields are arranged like this: MMDDhhmmCCYY.ss where MM = month, DD = day, hh = hour, mm = minute, CCYY = 4 digit year, and ss = seconds.

Please note that setting the clock with the date command must be done as root. This is a "savage" way to adjust the time. It adjusts the Linux kernel system time.

There is also a hardware clock (CMOS clock). You can look at the current hardware clock time with:
hwclock --show

I always keep my hardware clocks set to UTC/GMT. This maintains my clocks uniformly without any worries about "Daylight Savings Time". This is important, because when you set the hardware clock from the system clock (kept by the Linux kernel), you need to know if this is the case. To set the hardware clock from the system clock, leaving the hardware clock in UTC, enter the following:
# hwclock --systohc --utc
# hwclock --show
Fri 28 Mar 2003 04:23:52 PM CST -0.864036 seconds

Another interesting item is that the Linux system clock stores time in seconds since midnight on January 1st, 1970 (UTC). This is called UNIX time. Unfortunately, because this is a 32-bit value, there is a year-2038 problem. Hopefully, everyone will have moved to 64-bit architectures by then. In order to see the UNIX time, you can use the following command:
# date +%s

There are many useful formatting options for the date command. See the date manpage for details.

Of course, there is another useful tool available related to date and time: cal

$ cal -3
February 2003 March 2003 April 2003
Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa Su Mo Tu We Th Fr Sa
1 1 1 2 3 4 5
2 3 4 5 6 7 8 2 3 4 5 6 7 8 6 7 8 9 10 11 12
9 10 11 12 13 14 15 9 10 11 12 13 14 15 13 14 15 16 17 18 19
16 17 18 19 20 21 22 16 17 18 19 20 21 22 20 21 22 23 24 25 26
23 24 25 26 27 28 23 24 25 26 27 28 29 27 28 29 30
30 31

You can also specify "cal -y" for the entire year, "cal" by itself for the current month, or "cal 12 2005" to see the calendar for December, 2005.

[edit]
Time Zone Configuration:

Background - The Earth is divided into time zones that are 15 degrees of longitude each, for this corresponds to the amount of angular distance the Sun appears to travel in 1 hour. 0 degrees longitude runs through the Royal Observatory in Greenwich, England. This is the origin of Greenwich Mean Time, or GMT. For all practical purposes, GMT and UTC are the same. To complicate matters, some countries observe Daylight Savings Time (DST), while others do not. Even within some countries, some states or districts do not observe DST while the rest of the country does! DST can also begin and end on different days in different countries! What a mess...

There are several files and directories that are used for time zones, and several tools:
/etc/sysconfig/clock - this is a short text file that defines the timezone, whether or not the hardware clock is using UTC,
and an ARC option that is only relevant to DEC systems.
/etc/localtime - this is a symbolic link to the appropriate time zone file in /usr/share/zoneinfo
/usr/share/zoneinfo - this directory contains the time zone files that were compiled by zic. These are binary files and cannot be
viewed with a text viewer. The files contain information such as rules about DST. They allow the kernel to convert UTC UNIX time
into appropriate local dates and times.
/etc/rc.d/rc.sysinit - This script runs once, at boot time. A section of this script sets the system time from the hardware clock and applies
the local time zone information.
/etc/init.d/halt - This script runs during system shutdown. A section of this script synchronizes the hardware clock from the system clock.
/etc/adjtime - This file is used by the adjtimex function, which can smoothly adjust system time while the system runs.
settimeofday is a related function.

redhat-config-date or dateconfig - These commands start the Red Hat date/time/time zone configuration GUI. Both commands failed to change the timezone in two different stock Red Hat 8.0 systems. They also failed to create a working ntp.conf file for the NTP server. The timezone problem went away after upgrading from the installed RPM, redhat-config-date-1.5.2-10, to a newer RPM from a Red Hat beta release, redhat-config-date-1.5.9-6.
zic - (The time zone compiler) Zic creates the time conversion information files.
zdump - This utility prints the current time and date in the specified time zone. Example:

# zdump Japan
Japan Sat Mar 29 00:47:57 2003 JST

# zdump Iceland
Iceland Fri Mar 28 15:48:02 2003 GMT

# zdump /usr/share/zoneinfo/Asia/Calcutta
/usr/share/zoneinfo/Asia/Calcutta Thu May 21 13:27:25 2009 IST

In order to manually change the timezone, you can edit the /etc/sysconfig/clock file and then make a new soft link to /etc/localtime. Here is an example of changing the timezone manually to "America/Denver":

1. Select the appropriate time zone from the /usr/share/zoneinfo directory. Time zone names are relative to that directory. In this case, we will select "America/Denver"

2. Edit the /etc/sysconfig/clock text file so that it looks like this:
ZONE="America/Denver"
UTC=true
ARC=false

Of course, this assumes that your hardware clock is running UTC time...

3. Delete the following file: /etc/localtime

4. Create a new soft link for /etc/localtime. Here is an example of step 3 and step 4:
# cd /etc
# ls -al localtime
lrwxrwxrwx 1 root root 39 Mar 28 07:00 localtime -> /usr/share/zoneinfo/America/Los_Angeles
# rm /etc/localtime
# ln -s /usr/share/zoneinfo/America/Denver /etc/localtime
# ls -al localtime
lrwxrwxrwx 1 root root 34 Mar 28 08:59 localtime -> /usr/share/zoneinfo/America/Denver
# date
Fri Mar 28 09:00:04 MST 2003

The chroot command

In each of the boot problems, you will need to boot from some kind of rescue media, then work at the command line to repair the damage. If you boot from the Red Hat installation CD in rescue mode, you will need to change the root directory so that the various system directories and filesystems are in the correct locations:
chroot /mnt/sysimage

The chroot command is extremely useful for both system security and for system repair. Its basic syntax is:
chroot new-root-dir [command ...]

and its purpose is to run the specified command with the root directory changed to new-root-dir. If no command is specified, the default behaiour is to run an interactive shell (usually a bash shell). For example, the command:
chroot /var/ftp

will run a command shell in /var/ftp. However, note that the behaviour is to change the root directory first, and then try to invoke the command or shell, so that there had better be a file /var/ftp/bin/bash (which there would be, on many systems). In addition, the command will usually need to be statically linked, as otherwise it would attempt to load libraries from /lib, which is now /var/ftp/lib.

The chroot command is often used to start network daemons on servers - this is so that if an attacker manages to compromise the daemon, perhaps through a buffer overflow, he is unable to navigate around the entire system directory tree, but is instead constrained within a 'chroot jail'.

A major use of the chroot command is to change the root directory of the system after booting from a repair floppy or CD. For example, if you boot a Red Hat installation CD with the command 'linux rescue', the root file system is actually a RAM disk, and the root filesystem on your hard drive is mounted as /mnt/sysimage. Commands you give will load programs from /bin and /sbin on the RAM disk, which is obviously limited. To get access to those directories on the hard drive, you will need to change your root directory with the command
chroot /mnt/sysimage

How do I install GRUB to the MBR?

Introduction

This exercise will show you how to install GRUB to the MBR assuming that the GRUB software is installed on your computer. If GRUB is not on your computer for some reason (rpm -q grub), then you will need to install the appropriate RPM for GRUB. However, be aware that GRUB is included in the default RHEL install. If you need to install the GRUB rpm, this could be a sign of bigger problems.

Procedure

1) Our boot partition is /dev/sda1. We want to install GRUB to the master boot record of /dev/sda. This is because our BIOS passes control to /dev/sda once it has done its work. In order to do this, we need to ﬁrst determine the GRUB device name for /dev/sda by looking in /boot/grub/device.map:
(fd0) /dev/fd0
(hd0) /dev/sda
(hd1) /dev/sdb

We can tell that our GRUB device is (hd0).

2) We now need to enter the GRUB shell. To do this, we simply type grub while logged in as root:
[root@station1 ~]# grub
Probing devices to guess BIOS drives. This may take a long time.

3) After we are presented with a GRUB prompt (grub> ), we will need to specify where the boot partition is with the following command:
grub> root (hd0,0)

Notice how we used (hd0,0). In GRUB language, this identiﬁes /dev/sda1.

4) We now need to tell GRUB which disk’s master boot record it needs to install to. This is done by typing the following command:
grub> setup (hd0)

This says that we want GRUB to install to the master boot record of /dev/sda.

5) To get out of the GRUB shell, we simply type:
grub> quit

6) At this point, GRUB is now installed on /dev/sda and knows that the boot partition is /dev/sda1. It is safe to reboot.

Linux Kernelogic

Search This Blog