FYI: Root Filesystem on a Samba Share
This thread belongs to expert.forumgeeks.net


2009-03-24 21:36 GMT   |   #1
 
Previously, I had seen that it is possible to have Linux use an NFS mount
point as a root filesystem, as explained in this unmaintained mini-howto:
http://www.faqs.org/docs/Linux-mini/NFS-Root.html

There are kernel arguments which help set up the network interface and
prepare the system to boot from NFS.

Samba shares offer functionality similar to NFS, so I was curious
whether it would be possible to mount a Samba share (using mount.cifs) as
part of the initial boot process, and use that as a root filesystem. I was
pleasantly surprised, and I can report that it works! One possible use for
this setup is to build diskless workstations which utilize centralized
storage devices, etc. AFAICT, there are no special arguments required to
boot the kernel. (At least, with this manual, hacked together method, it
just works.)

Here is a broad outline of the steps I followed:
0. Set up a generic Slackware 12.2 on a "real" hard disk. Verify
that it boots, etc. Shut down and transfer that root filesystem to a file on
the Samba server which will host the root filesystem *.

1. Prepare an initial ramdisk working environment with the kernel modules
and toolset that you need to boot and finalize the links to your network
based root filesystem. I didn't optimize or cut back my working
environment too much, and it requires 132MB of space once the initrd is
decompressed. The workstation where I tested has 256MB RAM, and luckily it
didn't complain about low memory while preparing to switch root
filesystems. However, that amount of RAM may be cutting it
close for this test. My environment is somewhat different from the
standard Slackware initrd, and requires some environment variables to be
set, such as ROOT_DEV, ROOT_FS, etc.

2. Prepare boot media containing your kernel and the initial
ramdisk you built in step 1. You'll still need some way to jump start the
boot process. Possibilities: cdrom, usb-flash, or some network boot for a
total network solution.

3. Boot the test media. After loading the kernel and initrd, my startup
environment is set to drop the user at a simple bash prompt. (This is a
fairly complete environment - it uses 132MB, after all). The following
steps are performed manually from this environment:
a. modprobe the network module required for the interface connected to
the Samba share. I am using the r8169 module on gigabit ethernet.

modprobe r8169

b. manually assign an IP address to that interface. I just used ifconfig.

ifconfig eth0 192.168.254.20

c. mount the Samba share (mount.cifs)

mount.cifs //192.168.254.30/test /tmpfs/rw_dev

d. setup a loopback device pointing to the appropriate file on the Samba
share.

losetup /dev/loop0 /tmpfs/rw_dev/slack12.2-testbox.8g-xfs.img

e. Inform the initrd which root device and filesystem type to use.

echo ROOT_DEV=/dev/loop0 >>user.inp
echo ROOT_FS=xfs >>user.inp

f. Return control to the initrd,

exit

g. Voila! Bootup continues using the network share as a mount point.

I was pretty amazed this actually worked, especially from this rough
hacked (hewn?) working environment. I haven't tested it extensively, but
so far so good.
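
For anyone who wants to repeat steps a through e without typing each
command, they could be wrapped in a small script along these lines. This
is only a sketch under assumptions: the run helper, the function names,
the DRY_RUN switch and the variable names other than ROOT_DEV and ROOT_FS
are made up for illustration, and the defaults are simply the values used
in the steps above. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of steps a-e, meant to be sourced from the initrd shell.
# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

NIC_MODULE=${NIC_MODULE:-r8169}
CLIENT_IP=${CLIENT_IP:-192.168.254.20}
SHARE=${SHARE:-//192.168.254.30/test}
MOUNT_POINT=${MOUNT_POINT:-/tmpfs/rw_dev}
IMAGE=${IMAGE:-slack12.2-testbox.8g-xfs.img}
LOOP_DEV=${LOOP_DEV:-/dev/loop0}
ROOT_FS=${ROOT_FS:-xfs}

# Hypothetical helper: echo the command in dry-run mode, otherwise run it
# and stop at the first failure.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@" || { echo "failed: $*" >&2; exit 1; }
    fi
}

# Steps a-d: load the NIC driver, set the address, mount the share,
# and attach the container file to a loop device.
attach_network_root() {
    run modprobe "$NIC_MODULE"
    run ifconfig eth0 "$CLIENT_IP"
    run mount.cifs "$SHARE" "$MOUNT_POINT"
    run losetup "$LOOP_DEV" "$MOUNT_POINT/$IMAGE"
}

# Step e: append the two settings to the file named in $1.
write_root_settings() {
    {
        echo "ROOT_DEV=$LOOP_DEV"
        echo "ROOT_FS=$ROOT_FS"
    } >>"$1"
}
```

After sourcing it, "attach_network_root" followed by
"write_root_settings user.inp" covers steps a through e; step f (exit) is
still typed by hand so there is a chance to look around first.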

* From a design point of view, I want to use a loopback file on the Samba
share because it will properly encapsulate the unix permission model
among other things. First, I want to avoid problems with permissions
which could be introduced by Samba. By using a loopback file, I have
also minimized what Samba is required to do (i.e. from its perspective
there is only one file open, not several hundred, etc.) The loopback file
acting as the root filesystem simply means that everything should "just
work" normally.
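
The post does not say how the container file was built, so the following
is only one plausible way to do it, not necessarily what was done here.
The function names and all paths are hypothetical. It creates a sparse
file with dd and then prints (rather than runs) the root-only steps that
would format it as xfs and copy an installed system into it.

```shell
#!/bin/sh
# Sketch: build a container file for a loopback root filesystem.

# Create a sparse file of $2 MiB at path $1 (no data blocks are written).
make_image() {
    dd if=/dev/zero of="$1" bs=1M count=0 seek="$2" 2>/dev/null
}

# Print the remaining steps, which need root: format the image, loop-mount
# it, and copy the old root filesystem (mounted at $2) into it.
print_populate_steps() {
    image=$1 src=$2 mnt=$3
    echo "mkfs.xfs $image"
    echo "mount -o loop $image $mnt"
    echo "cp -a $src/. $mnt/"
    echo "umount $mnt"
}
```

For an 8G container, "make_image slack12.2-testbox.8g-xfs.img 8192" would
create the file, and "print_populate_steps slack12.2-testbox.8g-xfs.img
/mnt/oldroot /mnt/img" lists the follow-up commands, where /mnt/oldroot
and /mnt/img are placeholder mount points.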

p.s. I hope I didn't make too many mistakes describing the process.

--
Douglas Mayne
2009-03-25 03:20 GMT   |   #2
 
I was reading along, waiting to see what happened to permissions. All
sorted, quite clever :) Apart from the Samba share coming from a 'doze box ;)

Not that I can think of a use for it here, but a concept worth knowing.
2009-03-27 15:09 GMT   |   #3
 
To follow up on my own post...I used the system with a network based
disk for a day or so, and it worked just fine. I have included some
rough benchmarks below of the "aggregated" network+disk performance.

In my haste to check if this worked at all, I did overlook (at least) one
item, though. The standard Slackware shutdown script, rc.6, did not
/* automatically */ do the right thing. I hadn't really looked at it, but
this code block checks if the system is mounted on NFS to avoid
terminating the network interface too early:

<begin snippet from /etc/rc.d/rc.6>
# Bring down the networking system, but first make sure that this
# isn't a diskless client with the / partition mounted via NFS:
if ! /bin/mount | /bin/grep -q 'on / type nfs' ; then
if [ -x /etc/rc.d/rc.inet1 ]; then
. /etc/rc.d/rc.inet1 stop
fi
fi
</end snippet>

My system doesn't match that test- it reports this as mounted:
/dev/loop0 on / type xfs (rw,noquota)
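
One way the rc.6 test could be widened to catch this case is sketched
below. It is not part of Slackware; the function name is made up, and it
assumes the share that holds the container file is still visible in the
mount table at a known directory (/tmpfs/rw_dev is used here only because
that is where it was mounted in the initrd). It reads the output of mount
on stdin and succeeds when the network should be left up.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if / depends on the network, either directly
# via NFS or via a loop device whose backing directory ($1) is mounted
# from a network filesystem.
root_needs_network() {
    awk -v dir="$1" '
        $2 == "on" && $4 == "type" {
            if ($3 == "/") { root_dev = $1; root_type = $5 }
            if ($3 == dir) { dir_type = $5 }
        }
        END {
            if (root_type == "nfs") exit 0
            if (root_dev ~ /^\/dev\/loop/ &&
                (dir_type == "cifs" || dir_type == "smbfs" || dir_type == "nfs"))
                exit 0
            exit 1
        }'
}

# Possible use in rc.6, replacing the grep for "on / type nfs":
#   if ! /bin/mount | root_needs_network /tmpfs/rw_dev ; then
#     if [ -x /etc/rc.d/rc.inet1 ]; then
#       . /etc/rc.d/rc.inet1 stop
#     fi
#   fi
```

If the share is not listed at that directory once the real root is
running, the check falls through and behaves like the stock test.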

I tried again with "rc.inet1 stop" removed, and it shut down /* almost */
normally. The only exception is that the file locks on the Samba server
were not released. I am not sure how to release them. Does anyone have any
ideas? I am guessing that transferring to a "shutdown
ramdisk" is necessary to make sure that the Samba file locks are properly
released. Probably, the same environment that is used as the initial
ramdisk should be reloaded. I'll probably play around with it some.

Benchmarks...
First of all, the Samba share is located on a hard disk that is capable
of reading at about 40MB/s (according to hdparm -t). The write speed is
about 25 to 30MB/s. The upper limit for gigabit ethernet is 125 MB/s, and
I am not sure what bottlenecks TCP/IP and Samba introduce, and they
could be significant. However, as I said, these are "rough" benchmarks,
so here's what I see when testing with the system booted with
its root filesystem on a file on a network share.

"Network+disk" Write Test:
$ dd if=/dev/hdc2 of=/root/tf bs=512 count=200000
200000+0 records in
200000+0 records out
102400000 bytes (102 MB) copied, 6.84559 s, 15.0 MB/s

Performance is not stellar, but usable.

"Network+disk" Read Test 1:
$ dd if=/root/tf of=/dev/null bs=512 count=200000
200000+0 records in
200000+0 records out
102400000 bytes (102 MB) copied, 3.0401 s, 33.7 MB/s

Read Test 2:
$ dd if=/root/tf of=/dev/null bs=512 count=200000
200000+0 records in
200000+0 records out
102400000 bytes (102 MB) copied, 0.607026 s, 169 MB/s

Read Test 3:
$ dd if=/root/tf of=/dev/null bs=512 count=200000
200000+0 records in
200000+0 records out
102400000 bytes (102 MB) copied, 0.614199 s, 167 MB/s

There seems to be a significant benefit added by caching after the first
network transfer. The status report below shows memory allocated (Cached):

$ cat /proc/meminfo
MemTotal: 254736 kB
MemFree: 8240 kB
Buffers: 0 kB
Cached: 220896 kB
SwapCached: 0 kB
Active: 129216 kB
Inactive: 98212 kB
Active(anon): 2092 kB
Inactive(anon): 4516 kB
Active(file): 127124 kB
Inactive(file): 93696 kB
Unevictable: 0 kB
Mlocked: 0 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 254736 kB
LowFree: 8240 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 8 kB
Writeback: 0 kB
AnonPages: 6568 kB
Mapped: 6524 kB
Slab: 13168 kB
SReclaimable: 7072 kB
SUnreclaim: 6096 kB
PageTables: 400 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 127368 kB
Committed_AS: 15688 kB
VmallocTotal: 770104 kB
VmallocUsed: 2936 kB
VmallocChunk: 754896 kB
DirectMap4k: 12224 kB
DirectMap4M: 249856 kB
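
As a cross-check on the dd figures, the rates can be recomputed from the
byte counts and elapsed times. Read tests 2 and 3 come out well above the
125 MB/s ceiling of gigabit ethernet, so those reads cannot have crossed
the wire and must have been served from the page cache. The mbps helper
below is a made-up name; the commented drop_caches line is the usual way
to get a cold-cache reading again, and it needs root.

```shell
#!/bin/sh
# Throughput in MB/s (10^6 bytes per second, the unit dd reports).
mbps() {
    awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

mbps 102400000 6.84559    # write test
mbps 102400000 3.0401     # read test 1 (cold cache)
mbps 102400000 0.607026   # read test 2 (warm cache)

# To repeat a cold-cache read, flush the page cache first (root only):
#   sync; echo 3 > /proc/sys/vm/drop_caches
```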

And for completeness, the test system specs are nowhere near state
of the art. The main components are listed below:

CPU: Celeron 1.3
Memory: 256 MB (PC100 SDRAM)
Ethernet Card: Netgear GA311 1000Mb (assigned to storage network)
Ethernet Card: Netgear FA311 100Mb (on typical LAN)

$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 11
model name : Intel(R) Celeron(TM) CPU 1300MHz
stepping : 4
cpu MHz : 1299.998
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse up
bogomips : 2599.99
clflush size : 32
power management:
2009-03-27 19:20 GMT   |   #4
 
While that is interesting academically, it's a scary thing to do, and I
seriously hope you only do it because you can and not because you want
to use it. AFAIK SMB/CIFS really is a bottleneck, and that's not taking
into account all that windows network negotiation stuff that goes on
behind the scenes. In any case, you should never have a root file system
that doesn't support unix ownership and permissions. A samba server
allows for that kind of thing, but that's a vendor specific extension,
so to speak. Still, pretty impressive stuff.
2009-03-27 19:20 GMT   |   #5
 
It does incorporate the Unix permissions. See my first post for a better
explanation; notice that the limitations which would be imposed by
Samba are avoided by mounting a "container" file. The container file is
an 8G file which has a loopback device assigned to it (/dev/loop0). The
root filesystem is _inside_ this container- in this case, xfs. Samba is
only tracking permissions of the container itself. That ensures that only
one remote machine can obtain read-write access to it at any given
time. At first glance, that seems like a viable locking strategy for disk
access over TCP/IP. This is my first stab at an unsophisticated poor-man's
SAN, and without spending a lot of money for a test platform. BTW, I rate
the danger fairly low, as long as the cables stay connected ;) I am
interested in relative benchmarks, though. I am especially curious what
performance is achieved for other storage solutions which use TCP/IP as a
transport.

? I'm not sure what "vendor" you are talking about. Samba is a GNU
GPL-licensed project, and part of Slackware.

Allow me to elaborate about "why" I find this of interest: First, disks
fail. RAID can protect against hardware failure, but that gets expensive
if every system on the network must incorporate RAID storage. There
are certainly other* (and maybe better) approaches to this problem,
but as my simple tests show it is very close to just working "out of the
box." No tricks with ATA over ethernet, or iSCSI are required. Plus, it
has the benefit of using the fault tolerance which is built into TCP/IP.

* Another approach to avoiding points of failure is to virtualize
servers whenever possible. This approach generally requires a beefy
central server with the RAM and CPU to take over the work of several
machines. IIUC, for failover, two beefy systems are tracking the same
server set. In my case, this is overkill. I may be happy if I could just
consolidate some server storage. Moore's Law keeps redefining the playing
field, especially with disk storage. Say "hello" to 2TB per disk soon!
2009-03-30 16:02 GMT   |   #6
 
I tested this some more to see if a new root filesystem could be
loaded and the existing root filesystem unmounted. It hasn't worked
as of yet, and the Samba file locks are not released at the server.
AFAICT, this didn't work because "init" (pid 1) is still executing. The
shutdown script, rc.6, must be executed as a child process of
init, and doesn't directly replace it (via exec). I guess this is because
the termination process can be cancelled, as in the case of a power
failure. Slackware 12.2's "init" comes from the package

slackware/a/sysvinit-2.86-i486-6.tgz

I haven't looked at the source, but it appears to me that init needs to be
modified to include an option where it will relinquish control
completely. I suppose this could be added as a "telinit" option. I want
to tell init that it should do the following:

1. Go to a single user environment and kill all child processes.
2. Load a new working environment into RAM which will serve as the "final"
ramdisk.
3. Use pivot_root to switch to that environment, while keeping the
existing root filesystem mounted but moved to /old_root.
4. Transfer control to a program on the final ramdisk. This program is
responsible for performing operations which will properly close out the
root filesystem.
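
To make steps 2 through 4 more concrete, here is the rough command
sequence they imply, printed rather than executed. It is a sketch of the
proposal and not a working fix: as noted above, the old root stays busy
for as long as init (pid 1) is still running from it, so the final umount
would fail on an unmodified system. The function name, /final,
/cleanup.sh and /path/to/tools are placeholders, and where the share ends
up mounted after the pivot depends on the initrd, so it is passed in as
an argument.

```shell
#!/bin/sh
# Sketch only: print the command sequence implied by steps 2-4.
# $1 = mount point for the final ramdisk, $2 = loop device holding the
# root filesystem, $3 = where the Samba share is mounted (an assumption).
print_shutdown_plan() {
    new_root=$1 loop_dev=$2 cifs_mount=$3
    cat <<EOF
mount -t tmpfs tmpfs $new_root
cp -a /path/to/tools/. $new_root/
mkdir -p $new_root/old_root
cd $new_root
pivot_root . old_root
exec chroot . /bin/sh /cleanup.sh
# --- inside /cleanup.sh, running from the final ramdisk ---
umount /old_root
losetup -d $loop_dev
umount $cifs_mount
EOF
}

print_shutdown_plan /final /dev/loop0 /tmpfs/rw_dev
```

The last three commands are the part that should release the file locks
on the server, since the container file is closed once the loop device is
detached and the share is unmounted.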

It seems to me that init is missing a necessary feature. It
should be willing to relinquish control at the "end of its life" to
ensure proper cleanup and shutdown operations are performed as necessary.
I have prepared this annotated graphic that illustrates the change that I
propose:
http://www.xmission.com/~ddmayne2/misc/ss.2009-03-30.01.png

Hopefully, someone will see this who can offer more advice on how to fix
this. In googling this topic, I noticed that others have had similar
problems, and the "solution" was to just ignore the errors that crop up.
That is not an ideal solution, IMO.