[Federated-fs] Root fileset discussion summary

Thu Jul 3 10:58:36 PDT 2008

Here is a summary based on some of the mail discussions that Paul and I
have had on root filesets.
Paul, I have rephrased some of your points, so please let me know if I have
not summarized them correctly.

regards
Renu

Definitions
1. The top level of the namespace is called the root fileset.
2. The root fileset is represented by an FSN. There could be a special flag
that labels it as the root FSN.
3. The root fileset does not contain any file data. It consists of a
hierarchy of directory nodes. The leaves of the root fileset are junctions
whose targets are FSNs that have a physical representation on some
fileserver.

Open Questions

 1a. Are the root fileset details stored in the NSDB and instantiated on
the fileservers that export the root fileset? This would include storing
the path information that leads from the root directory node to each leaf
junction, either by explicitly storing one tuple per root-to-leaf path,
<root_fsn, target_fsn, /path/from/root/leaf1>, ..., or by creating a
hierarchy of nodes in which each node names its parent. In the latter case,
a path is reconstructed by walking up from a leaf to the root.
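To make the two 1a options concrete, here is a minimal sketch of both NSDB representations. The names (`Node`, `path_of`, `to_path_tuples`) and the FSN strings are hypothetical, chosen only for illustration; the point is that the parent-pointer hierarchy can always be flattened into the explicit per-path tuples by walking up from each leaf.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical option (i): one tuple per root-to-leaf path.
PathTuple = Tuple[str, str, str]  # (root_fsn, target_fsn, path)


@dataclass
class Node:
    """Hypothetical option (ii): a node in the root fileset that names its parent."""
    name: str
    parent: Optional[str]              # id of the parent node; None for the root
    target_fsn: Optional[str] = None   # set only on leaf junctions


def path_of(nodes: Dict[str, Node], node_id: str) -> str:
    """Rebuild a node's path by walking parent links up to the root."""
    parts: List[str] = []
    node = nodes[node_id]
    while node.parent is not None:
        parts.append(node.name)
        node = nodes[node.parent]
    return "/" + "/".join(reversed(parts))


def to_path_tuples(root_fsn: str, nodes: Dict[str, Node]) -> List[PathTuple]:
    """Derive the option (i) tuples from the option (ii) hierarchy."""
    return sorted(
        (root_fsn, node.target_fsn, path_of(nodes, node_id))
        for node_id, node in nodes.items()
        if node.target_fsn is not None
    )


# Illustrative root fileset: /home/alice and /projects are junctions.
nodes = {
    "r": Node("", None),
    "d1": Node("home", "r"),
    "j1": Node("alice", "d1", target_fsn="fsn-alice"),
    "j2": Node("projects", "r", target_fsn="fsn-proj"),
}
print(to_path_tuples("fsn-root", nodes))
# [('fsn-root', 'fsn-alice', '/home/alice'), ('fsn-root', 'fsn-proj', '/projects')]
```

The tuple form makes lookups by path trivial but repeats every intermediate directory name; the hierarchy stores each directory once but needs a walk to produce a path.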

 1b. Alternatively, does the NSDB store only the definition of the root FSN
and the junction information, i.e., just the tuples <root_fsn, target_fsn,
opaque_handle>? In that case the structure (directory tree) of the root
fileset is stored within the fileservers exporting the root fileset. The
fed-fs protocol that creates junctions would be used to instantiate the
paths and junctions on the root fileserver, and a replication protocol
could replicate the root fileset across multiple fileservers. One of the
root fileservers would have to be designated as the master; it would be the
authority on the layout of the root fileset.
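A minimal sketch of the 1b split, under the assumption that the handle returned by the junction-creation call is all the NSDB ever sees. `Nsdb`, `RootFileserver`, `add_junction`, `replicate`, and the `h1`, `h2` handle format are hypothetical names for illustration; `replicate` stands in for whatever real replication protocol would be used.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Set, Tuple


@dataclass
class Nsdb:
    # Only (root_fsn, target_fsn, opaque_handle) tuples; no paths are kept here.
    junctions: Set[Tuple[str, str, str]] = field(default_factory=set)


@dataclass
class RootFileserver:
    # The directory layout lives on the fileserver: path -> (target_fsn, handle).
    tree: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    _handles: count = field(default_factory=lambda: count(1))

    def create_junction(self, path: str, target_fsn: str) -> str:
        """Hypothetical junction-creation call; returns a handle opaque to the NSDB."""
        handle = f"h{next(self._handles)}"
        self.tree[path] = (target_fsn, handle)
        return handle


def add_junction(master: RootFileserver, nsdb: Nsdb, root_fsn: str,
                 path: str, target_fsn: str) -> None:
    """Create the junction on the master, then record only the tuple in the NSDB."""
    handle = master.create_junction(path, target_fsn)
    nsdb.junctions.add((root_fsn, target_fsn, handle))


def replicate(master: RootFileserver, replicas: List[RootFileserver]) -> None:
    """Stand-in for a replication protocol: every replica copies the master's layout."""
    for replica in replicas:
        replica.tree = dict(master.tree)
```

Because replicas only ever copy from the master, the master's tree is the single authority on the layout, while the NSDB can answer "which FSNs hang off the root" without knowing any paths.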

 2a. How many NSDBs store the root fileset information: all of the NSDBs,
or only those that manage fileservers exporting the root of the namespace?
 2b. If multiple NSDBs store the root FSN information, how are they kept in
sync, and which NSDB plays the role of master?
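One possible answer to 2b is single-master replication, where only the master NSDB accepts writes and the others replay its change log. The sketch below assumes that model; `MasterNsdb`, `ReplicaNsdb`, and the key names are hypothetical, and a real deployment would more likely use an existing directory replication mechanism than hand-rolled code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Change = Tuple[int, str, Optional[str]]  # (seq, key, value); value None means delete


@dataclass
class MasterNsdb:
    data: Dict[str, str] = field(default_factory=dict)
    log: List[Change] = field(default_factory=list)

    def write(self, key: str, value: Optional[str]) -> None:
        """Only the master accepts writes; each one gets the next sequence number."""
        seq = len(self.log) + 1
        self.log.append((seq, key, value))
        if value is None:
            self.data.pop(key, None)
        else:
            self.data[key] = value

    def changes_since(self, seq: int) -> List[Change]:
        return self.log[seq:]  # log[i] holds sequence number i + 1


@dataclass
class ReplicaNsdb:
    data: Dict[str, str] = field(default_factory=dict)
    applied: int = 0  # highest sequence number already applied

    def sync(self, master: MasterNsdb) -> None:
        """Pull and apply, in order, every change the replica has not seen yet."""
        for seq, key, value in master.changes_since(self.applied):
            if value is None:
                self.data.pop(key, None)
            else:
                self.data[key] = value
            self.applied = seq
```

Applying changes strictly in sequence order is what keeps every replica's view a prefix of the master's history.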

3. If the NSDB stores the details of the root fileset, how do the root
fileservers reflect the top level of the namespace to NFS clients? Is it a
pull model or a push model? Currently, push from the NSDB is not
supported.
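Since push from the NSDB is not available, a pull model might look like the following sketch, which assumes (hypothetically) that the NSDB keeps a generation number for the root fileset that changes on every update. A root fileserver calls `poll` periodically and rebuilds its exported top level only when the generation differs from the one it last saw. All class and method names here are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Nsdb:
    generation: int = 0                                           # hypothetical change counter
    entries: List[Tuple[str, str]] = field(default_factory=list)  # (path, target_fsn)

    def set_entries(self, entries: Iterable[Tuple[str, str]]) -> None:
        self.entries = list(entries)
        self.generation += 1


@dataclass
class RootFileserver:
    seen_generation: int = -1                                # nothing pulled yet
    exported: Dict[str, str] = field(default_factory=dict)   # what NFS clients see

    def poll(self, nsdb: Nsdb) -> bool:
        """Pull from the NSDB; rebuild the exported tree only if it changed."""
        if nsdb.generation == self.seen_generation:
            return False
        self.exported = dict(nsdb.entries)
        self.seen_generation = nsdb.generation
        return True
```

The cost of pulling is staleness: between polls, clients may see an out-of-date top level, so the polling interval bounds how quickly namespace changes become visible.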

4. What is the layout of the top of the namespace? Can it be shallow, just
a one-level tree with a huge fanout, or do we want a multi-level tree with
a moderate fanout at each level, or something in between? Can we use an
LDAP replication protocol and make one NSDB the master?
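The depth/fanout trade-off in question 4 is simple arithmetic: with a fanout of f per level, a tree of depth d holds up to f^d leaf junctions, so the depth needed is the smallest d with f^d >= N. The sketch below computes it with integer math; the figure of 10,000 junctions is purely illustrative.

```python
def levels_needed(junctions: int, fanout: int) -> int:
    """Smallest depth d with fanout**d >= junctions (integer math, no log rounding)."""
    if junctions < 1 or fanout < 2:
        raise ValueError("need junctions >= 1 and fanout >= 2")
    depth, capacity = 1, fanout
    while capacity < junctions:
        depth += 1
        capacity *= fanout
    return depth


# Illustrative: 10,000 junctions under different fanouts.
for fanout in (10, 100, 10_000):
    print(fanout, levels_needed(10_000, fanout))
# 10 4
# 100 2
# 10000 1
```

A fanout of 100 already covers 10,000 junctions in two levels, so a moderate multi-level tree need not be deep.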

Summarizing some thoughts from Paul:
If the namespace itself is created within a file server, managed by a
namespace tool (say, the fed-fs admin), I could envision it working this way:

    * fed-fs admin owns the root file set on a file server.
    * fed-fs admin manipulates the root file set, which requires an API
to create junctions.
    * Replication of the root file set can be done by file server
replication OR by fed-fs being aware of root file set
servers, and updating each server when changes are made.
    * Multiple namespaces can be handled by manipulating multiple file
sets on servers.
    * File set meta-data is all in the NSDB, including logical
properties and location history.
    * fed-fs either treats the root file set as the authoritative
copy, OR treats the file server merely as an implementation point for
the namespace, in which case the management tool could store the entire
namespace description in the NSDB in a currently unspecified, non-standard
format.
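The second replication option in the list above, fed-fs being aware of all root file set servers and updating each one, can be sketched as a fan-out loop. `FileServer`, `FedFsAdmin`, and the `online` flag are hypothetical stand-ins; the sketch mainly shows the failure mode this option has to handle, namely a server that misses an update and diverges until it is retried.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileServer:
    name: str
    online: bool = True
    junctions: Dict[str, str] = field(default_factory=dict)  # path -> target FSN

    def create_junction(self, path: str, target_fsn: str) -> None:
        """Hypothetical junction-creation API exposed by each root fileserver."""
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        self.junctions[path] = target_fsn


@dataclass
class FedFsAdmin:
    root_servers: List[FileServer]

    def add_junction(self, path: str, target_fsn: str) -> List[str]:
        """Apply one change to every root fileserver; return the ones that missed it."""
        failed: List[str] = []
        for server in self.root_servers:
            try:
                server.create_junction(path, target_fsn)
            except ConnectionError:
                failed.append(server.name)  # must be retried, or the copies diverge
        return failed
```

With file server replication, by contrast, the admin writes to one server and leaves consistency to the replication layer.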

So, I guess the real question is: do we want a set of junctions on disk
to be authoritative, or do we want to standardize a file set description
in a database as the authority?

One use case I've always had was integration of backup applications
with the namespace. For example, given a logical path, locate the
physical location for that data at a given point in time.
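The backup use case can be sketched as two lookups: find the junction covering the logical path, then search that FSN's location history for the entry in effect at the requested time. This assumes the NSDB keeps a time-ordered location history per FSN, as the "location history" point above suggests; the tables, timestamps, and server names below are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical junction table from the root fileset: logical path -> FSN.
JUNCTIONS: Dict[str, str] = {"/home/alice": "fsn-alice", "/projects": "fsn-proj"}

# Hypothetical location history kept in the NSDB: FSN -> [(valid_from, location)],
# sorted by valid_from (illustrative integer timestamps).
HISTORY: Dict[str, List[Tuple[int, str]]] = {
    "fsn-alice": [(100, "serverA:/vol/a"), (200, "serverB:/vol/b")],
    "fsn-proj": [(50, "serverC:/vol/p")],
}


def locate(path: str, when: int) -> Tuple[str, str]:
    """Return (physical location, path inside the fileset) for `path` at time `when`."""
    # Pick the longest junction that is a whole-component prefix of the path.
    matches = [j for j in JUNCTIONS if path == j or path.startswith(j + "/")]
    if not matches:
        raise LookupError(f"no junction covers {path}")
    junction = max(matches, key=len)
    history = HISTORY[JUNCTIONS[junction]]
    # Latest location whose valid_from is <= when.
    i = bisect.bisect_right([t for t, _ in history], when) - 1
    if i < 0:
        raise LookupError(f"{path} did not exist at time {when}")
    return history[i][1], path[len(junction):] or "/"
```

For example, `locate("/home/alice/docs/a.txt", 150)` resolves to `serverA:/vol/a`, while the same path at time 250 resolves to `serverB:/vol/b`, which is exactly what a backup application restoring an older snapshot would need.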