[Federated-fs] Comments on the 9/8/07 protocol draft

Mon Sep 10 11:19:24 PDT 2007

First, kudos to Renu for making a lot of progress in the last two weeks!

Comments on the new stuff:

- There are several error codes that are defined, but not used anywhere.
I don't know whether this means that they're going to be used later, or
whether they just got sucked in from the v4 spec.  We should either
define the constants here or spec where they can be found.  (I don't
want ERR_PERM to be one value in v4 and another here -- that'll make
coding a nightmare.)

- The params and return values for the functions look like XDR, but they're
not.  This is probably OK at this stage, but more description might be
helpful about what the various structs and fields are for.  We don't
need to know how the bits are arranged right now (but if the info is
available, then it doesn't hurt to be specific).
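
On the error-code point above, one way to catch an ERR_PERM that means
one thing in v4 and another here is to check the two tables against
each other.  A minimal sketch in Python: the table names and the
conflicting_codes helper are hypothetical, and the numbers are the
NFSv4 values (NFS4ERR_PERM = 1, NFS4ERR_NOENT = 2, NFS4ERR_ACCESS = 13),
used only for illustration.

```python
# Values as assigned in the NFSv4 spec (prefix NFS4ERR_ dropped).
NFS4_ERRORS = {"OK": 0, "PERM": 1, "NOENT": 2, "ACCESS": 13}

# Hypothetical table for the federation protocol; not taken from the draft.
FEDFS_ERRORS = {"OK": 0, "PERM": 1, "NOENT": 2}


def conflicting_codes(ours, theirs):
    """Return the names defined in both tables with different values."""
    return sorted(
        name for name in ours
        if name in theirs and ours[name] != theirs[name]
    )
```

An empty result means every shared name has the same value in both
tables; anything else is a name that would bite implementers.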

Section 2.2, "Inter-Fileset Dependency"

Given the last sentence in this section ("The federation protocol does
not control or manage the relationship among filesets.") I don't see the
necessity of this subsection at all.  If it's not part of the protocol,
let's not discuss it here.

If we do discuss this, or add this to the protocol (I will argue against
either) then we need to be very clear with the definitions.  As
currently defined, a fileset can't have a replica, because a fileset is
meant to be an abstraction -- so if you replicate a fileset, you get
the same fileset.

Section 2.3:

Need to distinguish between an NSDB location and the NSDB as a service.
An admin might manage a local NSDB location (or instance) but the "NSDB"
as a whole is part of the federation service and nobody manages it.

Section 2.4:

I don't know whether the fact that "a mount point is a directory" is
just baggage from NFS and our current implementations or whether there
really needs to be a directory there.

In the second to last paragraph, a specific procedure for doing the
lookup is given.  I think this is a fine example of how to do resolution,
but right now it's written as if it's the only way to do resolution, and
I don't agree with that.  I think we should focus on the interfaces, not
on how the server actually does what it needs to do to respond to a
specific query.
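
To illustrate the interface-versus-procedure point, here is a rough
Python sketch.  It assumes resolution takes an FSN and returns a list
of locations; the class names and the string formats are hypothetical
and not from the draft.  The protocol would pin down only the abstract
resolve() call, and each server would be free to pick its own strategy.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Resolver(ABC):
    """The part a protocol would specify: FSN in, locations out."""

    @abstractmethod
    def resolve(self, fsn: str) -> List[str]:
        ...


class TableResolver(Resolver):
    """One possible strategy: answer from a local table."""

    def __init__(self, table: Dict[str, List[str]]):
        self._table = table

    def resolve(self, fsn: str) -> List[str]:
        # Unknown FSNs resolve to an empty list of locations.
        return list(self._table.get(fsn, []))


class ForwardingResolver(Resolver):
    """Another strategy: pass the query on to some other resolver."""

    def __init__(self, upstream: Resolver):
        self._upstream = upstream

    def resolve(self, fsn: str) -> List[str]:
        return self._upstream.resolve(fsn)
```

A caller sees the same answer from either implementation, which is
the property the spec needs to guarantee.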

Section 2.6:

Might want to mention here that a client can mount as many filesets as
they like (and this is essential to the protocol).  In effect, they can
create their own root, if they so desire and have access to the proper
info.  The root servers are a luxury, but not a requirement.

I'm VERY uneasy about requiring that the root fileset be replicated
without saying how this replication is done.  To recap the long
discussion we had during the generation of the reqts doc (for the sake
of readers who have joined us since then) about what kind of replication
we do and do not assume:  I believe that the consensus was that the
federated-fs protocol should make it easy to keep track of where
replicas are, but should not actually take part in creating or
maintaining these replicas.  Replica management is handled by other
protocols.

As an example, imagine that you have two clusters that you want to
federate, one running system X, and the other running system Y.  Members
of the cluster running X know how to manage replicas between themselves,
as do members of the cluster running Y, but neither system X nor Y knows
how to replicate to the other.  (Just pick your favorite two systems for
X and Y, and this statement is probably true...)  So either the servers
holding the root fileset replicas must all be running mutually
replica-compatible systems (i.e., they're all in the same cluster), or
else we need a protocol for replicating the root fileset.
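
The split described above (track where replicas are, but don't manage
them) can be sketched roughly as follows.  This is an illustration
only: the ReplicaLocations class, its method names, and the server
names are hypothetical.  Note that nothing in it creates or syncs a
replica; it can only report when the recorded replicas span more than
one system, which is the X/Y case.

```python
from collections import defaultdict


class ReplicaLocations:
    """Tracks where replicas of a fileset live; never creates or syncs them."""

    def __init__(self):
        # fileset name -> set of (server, system) pairs
        self._locations = defaultdict(set)

    def add(self, fileset, server, system):
        self._locations[fileset].add((server, system))

    def remove(self, fileset, server, system):
        self._locations[fileset].discard((server, system))

    def lookup(self, fileset):
        """Return the known (server, system) pairs in sorted order."""
        return sorted(self._locations.get(fileset, ()))

    def spans_systems(self, fileset):
        """True if the replicas run on more than one system (the X/Y case)."""
        systems = {system for _, system in self._locations.get(fileset, ())}
        return len(systems) > 1
```

If spans_systems() is true for the root fileset, something outside
this tracker has to know how to replicate between X and Y.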

If we constrain the root fileset to be something very simple (i.e. like
the top-level of the AFS world) then this seems doable.  If the root
fileset is, as described here, a general fileset that can contain files
and directories as well as junctions (and probably links and who knows
what else) then we've just made our job a LOT harder.

Again, since the root fileset is optional, let's discuss how much it
needs to be spec'd here at all.  Maybe this problem doesn't have to be
solved right now.

Sections 4 and 5:

There's an implicit assumption here (I think) that the methods/protocols
for finding the root servers and the local NSDB are part of the
federated-fs protocol.  Is that the intent?  If so, let's be explicit.

I'm fuzzy on why the local NSDB location needs to be found, or why it would
be found in a different way than finding remote NSDB locations.  (I
don't think it's required to have a local NSDB location at all, although
there might be advantages to having a local location.)  If it's up to
the local impl, then let's leave it out of the protocol.

Finding remote NSDB locations MUST be part of the protocol, however.
There has to be a clear and guaranteed way to find the NSDB for an FSN. 
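
One way to get that guarantee, offered purely as an assumption and not
as what the draft specifies, is for the FSN itself to carry the name of
its NSDB location.  A minimal Python sketch: the "location:id" text
form, the FSN field names, and parse_fsn are all hypothetical.

```python
from typing import NamedTuple


class FSN(NamedTuple):
    nsdb_location: str  # hypothetical: name of the NSDB location to ask
    fsn_id: str         # hypothetical: identifier within that NSDB


def parse_fsn(text: str) -> FSN:
    """Split 'location:id'; reject anything that names no NSDB location."""
    location, sep, ident = text.partition(":")
    if not sep or not location or not ident:
        raise ValueError("FSN must have the form 'location:id': %r" % text)
    return FSN(location, ident)
```

With a rule like this, every well-formed FSN names its NSDB location,
so there is never an FSN whose NSDB can't be found.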

Section 6:

This is very helpful.  Diagrams might be even more helpful.