[Federated-fs] Conf call 6/5/2008

Wed Jun 11 23:04:51 PDT 2008

Dan,
            I think we had agreed that we do not mandate a common
namespace but we also do not mandate that a common namespace cannot be
created. My point (I) was just echoing Paul's setup, so that if we wanted
to have a common namespace with a root fileset etc. it should be possible
to do so. A lot of customers want a common namespace to manage their
multi-vendor fileservers distributed across multiple office locations
using a common management console etc. A number of startups selling FAN
solutions have the same value proposition. So fed-fs, at least the way I
interpreted it, was an inclusive protocol. You can choose to set up a
common namespace or you could choose to have independently administered
ones. Neither is mandated and either should work.

I think we had agreed that we will punt on figuring out the replication
protocol in the first version of fed-fs, which is why the reliance on a
replication protocol for junction copying was bothersome. If we define the
replication protocol then we can ensure it copies junctions. Say I want
to use rsync in the first version: then I cannot get the junction
information if it is stored in the exports entry, so using fed-fs v1.0
with rsync will not work. So now I need a replication protocol, and soon
we are going in circles...

I think we can discuss on the call storing junction information in
the NSDB.  I think storing it without paths should not be much of an issue.

The motivation is to have sufficient information in the NSDBs to present a
consolidated view to file and storage management tools and data migration
tools that need to know how the namespace is constructed from the
filesets, without having to contact hundreds or thousands of fileservers.

On your point #3 about multiple mounts of the same fileset: this would just
result in multiple junction entries with different parents:
<parent_fsn1, target_fsn1, handle1>, <parent_fsn2, target_fsn1, handle2> ...
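
As a rough illustration of the multiple-mount case, the entries could be
modeled as below. This is only a sketch: the `Junction` record, its field
names, and the `parents_of` helper are hypothetical and are not part of any
fed-fs schema.

```python
from collections import namedtuple

# Hypothetical NSDB junction record: one entry per place a fileset is mounted.
Junction = namedtuple("Junction", ["parent_fsn", "target_fsn", "handle"])

# The same target fileset mounted under two different parent filesets.
junctions = [
    Junction("parent_fsn1", "target_fsn1", "handle1"),
    Junction("parent_fsn2", "target_fsn1", "handle2"),
]


def parents_of(target_fsn, entries):
    """Return the parent FSNs of every junction that points at target_fsn."""
    return [j.parent_fsn for j in entries if j.target_fsn == target_fsn]


print(parents_of("target_fsn1", junctions))
```

Each mount point gets its own handle, so one of them can be removed without
touching the other.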

So there is a lot to talk about tomorrow and  we can bet it is not going 
to be a dull call :)

regards
renu

Renu --

    I am still missing some information that I need to understand your
concerns.  As I mentioned in the conference call last week, I would like
to see use cases that motivate the changes that you suggest.

It seems to me that some of the changes you propose break or weaken some
of the current features of the protocol (and contradict the requirements
doc).  Of course there are always tradeoffs in any protocol, but I want to
understand what I'm getting in return for what I'm giving up!

Please provide some examples of operations that are compelling, can be
done with your system, and can't be done with the present system.  For
example, let's begin with global knowledge of where all the junctions are
in the namespace -- I haven't figured out why anyone other than servers
implementing a fileset would need this information.  I'm willing to be
convinced, but I need some help.  (I'm sure that there are uses that seem
obvious to you but are new to me.)

One philosophical disconnect is that your scheme distributes the
information that defines the namespace in several different records (on
multiple servers) per object, whereas the current scheme minimizes this.
I don't know what happens when the different records of information about
an object become inconsistent (does the system collapse, or is it just a
soft error?) but I do know, because there's no way to do multi-server
transactions in LDAP, that we're going to have to deal with it.

I think the general disconnect may be around the notion of federation.  I
want different admins to be able to manage their parts of the system.  I
don't want to make it necessary for there to be a central admin that can
control all aspects of the namespace, which seems to be implied by your
first point.  (It's fine if there happens to be such an admin, but I won't
accept that there *has* to be one -- that's simply not a federation!)

Now, to drill down to your major points:

On 6/11/08 5:45 PM, "Renu Tewari" <tewarir at us.ibm.com> wrote:

> This leads to point (II) which was the discussion we had last week.
> In the current state of the schema the junction information represented
> by the tuple < FSN_PARENT, FSN_TARGET, PARENT_PATH > is not stored
> anywhere. It is neither in the NSDB nor at the fileserver. The fileserver
> only has some state marking a given directory in the parent fileset to
> point to the target FSN or however else it implements junctions. The
> fileserver knows nothing about filesets.
> So my main concerns are:
> --- Using the current schema the junctions are not really managed by the
> fed-fs protocol as you cannot move the junction or delete it or query it
> or know where it exists.  It depends solely on how the fileserver
> implements junctions and what it can tell an admin about it.

Why don't we define an interface for what the fileserver must be able to
tell the admin, and how?

Of course every server is going to implement junctions in its own way.  We
need a veneer that allows us to manipulate this in a uniform manner -- I
don't see any disagreement here.
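
To make the idea of a veneer concrete, here is a minimal sketch of what such
an interface might look like. Everything in it is an assumption for
illustration: the `JunctionAdmin` name, its three operations, and the toy
`InMemoryServer` are hypothetical and not taken from any fed-fs draft.

```python
from abc import ABC, abstractmethod


class JunctionAdmin(ABC):
    """Hypothetical uniform veneer over a server's own junction mechanism."""

    @abstractmethod
    def create_junction(self, path, target_fsn):
        """Mark the directory at path as a junction to target_fsn."""

    @abstractmethod
    def delete_junction(self, path):
        """Remove the junction at path."""

    @abstractmethod
    def list_junctions(self):
        """Return (path, target_fsn) pairs for every junction on this server."""


class InMemoryServer(JunctionAdmin):
    """Toy server that keeps junctions in a dict, standing in for whatever
    vendor-specific mechanism a real fileserver would use."""

    def __init__(self):
        self._junctions = {}

    def create_junction(self, path, target_fsn):
        self._junctions[path] = target_fsn

    def delete_junction(self, path):
        del self._junctions[path]

    def list_junctions(self):
        return sorted(self._junctions.items())


server = InMemoryServer()
server.create_junction("/projects/a", "fsn-a")
server.create_junction("/projects/b", "fsn-b")
server.delete_junction("/projects/a")
print(server.list_junctions())
```

The admin tool only sees the three operations; how each server stores its
junctions stays private to that server.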

Saying that this is all done on the NSDB, however, really ties the hands
of the implementer and forces them to do it one way.

> --- The relationship between filesets is not available to an admin
> through the NSDB so it has no idea how the namespace is organized.

I really need an example here.  What question is the admin going to ask?

> 
> --- When an FSL is created we need to rely on an external replication
> protocol that will also replicate the junction information. If this
> protocol is not a part of fed-fs  we cannot rely on it to work cross
> vendor.  To depend on this undefined replication protocol  for the basic
> functioning of the federation bothers me.

Well, if we don't have a replication protocol, then we can't do
cross-platform replication of ordinary volumes anyway, much less filesets
containing junctions.  So the presence of junctions here is a red herring
-- the real problem is cross-platform replication!

In the current reqts doc, the consensus was that we could let this go for
the first version of the protocol, because we can use cluster replication
and same-vendor replication to get a first approximation of what we need.
In the longer term, we need to tackle this.

Luckily, there has been a lot of work on replication protocols, so I am
confident we'll be able to find one that makes everyone happy.  In the
meanwhile, I think we should press ahead with the namespace work.  Putting
the namespace work on hold while we thresh through replication protocols
will add a lot of time to the schedule, and it's not the first priority of
our customers.

> 
> As I understand from Dan's comments he had 3 issues on storing the
> junction information in the NSDB.  #1 was related to storing the
> parent_path information of the junction in the NSDB. The main issue
> being what happens on a rename of any of the path components.
> One way is not to store the path information but only < FSN_PARENT,
> FSN_TARGET > in the NSDB. This will let an admin know how the namespace
> is laid out in terms of filesets but not paths.  This still leaves open
> how to delete a junction. Another way to do it is to have the fileserver
> return an opaque handle that identifies the junction location instead of
> the path. The NSDB can store it as the tuple
> < FSN_PARENT, FSN_TARGET, opaque_handle > and use it to manage the
> junction. This would require the fileserver's participation. Other
> possible ideas..

An opaque handle rather than a path solves a lot of my issues, by
decoupling the junction object from changes in the namespace.  But I'm not
sure it solves your problem -- is knowing that fileset X is a descendant
of fileset Y (without knowing where Y is in the namespace) good enough for
what you want?  If so, then that really helps simplify things -- reducing
the number of potential updates and inconsistencies, and the number of
things.
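
To make the question concrete, here is a minimal sketch of the kind of query
that path-free records would still support. It assumes the NSDB holds
(parent FSN, target FSN, opaque handle) tuples; the `is_descendant` helper
and the FSN names are hypothetical.

```python
def is_descendant(x, y, junctions):
    """Return True if fileset x is reachable from fileset y via junctions.

    junctions is an iterable of (parent_fsn, target_fsn, handle) tuples.
    No paths are consulted, so renames inside a fileset cannot affect the
    answer.
    """
    # Build a parent -> children map from the junction records.
    children = {}
    for parent, target, _handle in junctions:
        children.setdefault(parent, set()).add(target)

    # Depth-first walk from y; the seen set guards against cycles.
    seen, stack = set(), [y]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child == x:
                return True
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return False


junctions = [
    ("fsn-root", "fsn-Y", "h1"),
    ("fsn-Y", "fsn-mid", "h2"),
    ("fsn-mid", "fsn-X", "h3"),
]
print(is_descendant("fsn-X", "fsn-Y", junctions))
print(is_descendant("fsn-Y", "fsn-X", junctions))
```

This answers "is X below Y?" but says nothing about where in Y the junction
sits, which is exactly the information the opaque handle hides.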

> 
> Dan's #2 was a non-issue. The target FSN's fileserver knows nothing
> about where it is hanging from in the namespace.
> For #3 I am not sure why this relates to storing junction information
> in the NSDB. It is related to how the export path is related to the
> common namespace. Maybe I did not get that issue.

Let me try another way...  Imagine that you have one fileset that appears
in two places in the namespace.  What does the resulting set of NSDB
entries look like?

> The problem of storing paths also exists in storing FSL paths in the
> NSDB. Am I missing something there?

I don't think we're going to get into a problem because those paths are
always interpreted in the context of a server.

[an aside -- we keep getting hung up on the difference between
server-local, pseudo-root, and namespace paths...  I wish we could come up
with better names, maybe "spath" for server-local path, "ppath" for
pseudo-root paths, and "cpath" for client-visible paths?]

-Dan
