[Federated-fs] Conf call 6/5/2008

Everhart, Craig Craig.Everhart at netapp.com
Thu Jun 12 08:57:34 PDT 2008


Personally, I think that (I) applies.  I think that the interesting
questions are how multiple fileservers inter-relate when they are part
of the same organization but associated with different NSDBs.  I see
the inter-organizational relationship as a different target, one that
may use the same solution as the intra-organizational relationships
but doesn't have to.

If I interpret Renu's language of "root fileset" as "root fileset for an
organization", this all makes sense to me.

I'm personally waving the flag for what I think is a pragmatic solution
to the inter-organizational root: DNS, a solution that already exists.

What comes to mind with the rest of Renu's note is the question of
whether the NSDB (or something at its level) contains any file systems
at all.  In her example, 

> /fedfs/com/div/A
> /fedfs/com/div/B
> /fedfs/com/div/C
> 
> Here only A,B,C are junction points.

I don't know what she meant by "com/div" so let me map it to:

> /fedfs/x.com/div/A
> /fedfs/x.com/div/B
> /fedfs/x.com/div/C
> 
> Here only A,B,C are junction points.

The question is: is there a fileserver-served file system at
/fedfs/x.com/div, or is that path served solely by x.com's
organizational NSDB (or something like it)?  That is, should there be
something besides a file server that serves file system pathnames?
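
To make the question concrete, here is a minimal Python sketch.  It is
an illustration only, not anything from the draft.  It assumes the
junction tuple from Renu's note (FSN_PARENT, FSN_TARGET, PARENT_PATH),
written with a leading slash and with x.com substituted as above; the
names Junction, ROOT_JUNCTIONS and classify are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    fsn_parent: str   # fileset that contains the junction
    fsn_target: str   # fileset the junction points to
    parent_path: str  # where the junction sits in the namespace


# Hypothetical root fileset for x.com: only the leaves A, B, C are junctions.
ROOT_JUNCTIONS = [
    Junction("ROOTFSN", "FSN_A", "/fedfs/x.com/div/A"),
    Junction("ROOTFSN", "FSN_B", "/fedfs/x.com/div/B"),
    Junction("ROOTFSN", "FSN_C", "/fedfs/x.com/div/C"),
]


def classify(path, junctions):
    """Say whether a path is a junction, a purely logical directory, or unknown."""
    path = path.rstrip("/")
    for j in junctions:
        if j.parent_path == path:
            return ("junction", j.fsn_target)
    # A path that is a strict prefix of some junction path is an
    # intermediate, logical directory of the root fileset.
    prefix = path + "/"
    if any(j.parent_path.startswith(prefix) for j in junctions):
        return ("logical", None)
    return ("unknown", None)


if __name__ == "__main__":
    for p in ("/fedfs/x.com/div/A", "/fedfs/x.com/div", "/fedfs/y.com"):
        print(p, classify(p, ROOT_JUNCTIONS))
```

In this model /fedfs/x.com/div comes back as "logical": something has
to answer for it, and the sketch deliberately does not say whether that
something is a fileserver or the NSDB.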

Let me be clear, following on (I).  I actually believe that an
organization will need some management tool and mechanism to traverse
the filesets belonging to multiple NSDBs.  Until very recently, we
hadn't discussed one.  To me, Rob Thurlow's note (and the follow-on
discussion) makes a good starting stab toward such a thing.  But it
doesn't have to be the way that the tippy-top level is done, and I think
that relying on smooth inter-organizational cooperation for that level
is going to be a hard path.
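
As a rough picture of what such a traversal tool might do, here is a
Python sketch along the lines of Rob's suggestion: start at the top, ask
each fileset which junctions it contains, and follow them.  Everything
in it is assumed for illustration.  The two tables stand in for a
per-fileset "list junctions" query and for an FSN-to-NSDB lookup, neither
of which is defined in the current draft, and the FSN and NSDB names are
made up.

```python
from collections import deque

# Hypothetical answers to "which junctions does this fileset contain?"
# Each entry is (path relative to the fileset root, target FSN).
JUNCTIONS_IN_FILESET = {
    "ROOTFSN": [("/div/A", "FSN_A"), ("/div/B", "FSN_B")],
    "FSN_A": [("/proj", "FSN_P")],
    "FSN_B": [],
    "FSN_P": [],
}

# Hypothetical record of which NSDB is responsible for each FSN.
NSDB_OF_FSN = {
    "ROOTFSN": "nsdb1",
    "FSN_A": "nsdb1",
    "FSN_B": "nsdb2",
    "FSN_P": "nsdb2",
}


def enumerate_namespace(root_fsn, root_path, junctions_in, nsdb_of):
    """Breadth-first walk of the filesets reachable from root_fsn.

    Returns (path, fsn, nsdb) for each fileset, visiting each FSN once.
    """
    seen = {root_fsn}
    queue = deque([(root_path, root_fsn)])
    result = []
    while queue:
        path, fsn = queue.popleft()
        result.append((path, fsn, nsdb_of[fsn]))
        for rel_path, target in junctions_in.get(fsn, []):
            if target not in seen:  # guard against loops in the namespace
                seen.add(target)
                queue.append((path.rstrip("/") + rel_path, target))
    return result


if __name__ == "__main__":
    for row in enumerate_namespace(
        "ROOTFSN", "/fedfs/x.com", JUNCTIONS_IN_FILESET, NSDB_OF_FSN
    ):
        print(row)
```

The walk crosses from nsdb1 to nsdb2 without caring which is which.
That is the intra-organizational case; nothing here says how the
tippy-top level above /fedfs/x.com is found.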

		Craig


> -----Original Message-----
> From: Renu Tewari [mailto:tewarir at us.ibm.com] 
> Sent: Wednesday, June 11, 2008 5:46 PM
> To: Paul Lemahieu
> Cc: Everhart, Craig; Ellard, Daniel; federated-fs at sdsc.edu; 
> federated-fs-bounces at sdsc.edu
> Subject: Re: [Federated-fs] Conf call 6/5/2008
> 
> So we are raising a couple of important questions in this discussion.
> I)  Does the fed-fs protocol enable the management of a 
> common namespace across a set of  distributed multi-vendor 
> fileservers?
> II) Does the NSDB maintain state related to the namespace 
> especially junction information?
> 
> If we are not supporting (I) then we need to step back and 
> state what the goal of  fed-fs really is. There is not much 
> value add in going through with a new protocol that doesn't  do much.
> 
> Related to (I) and the point that Paul raised we had this 
> section on a "root fileset" which is the top of the common 
> namespace. (We had removed the root fileset and finding root 
> fileserver  stuff from the  draft  for the initial version 
> just to make it simple for starters  so it may need to be revisited).
> This root fileset information can be stored at one NSDB (in the
> degenerate case) or at all NSDBs that wish to be part of the common
> namespace.
> The FSLs that  instantiate the root fileset are located at 
> the root fileservers. ONLY the leaves of the root fileset are 
> junction points where the target FSNs map to FSLs that are 
> physically located at some fileserver.
> The paths in the root fileset are logical and are used to 
> organize the top level of the tree.  There are no junctions 
> at the intermediate points in the root fileset.
> 
> For example
> /fedfs/com/div/A
> /fedfs/com/div/B
> /fedfs/com/div/C
> 
> Here only A,B,C are junction points.
> The junction is the tuple
> < FSN_PARENT=ROOTFSN, FSN_TARGET=FSN_A, PARENT_PATH=fedfs/com/div/A >
> FSN_A can in turn contain a junction further down the tree. This is
> not part of the root fileset.
> 
> This leads to point (II) which was the discussion we had last week.
> In the current state of the schema the junction information 
> represented by the tuple < FSN_PARENT, FSN_TARGET, 
> PARENT_PATH >  is not stored anywhere.
> It is neither in the NSDB nor at the fileserver. The 
> fileserver  only has some state marking a given directory in 
> the parent fileset to point to the target FSN or however else 
> it implements junctions. The fileserver knows nothing about filesets.
> So my main concerns are:
> --- Using the current schema the junctions are not  really 
> managed by the fed-fs protocol as you cannot move the 
> junction or delete it or query it or know where it exists.  
> It depends solely on how the fileserver implements junctions 
> and what it can tell an admin about it.
> 
> ---The relationship between filesets is not available to an 
> admin through the NSDB so it has no idea how the namespace is 
> organized.
> 
> --- When an FSL is created we need to rely on an external 
> replication protocol that will also replicate the junction 
> information. If this protocol is not a part of fed-fs  we 
> cannot rely on it to work cross vendor.  To depend on this 
> undefined replication protocol  for the basic functioning of 
> the federation bothers me.
> 
> As I understand from Dan's comments  he had 3 issues on 
> storing the junction information in the NSDB.  #1 was related 
> to storing the parent_path information of the junction in the 
> NSDB. The main issue being what happens on a rename of any of 
> the path components.
> One way is not to store the path information but only < 
> FSN_PARENT, FSN_TARGET> in the NSDB. This will let an admin 
> know how the namespace is laid out in terms of filesets but 
> not paths.  This still leaves open how to delete a junction. 
> Another way to do it is to have the fileserver return an opaque 
> handle that identifies the junction location instead of a path. 
> The NSDB can store it as the tuple  < FSN_PARENT, FSN_TARGET, 
> opaque_handle> and use it to manage the junction. This would 
> require the fileserver participation. Other possible ideas..
> 
> Dan's #2 was a non-issue. The target FSN's fileserver knows 
> nothing about where it is hanging from in the namespace.
> For #3  I am not sure why this relates to storing junction 
> information in the NSDB. It is related to  how the export 
> path is related to the common namespace. Maybe I did not get 
> that issue.
> 
> The problem of storing paths also exists in storing FSL paths 
> in the NSDB.
> Am I missing something there?
> 
> 
> regards
> renu
> 
> 
> 
> 
> 
> 
> From: Paul Lemahieu <LeMahieu_Paul at emc.com>
> Sent by: federated-fs-bounces at sdsc.edu
> Date: 06/11/2008 10:40 AM
> To: "Everhart, Craig" <Craig.Everhart at netapp.com>
> cc: federated-fs at sdsc.edu, "Ellard, Daniel" <Daniel.Ellard at netapp.com>
> Subject: Re: [Federated-fs] Conf call 6/5/2008
> 
> 
> 
> 
> As for why, it's to have a global namespace that spans 
> heterogeneous file servers, and have a common 
> configuration/management for that. A Linux box or AIX or a 
> NAS vendor could all be exposing the same /fedfs/ home, and 
> handing out referrals to users.
> 
> --Paul
> 
> On 2008-Jun-11, at 10:23, Everhart, Craig wrote:
> 
> > OK: why?  Also, what do we do with the hard questions 
> (synchronizing 
> > multiple data sources, doing permissions correctly)?
> >
> >
> > ________________________________
> >
> >            From: LeMahieu, Paul [mailto:LeMahieu_Paul at emc.com]
> >            Sent: Wednesday, June 11, 2008 1:19 PM
> >            To: Everhart, Craig; Robert Thurlow
> >            Cc: Ellard, Daniel; federated-fs at sdsc.edu
> >            Subject: Re: [Federated-fs] Conf call 6/5/2008
> >
> >
> >            Yes, it's independent of anything with DNS. When I say
> >            "top-of-tree", I'm talking about storing all the
> >            configuration of the namespace (tens of thousands of
> >            junction points and their paths in the logical
> >            namespace). For example, if there is /fedfs/home/bob,
> >            there is an entry mapping /fedfs/home/bob to bob's share
> >            on some physical file server. We'd be storing a pseudo
> >            file system representing the top-of-tree in the NSDB.
> >
> >            --Paul
> >
> >
> >            On 08/6/10 20:11, "Everhart, Craig" <Craig.Everhart at netapp.com> wrote:
> >
> >
> >
> >                        Is this "top of tree" thought independent of the
> >                        DNS-based lookup?  Why wouldn't that simply *be* the
> >                        top level, leading to, as you say, real file systems?
> >                        Is this an intermediate level?
> >
> >                        I can read all your text substituting "DNS" for "NSDB"
> >                        and get a working (and specified and nearly existing)
> >                        result.  What am I missing?  Do I need to invent yet
> >                        another replicated global service?
> >
> >                                        Craig
> >
> >                        > -----Original Message-----
> >                        > From: Paul Lemahieu [mailto:LeMahieu_Paul at emc.com]
> >                        > Sent: Tuesday, June 10, 2008 7:53 PM
> >                        > To: Robert Thurlow
> >                        > Cc: Everhart, Craig; federated-fs at sdsc.edu; Ellard, Daniel
> >                        > Subject: Re: [Federated-fs] Conf call 6/5/2008
> >                        >
> >                        > Robert,
> >                        >
> >                        > This is very much the motivation. It's a couple of key things:
> >                        >
> >                        >     * A standard to facilitate the administration of a
> >                        >       federated global namespace
> >                        >     * The ability to federate different file servers so they
> >                        >       all expose the same global namespace
> >                        >
> >                        > A few things about how I would see this:
> >                        >
> >                        >     * I always thought of this as a "top-of-tree" namespace.
> >                        >       In other words, at some point the global namespace ends
> >                        >       and you hit real file systems. Perhaps it is possible to
> >                        >       manage junction points at arbitrary locations in the
> >                        >       global tree; I just hadn't really considered it.
> >                        >     * Changes to the global namespace are made by
> >                        >       administrators, and made via the NSDB. The global
> >                        >       namespace is not frequently changing. Participating file
> >                        >       servers reflect those changes later in a loosely-coupled
> >                        >       manner.
> >                        >     * This does not invalidate the existing federated-fs
> >                        >       work. It would be essentially an additional database in
> >                        >       the NSDB, mapping paths to FSNs.
> >                        >     * It is assumed that the NSDB takes care of replicating
> >                        >       this data.
> >                        >
> >                        > The big difference between what you describe below and my
> >                        > view is whether we have an NSDB that reflects the namespace
> >                        > created on the file server (the query model you describe
> >                        > below), or whether the NSDB is authoritative for the
> >                        > namespace and the file servers reflect that namespace
> >                        > configuration (my description, where the NSDB is
> >                        > authoritative).
> >                        >
> >                        > --Paul
> >                        >
> >                        > On 2008-Jun-09, at 15:36, Robert Thurlow wrote:
> >                        > > Everhart, Craig wrote:
> >                        > >> I *totally* agree with Dan's bias on this.  It's a surprise
> >                        > >> to me that others thought that there was a "namespace" that
> >                        > >> existed independently of the file systems that make it up.
> >                        > >>
> >                        > >> What is the relationship between path data that exists both
> >                        > >> in a fileset (in a file server--Dan's #1 case) and in the
> >                        > >> NSDB and (as in Paul's addendum to Dan's #3) all the
> >                        > >> instances of the parent path data that are subsidiary
> >                        > >> filesets?  Is there some authoritative copy with the others
> >                        > >> just hints?  What is the replication protocol by which path
> >                        > >> data is propagated?  How consistent does it have to be?
> >                        > >>
> >                        > >> If we were stumbling thinking about the constraints on
> >                        > >> fileset replication and consistency, why would this be a
> >                        > >> simple answer?  Right now it's an unmotivated,
> >                        > >> underspecified, and unconstrained additional criterion for
> >                        > >> any implementation.  (And them's its good points...:-})
> >                        > >>
> >                        > >> If others (Paul? Renu?) feel like this needs to be changed,
> >                        > >> could they do it with a more fleshed-out proposal?
> >                        > >
> >                        > > I don't want to keep a copy of all of the mapping data (with
> >                        > > or without authority) in the NSDB, either.  But I do wonder
> >                        > > if the motivation is something like this:
> >                        > >
> >                        > > I want an admin application that can show me the whole
> >                        > > namespace.  I want to be able to browse it like a Google
> >                        > > map, going up/down, left/right in the global directory tree.
> >                        > > I'd like to see which nodes are junctions, and to be able to
> >                        > > click on them and see the details about their replicas.  In
> >                        > > the fullness of time, I'd like to be able to click on a
> >                        > > point in the namespace and select an option to make that
> >                        > > point a separate, replicated filesystem, and to adjust the
> >                        > > characteristics of existing filesystems.
> >                        > >
> >                        > > Now, if all we have is a way to drill down from the top of
> >                        > > the namespace for any particular path, this could be bloody
> >                        > > hard.  Having more information would help.  An alternate
> >                        > > question is, where could we get more info?
> >                        > >
> >                        > > One suggestion is that the protocol permit a query to an NFS
> >                        > > server participating in the namespace, to list the junctions
> >                        > > in a particular filesystem.  If we could start at the top,
> >                        > > find the Nth level filesystems and ask them what junctions
> >                        > > point outside of the filesystem, we could enumerate the
> >                        > > namespace far quicker.  We would still have to send packets
> >                        > > all over the place - "find the root" DNS queries, NSDB
> >                        > > lookups, NFS accesses and whatever we use to enumerate
> >                        > > junctions in a filesystem - but I have always expected that.
> >                        > >
> >                        > > Does this shed light or kick up dust? :-)
> >                        > >
> >                        > > Rob T
> >                        > >
> >                        >
> >                        >
> >
> >
> >
> >
> 
> 
> 
> 

