[Federated-fs] Notes and news from the fed-fs conference call yesterday

Fri Aug 8 07:44:00 PDT 2008

First, some news.  I will be leaving NetApp this afternoon,
to pursue an interesting opportunity at BBN Technologies.
I will remain involved in the fed-fs work, at least for the
next few months, but in a reduced role (I will have a different
day job).  James Lentini, another member of the NetApp
Advanced Technology Group and an expert in RDMA and
NFS, will be taking over my fed-fs role at NetApp.  There may
be some other new faces (or familiar faces taking different
roles) as well.

If you need to contact me, please use my gmail address.
My NetApp address will start to fail (probably silently) in a few
hours.

Notes from the conf call:

Attending: Daniel Ellard, James Lentini (NetApp), Paul
LeMahieu (EMC/Rainfinity), Renu Tewari, Manoj Naik (IBM/ARC),
Mario Wurzl (EMC).

THE DEMO

Manoj asked about the demo in Dublin.  Not that many people
got to see it, because attendance in Dublin was low.  Here is
a brief description:  Amy Weaver demo'd a namespace implemented
by several servers.  I'm not sure of the final config, but in our
practice runs, we had an ONTAP 8 (development version) system,
an ONTAP 7 (released) box, a Linux 2.6 box, and an
OpenSolaris box (not sure of the version).  The ONTAP 8 system
implemented the root fileset and several of the other filesets.
Only ONTAP 8 fully supports the fed-fs protocol right now, so
the other servers hosted leaf filesets in the namespace.

The demo system had one NSDB (for simplicity -- we've also
demo'd several) and the demo was performed live, over the
network, using machines located in Sunnyvale.

Amy demonstrated how to create junctions and then demo'd a
vanilla Linux client traversing the namespace.

We're open-sourcing a version of the NSDB and a C API
for it (dual license, BSD/GPL2) that conforms to the draft of
the protocol from earlier this week.  Because the draft is still
ambiguous and underspecified in places, the code takes some
short-cuts.  It should be considered a proof-of-concept at
most, but it could help bootstrap other implementations.  We'd
love to see people federating across platforms at a bakeathon
or connectathon soon!

PUSH VERSUS PULL

If we have a replicated root fileset (and we believe we will
need to), then how are changes to the root propagated from
the "master" (which we believe we'll probably need) to all
of the replicas?  Paul had taken the position, a few weeks
ago, that having the master push changes out to the replicas
makes the most sense.  Others pointed out that in a
federated and distributed system, a push model is hard
to implement: if a replica goes down, or its state otherwise
becomes unknown to the master (possibly without the master
even noticing), the master cannot easily tell what state each
replica is in or what messages to send to bring it up to
date.  On the other hand, the replicas can keep track of
what messages they have and haven't seen from the master
(and/or a logical clock for state versioning) and pull the
changes from the master.

Either way there are complicated cases, but the observation
was that in some cases, pull is really necessary, so if we need
a pull anyway, let's not do both.

Paul was not on the call last week, so we postponed further
discussion until this week.  After some discussion the consensus
is that a pull model is OK as long as we can ensure that there is a
way to minimize the state changes the replicas need to pull (i.e.,
just deltas) rather than brute-force polling.
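
To make the "pull just the deltas" idea concrete, here is a
minimal sketch in Python.  It is not taken from any draft or
implementation: the Master and Replica classes, the change-log
layout, and the example paths and fileset names are all
hypothetical.  The only idea it illustrates is that a replica
remembers the last version (logical clock value) it has seen and
asks the master for everything after that.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    """Hypothetical master copy: an append-only log of changes.

    Entry i of the log produced version i + 1, so the log length
    doubles as a logical clock.
    """
    log: list = field(default_factory=list)

    def apply(self, change):
        """Record a (key, value) change; value None means 'delete'."""
        self.log.append(change)
        return len(self.log)  # the new version number

    def changes_since(self, version):
        """Return (current_version, changes made after `version`)."""
        return len(self.log), self.log[version:]


@dataclass
class Replica:
    """Hypothetical replica: tracks the last version it has applied."""
    version: int = 0
    state: dict = field(default_factory=dict)

    def pull(self, master):
        """Fetch and apply only the changes this replica has not seen."""
        new_version, deltas = master.changes_since(self.version)
        for key, value in deltas:
            if value is None:
                self.state.pop(key, None)
            else:
                self.state[key] = value
        self.version = new_version
        return len(deltas)  # number of deltas applied


master = Master()
master.apply(("/projects", "fsn-1"))
master.apply(("/home", "fsn-2"))

replica = Replica()
replica.pull(master)            # applies two deltas
master.apply(("/home", None))   # master removes an entry
replica.pull(master)            # applies only the one new delta
```

Note that the master keeps no per-replica state here, which is
the property that made pull attractive in the discussion.  A real
design would still have to decide how long the master retains its
log and what a replica does if it falls too far behind.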

Paul and Renu are signed up to continue refining their document
about how the replicas get updated.

TRANSACTIONS -- NEEDED?

One of the assumptions required to do simple things like update
the state of the replicas is that it's possible to actually *define*
the state of an NSDB at a logical moment.  This seems like a
simple thing, but it's not something that LDAP seems to support.
Basic LDAP does not seem to have a graceful way to deal with
concurrent updates to multiple, overlapping records -- no locking
or transaction support.  Since many "logical" NSDB operations
involve several separate LDAP updates, this could make ensuring
the correctness of the NSDB operations difficult or impossible.
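
A toy illustration of the problem, in Python.  Nothing here is
real NSDB or LDAP code: the dictionary stands in for the
directory, and the record names, admins, and servers are made up.
It only shows how two multi-step "logical" operations can
interleave when there is no lock or transaction around them.

```python
def retarget_steps(nsdb, admin, server):
    """One logical operation made of two independent writes.

    Written as a generator so the caller can interleave the steps
    of two admins by hand.  The record names are hypothetical.
    """
    nsdb["home/location"] = server    # step 1
    yield
    nsdb["home/updated_by"] = admin   # step 2
    yield


def run_interleaved(nsdb):
    """Run two logical operations with their steps interleaved."""
    a = retarget_steps(nsdb, "alice", "server-a")
    b = retarget_steps(nsdb, "bob", "server-b")
    next(a)  # alice writes the location
    next(b)  # bob overwrites the location
    next(b)  # bob writes updated_by
    next(a)  # alice overwrites updated_by
    return nsdb


result = run_interleaved({})
# The location is bob's but the record claims alice made the change:
# a state that neither logical operation would produce on its own.
```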

We discussed a strawman proposal to implement vector clocks
(one scalar per admin) but the feeling was that it would be
clunky at best and might not even work at all.
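
For readers who have not run into vector clocks, here is a
generic textbook-style sketch in Python with one scalar per
admin.  It is not the strawman as discussed on the call (the
details were never written down); the function names and admin
names are made up for illustration.

```python
def vc_increment(clock, admin):
    """Return a copy of `clock` with `admin`'s scalar advanced by one."""
    updated = dict(clock)
    updated[admin] = updated.get(admin, 0) + 1
    return updated


def vc_merge(a, b):
    """Element-wise maximum: the clock after seeing both histories."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def vc_compare(a, b):
    """Return 'equal', 'before', 'after', or 'concurrent'."""
    keys = a.keys() | b.keys()
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


# Two admins update independently, starting from the same empty clock.
by_alice = vc_increment({}, "alice")
by_bob = vc_increment({}, "bob")
merged = vc_merge(by_alice, by_bob)
```

The clocks can detect that the two updates were concurrent, but
they say nothing about how to reconcile them, and the clock grows
with the number of admins.  That is roughly the "clunky" part.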

Mario took on the task of figuring out how to coax LDAP into
doing what we need.  James pointed out RFC 4525, which adds
a modify-increment operation to LDAP, but it is not known whether
this extension is widely implemented, or whether it actually has the
necessary semantics to implement something like a semaphore.
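
As a thought experiment only, here is how an atomic increment
could be turned into a lock (a "ticket lock"), sketched in
Python.  This is NOT LDAP client code.  FakeEntry is an in-memory
stand-in, the attribute names are invented, and the key
assumption is that an increment is atomic and hands back the new
value.  Whether RFC 4525 as deployed gives us that is exactly the
open question above.

```python
import threading
import time


class FakeEntry:
    """In-memory stand-in for one directory entry (hypothetical)."""

    def __init__(self):
        self._attrs = {"nextTicket": 0, "nowServing": 1}
        self._guard = threading.Lock()  # models server-side atomicity

    def increment(self, attr, amount=1):
        # ASSUMPTION: the increment is atomic and returns the new value.
        with self._guard:
            self._attrs[attr] += amount
            return self._attrs[attr]

    def read(self, attr):
        with self._guard:
            return self._attrs[attr]


def with_ticket_lock(entry, critical_section):
    """Run `critical_section` while holding a ticket lock on `entry`."""
    ticket = entry.increment("nextTicket")      # take a ticket
    while entry.read("nowServing") != ticket:   # poll until our turn
        time.sleep(0.001)
    try:
        return critical_section()
    finally:
        entry.increment("nowServing")           # admit the next ticket
```

Even if the semantics work out, a client that dies while holding
its ticket blocks everyone behind it, so some kind of timeout or
lease would still be needed.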

-Dan