[Federated-fs] Perspectives from data grids (or) Isn't it time the FS clients evolved for the future?

Arun Jagatheesan arun at sdsc.edu
Thu Mar 29 18:19:36 PDT 2007


Just to introduce my background to the FS community: I have been promoting
data grid concepts for the last few years. Data grids allow a collaborative
namespace of data storage resources (mostly files and file servers) to be
shared amongst autonomous administrative domains. (Well, there is a lot of
hype, and depending on the marketing person or vendor you talk to, you will
hear different definitions.) In the academic world, data grid concepts
continue to solve a lot of problems and manage petabytes of data, so they
are real and not all hype anymore.

I agreed/volunteered to post some requirements that are being solved using
data grids. These might be useful for standardization communities working on
FS (file systems/servers), such as Federated-FS. We are trying to
standardize these data grid concepts through OGF (Open Grid Forum). However,
data grid standards sit at a higher layer of the protocol stack (XML, SOAP,
etc.). It might be useful to consider these requirements (or concepts) at a
lower level (byte level) too.

The following are my opinions, especially the non-technical ones.
"End-user" means any person or application using the file system.

Major non-technical requirements (change in design perspective):
NT-1) File systems are for end-users, not administrators
NT-2) File systems of the 80's need not be the ones for the 20's
- The above are philosophical, so I elaborate on them again below.

Technical requirements (change or design new functionalities):
T1) Each administrative domain is an autonomous entity. No cross-registration
of user-ids and no global administrative policy is possible in production
(even if the admins are very friendly and are from the academic world)
T2) A logical namespace of data is a MUST for a collaborative or federated
filesystem. In a logical namespace the "human readable names and order" of
the files might be different from the physical locations or FSN on the file
servers.
T3) End users SHOULD be able to replicate data 
T4) End users SHOULD be able to add metadata about data (files)
T5) End users SHOULD be able to discover or query data (files) just by
knowing attributes of the data (when multiple organizations work together,
a hierarchical human-readable namespace alone does not solve the data
organization or discovery problem).
T6) Mount points of several file systems SHOULD be avoided
T7) Users SHOULD be able to see the distribution of physical resources - The
concept of logical resources MUST be used along with a logical namespace
(this enables T3)
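
To make T2-T5 and T7 a little more concrete, here is a minimal in-memory
sketch of a logical namespace in Python. Everything in it (the class and
method names, the example domains and paths) is a hypothetical illustration,
not an existing data grid or Federated-FS API. It only records where copies
live; it does not move any bytes.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One physical copy of a file (hypothetical fields)."""
    domain: str         # autonomous administrative domain holding the copy
    physical_path: str  # location on that domain's file server


@dataclass
class LogicalFile:
    """A logical-namespace entry, independent of physical placement."""
    replicas: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


class LogicalNamespace:
    def __init__(self):
        self._files = {}  # logical path -> LogicalFile

    def register(self, logical_path, domain, physical_path):
        """Bind a human-readable logical name to a physical copy (T2)."""
        entry = self._files.setdefault(logical_path, LogicalFile())
        entry.replicas.append(Replica(domain, physical_path))

    def replicate(self, logical_path, domain, physical_path):
        """Record an additional copy under the same logical name (T3)."""
        self._files[logical_path].replicas.append(Replica(domain, physical_path))

    def annotate(self, logical_path, **attrs):
        """Attach end-user metadata to a file (T4)."""
        self._files[logical_path].metadata.update(attrs)

    def query(self, **attrs):
        """Find logical paths whose metadata matches every attribute (T5)."""
        return sorted(
            path for path, f in self._files.items()
            if all(f.metadata.get(k) == v for k, v in attrs.items())
        )

    def locations(self, logical_path):
        """Show where the physical copies live (T7)."""
        return [(r.domain, r.physical_path)
                for r in self._files[logical_path].replicas]


ns = LogicalNamespace()
ns.register("/climate/run42/output.nc", "site-a.example", "/vol7/a91f.dat")
ns.replicate("/climate/run42/output.nc", "site-b.example", "/store/0042.bin")
ns.annotate("/climate/run42/output.nc", project="climate", year=2007)
print(ns.query(project="climate"))
print(ns.locations("/climate/run42/output.nc"))
```

The logical path stays the same no matter how many domains hold a copy or
what the files are called on each server, which is the point of T2.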


I have tried to make sure the above are just functional or user-oriented
requirements and not implementation requirements. Now to elaborate on the
two non-technical or philosophical requirements (this is just my opinion,
and I know I am stating the obvious)....

NT-1) File systems are for end-users, not administrators
Most file-system protocols, and the improvements made to them, seem to have
the data storage administrator as the target user (e.g., replication). The
end-users do not see any direct advantage. If the end-user does not see any
advantage, the administrator will not be asked (or pushed) to upgrade and
will do so only voluntarily. New products or standards, no matter how
technically useful they are, will then not be required or appreciated. If
standards or products want to make a difference, they must be designed for
end-users and end-user applications (not for administrators).

NT-2) File systems of the 80's need not be the ones for the 20's
End-users have to be given more functionality that they can use themselves.
When multiple organizations or teams work together, they know that data is
not a single disk or sector. Everyone knows about the internet and
distributed computing except the file system. New file-system client
protocols must allow data distribution. RPC-style remote execution of
user-defined programs on file systems makes remote data more usable (the FS
will have to provide a suitable standard interface for adding web services
at runtime; without this, distributed data is of little use). The file
system in these cases becomes more than just a file system. End-users (not
admins alone) can define data-management policies or rules to manage their
data.
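
As one illustration of an end-user-defined data-management rule, the sketch
below plans replication so that files matching some metadata keep a minimum
number of copies. The rule format, the names (FileState, MinReplicasRule,
plan_actions), and the domain names are all assumptions made up for this
example; it only produces a list of planned actions and does not copy
anything.

```python
from dataclasses import dataclass


@dataclass
class FileState:
    """What the planner knows about one file (hypothetical structure)."""
    logical_path: str
    metadata: dict
    replica_domains: list  # domains that already hold a copy


@dataclass
class MinReplicasRule:
    """End-user rule: files matching `match` need at least `min_copies`."""
    match: dict
    min_copies: int

    def applies_to(self, f):
        return all(f.metadata.get(k) == v for k, v in self.match.items())


def plan_actions(files, rules, available_domains):
    """Return ("replicate", logical_path, target_domain) tuples."""
    actions = []
    for f in files:
        for rule in rules:
            if not rule.applies_to(f):
                continue
            missing = max(rule.min_copies - len(f.replica_domains), 0)
            # Only consider domains that do not hold a copy yet.
            candidates = [d for d in available_domains
                          if d not in f.replica_domains]
            for target in candidates[:missing]:
                actions.append(("replicate", f.logical_path, target))
    return actions


files = [
    FileState("/climate/run42/output.nc", {"project": "climate"}, ["site-a"]),
    FileState("/scratch/tmp.dat", {"project": "scratch"}, ["site-a"]),
]
rules = [MinReplicasRule(match={"project": "climate"}, min_copies=3)]
for action in plan_actions(files, rules, ["site-a", "site-b", "site-c"]):
    print(action)
```

The rule is written by the owner of the data rather than by the storage
administrator, which is the shift in perspective argued for above. A real
system would also have to handle overlapping rules and each domain's own
policies (T1).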

In short, all of this can be accomplished if standardization folks focus on
the end-user rather than the admin. A newer client-server protocol WILL be
required (one that might or might not interoperate with existing
client-server FS protocols). All of this can be done easily as a customized
or single-vendor solution, but users (even the academic community) will
prefer a standardized solution for their long-term requirements. Hence, I am
sharing my opinion with this community.

(OGF): https://forge.gridforum.org/sf/go/doc8271?nav=1 shows a high-level
perspective from the grid world.

Cheers,
Arun
~~~~~~~~~
Luck is what happens when preparation meets opportunity.

Arun swaran Jagatheesan
http://www.sdsc.edu/~arun/
San Diego Supercomputer Center.
(858)822.5452  

> -----Original Message-----
> From: federated-fs-bounces at sdsc.edu 
> [mailto:federated-fs-bounces at sdsc.edu] On Behalf Of Manoj Naik
> Sent: Wednesday, March 28, 2007 5:40 PM
> To: federated-fs at sdsc.edu
> Subject: [Federated-fs] Meeting Minutes: Conference Call on 3/28
> 
> Please respond if there are errors or if I missed anything.
> 
> Manoj Naik
> IBM Almaden Research Center.
> 
> Minutes of Conference Call to discuss Federated Filesystem 
> Requirements
> 
> Attendees:
> Dan Ellard, Craig Everhart (NetApp)
> Renu Tewari, Manoj Naik (IBM)
> Andy Adamson, Peter Honeyman (CITI)
> Rob Thurlow, Spencer Shepler (Sun)
> Arun Jagatheesan (SDSU)
Arun Jagatheesan (SDSC)
> 
<<<snip>>>
> 
> Arun mentioned that the grid folks have some requirements for 
> federation but most of them clash with basic assumption 
> A1 (not changing client protocols). Nevertheless, he'll post 
> them to the list.
> 


