Sunday, 19 February 2012

shadow-file-system

shadow-file-system (SFS) is a POSIX-compliant distributed file system.

THE GOAL OF THIS PROJECT

I hope SFS will become a widely used POSIX-compliant filesystem on the server side, providing high availability and strong data consistency. System administrators will no longer have to worry about machine failures and data loss. For developers, building a massive internet service will no longer be a complicated and error-prone task: the only skill needed to construct such a service is knowing how to read/write a file using the well-known standard API of any programming language (for example, fopen/fread/fwrite in C).
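
For example, once the SFS is mounted (here at /sfs-mount, as in the examples below), a client program needs nothing beyond the standard C file API:

#include <stdio.h>

int main(void)
{
    /* Write to a file that lives on the SFS mount just like a local file. */
    FILE *fp = fopen("/sfs-mount/a.txt", "w");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }
    fputs("hello SFS\n", fp);
    fclose(fp);

    /* Read it back with the same standard API. */
    char buf[64];
    fp = fopen("/sfs-mount/a.txt", "r");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }
    size_t n = fread(buf, 1, sizeof(buf) - 1, fp);
    buf[n] = '\0';
    printf("%s", buf);
    fclose(fp);
    return 0;
}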

THE ADVANTAGES OF USING SFS

Files are stored on more than one machine to provide high availability, so if one machine fails, the system continues to function normally
Users of SFS don't have to learn a new API to take advantage of it; the only requirement for the client is knowing how to read/write files using the standard Unix API.
New servers can be added to the system online without interrupting normal service
SFS will allocate the fastest data server node to a client in order to provide fast data read/write speed (planned)
SFS server start-up is blazingly fast; unlike other GFS-like systems, SFS starts in a few seconds
In the event of a network partition, data consistency is not harmed; SFS provides strong data consistency to clients through client- and server-side checksumming

An SFS is composed of four types of servers and a number of clients:

SFS architecture: http://www.flickr.com/photos/46972203@N02/4308412256/

Config Server

Responsible for storing all the configuration parameters of the whole SFS system, including each server's IP/PORT/ROLE.
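
As a rough illustration only (the field names and values below are hypothetical, not SFS's actual mysql schema), each config-server entry can be pictured as a small record like this:

#include <stdio.h>

/* Hypothetical sketch of a per-server record kept by the config server;
 * the real SFS schema (stored in mysql, see the install steps) may differ. */
enum sfs_role { ROLE_CONFIG, ROLE_NAME, ROLE_METADATA, ROLE_DATA };

struct sfs_server_entry {
    int            server_id;  /* e.g. 1 for metadata-server1 */
    enum sfs_role  role;       /* which of the four server types */
    char           ip[16];     /* IPv4 address the server listens on */
    unsigned short port;       /* TCP port */
};

int main(void)
{
    /* Example values only. */
    struct sfs_server_entry mds1 = { 1, ROLE_METADATA, "192.168.0.10", 7001 };
    printf("server %d: role %d at %s:%u\n",
           mds1.server_id, (int)mds1.role, mds1.ip, mds1.port);
    return 0;
}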

Name Server

Responsible for resolving names in the file system and locating the metadata server for a file. There is only one Name Server in the whole SFS.

Metadata Server

Responsible for storing file metadata and controlling file replication. The metadata of a file includes the locations of its data servers, the checksum of the file data on each data server, and the file size. There can be multiple Metadata Servers in the whole SFS.

Data Server

Responsible for storing the actual file data. There can be multiple Data Servers in the whole SFS.

Clients

Clients mount the SFS through FUSE and read and write files using the standard Unix FILE API.
The procedure for a client reading the file /sfs-mount/a.txt (here /sfs-mount is the mount point of the SFS on the client):
a) the client queries the config server to locate the name server's IP and port.
b) the client asks the name server to resolve a.txt; the name server responds: metadata-svr-1 is responsible for a.txt.
c) the client queries the config server to locate metadata server 1's IP and port.
d) the client asks metadata server 1 which data server has the right version of a.txt; metadata server 1 responds: data server 1 has the right copy of the file.
e) the client queries the config server to locate data server 1's IP and port.
f) the client reads a.txt's content from data server 1.
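
Summarised as a pseudo-C sketch, with hypothetical helper names standing in for the client's JSON requests (locate, resolve_name, find_replica are not the real SFS API):

#include <stdio.h>

/* Hypothetical stand-ins for the client's JSON requests; the real SFS
 * client talks to the servers over the network instead. */
struct addr { const char *ip; int port; };

static struct addr locate(const char *server)                        /* ask config server */
{ (void)server; struct addr a = { "127.0.0.1", 9000 }; return a; }

static const char *resolve_name(struct addr ns, const char *path)    /* ask name server */
{ (void)ns; (void)path; return "metadata-svr-1"; }

static const char *find_replica(struct addr mds, const char *path)   /* ask metadata server */
{ (void)mds; (void)path; return "data-svr-1"; }

int main(void)
{
    /* a) locate the name server via the config server */
    struct addr ns = locate("name-server");
    /* b) ask the name server which metadata server owns a.txt */
    const char *mds_name = resolve_name(ns, "/a.txt");
    /* c) locate that metadata server via the config server */
    struct addr mds = locate(mds_name);
    /* d) ask the metadata server which data server holds a correct replica */
    const char *ds_name = find_replica(mds, "/a.txt");
    /* e) locate that data server via the config server */
    struct addr ds = locate(ds_name);
    /* f) finally read the file content from that data server */
    printf("would read /a.txt from %s at %s:%d\n", ds_name, ds.ip, ds.port);
    return 0;
}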

Important SFS internal operation procedures:

The procedure for a client creating a file /sfs-mount/a.txt (here /sfs-mount is the mount point of the SFS on the client):
a) the client queries the config server to locate the name server's IP and port.
b) the client asks the name server to allocate a metadata server for the new file; the name server responds: metadata-server1 is available.
c) the client queries the config server to locate metadata server 1's IP and port.
d) the client asks metadata-server1 to allocate a data server for the new file; metadata-server1 responds: data-server1 is available.
e) the client queries the config server to locate data server 1's IP and port.
f) the client creates the new file on data-server1.
g) the client creates the related record on metadata-server1.
h) the client creates the related record on the name server.
The procedure for a client writing a file /sfs-mount/a.txt (here /sfs-mount is the mount point of the SFS on the client):
a) the client queries the config server to locate the name server's IP and port.
b) the client asks the name server to resolve a.txt; the name server responds: metadata-svr-1 is responsible for a.txt.
c) the client queries the config server to locate metadata server 1's IP and port.
d) the client asks metadata server 1 which data server has the right version of a.txt; metadata server 1 responds: data server 1 has the right copy of the file.
e) the client queries the config server to locate data server 1's IP and port.
f) the client writes a.txt's content to data server 1.
g) data-server1 notifies metadata server 1 of the new content's CRC.
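
The CRC mentioned in step g) is an ordinary CRC-32 over the file content; a minimal sketch of how a client could compute it using zlib's crc32() (SFS itself may use its own checksum routine):

#include <stdio.h>
#include <string.h>
#include <zlib.h>   /* link with -lz */

int main(void)
{
    /* The data the client is about to write to /sfs-mount/a.txt. */
    const char *data = "new file content";

    /* crc32() starts from the initial value and is then fed the buffer. */
    unsigned long crc = crc32(0L, Z_NULL, 0);
    crc = crc32(crc, (const unsigned char *)data, (unsigned int)strlen(data));

    /* This value is what gets recorded as the client-computed CRC and later
     * compared against the CRCs reported by the data servers. */
    printf("client-computed CRC: %lu\n", crc);
    return 0;
}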

Server storage example

For the file /sfs-mount/a.txt:
name server:
path: ns_data_root/a.txt
contents: { "MDS_ID": 1, "DIR_ID": 123, "FILE_ID": 456 }
meaning: metadata server 1 is responsible for a.txt, and a.txt can be found at dir_id 123 and file_id 456 on the metadata server and data server.
metadata server 1:
path: mds_data_root/0/123/456
contents: { "CLIENT_CRC": [ 111111, 222222 ], "DATA_SVR_MD_RECORDS": [ { "DATA_SVR_ID": 1, "FILE_SIZE": 8888888, "CRC": [ 111111, 222222 ] }, { "DATA_SVR_ID": 2, "FILE_SIZE": 8888888, "CRC": [ 111111, 222222 ] } ] }
meaning: data server 1 and data server 2 both match the client-computed CRC, therefore both hold a correct replica.
data server 1 and data server 2:
path: ds_data_root/0/123/456
content: a.txt's actual content
You can see that the name server's and metadata server's data are stored in JSON format; in fact, the communication protocol between the servers is JSON as well.
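
Since json-c is one of SFS's dependencies (see the install steps below), records like the name server entry above can be parsed with it; a minimal sketch, with only the field names taken from the example record:

#include <stdio.h>
#include <json-c/json.h>   /* older json-c installs use <json/json.h> */

int main(void)
{
    /* The name-server record for a.txt from the example above. */
    const char *record = "{ \"MDS_ID\": 1, \"DIR_ID\": 123, \"FILE_ID\": 456 }";

    struct json_object *obj = json_tokener_parse(record);
    if (obj == NULL) {
        fprintf(stderr, "bad JSON record\n");
        return 1;
    }

    struct json_object *mds_id, *dir_id, *file_id;
    if (json_object_object_get_ex(obj, "MDS_ID", &mds_id) &&
        json_object_object_get_ex(obj, "DIR_ID", &dir_id) &&
        json_object_object_get_ex(obj, "FILE_ID", &file_id)) {
        /* In the example above these IDs point at .../0/123/456 on the
         * metadata and data servers. */
        printf("metadata server %d, dir_id %d, file_id %d\n",
               json_object_get_int(mds_id),
               json_object_get_int(dir_id),
               json_object_get_int(file_id));
    }

    json_object_put(obj);   /* release the parsed object */
    return 0;
}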

HOW TO INSTALL

1 check out the latest source
svn checkout http://shadow-file-system.googlecode.com/svn/trunk/ shadow-file-system-read-only
2 install the other libraries needed by SFS, including json-c, libevent, log4cxx, mysql, and fuse
3 build SFS from source
cd $SFS_SOURCE_HOME; ./configure --prefix=$SFS_SOURCE_HOME/build; make; make install
here $SFS_SOURCE_HOME is the directory where you checked out the SFS source
4 start the mysql daemon and change the password of the mysql root user to 123456 (or create another user / use another password as you wish)
5 edit the SFS mysql init script in $SFS_SOURCE_HOME/scripts/init_sfs.sql,
specifying each server's IP address/port and where to store the actual data
6 import the mysql init script to mysql
mysql -u root -p123456 < $SFS_SOURCE_HOME/scripts/init_sfs.sql
7 init the metadata server's storage directory
$SFS_SOURCE_HOME/init_mds_dir.sh some/where/to/store/the/meta/data
8 init all the data servers' storage directories (assuming 2 data servers)
$SFS_SOURCE_HOME/init_ds_dir.sh some/where/to/store/the/ds1/data
$SFS_SOURCE_HOME/init_ds_dir.sh some/where/to/store/the/ds2/data
9 add $SFS_SOURCE_HOME/build/lib to the $LD_LIBRARY_PATH environment variable
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SFS_SOURCE_HOME/build/lib
10 start config server
cd $SFS_SOURCE_HOME/scripts && start_cnf_svr.sh
specify the mysql user and password in start_cnf_svr.sh
11 start name server
cd $SFS_SOURCE_HOME/scripts && start_name_svr.sh
12 start the metadata server
cd $SFS_SOURCE_HOME/scripts && start_metadata_svr.sh
13 start data servers
cd $SFS_SOURCE_HOME/scripts && start_data_svr1.sh
cd $SFS_SOURCE_HOME/scripts && start_data_svr2.sh
14 start the SFS client and mount the SFS on a directory
cd $SFS_SOURCE_HOME/scripts && start_sfs_fuse.sh
the SFS is then mounted at $SFS_SOURCE_HOME/filesystem
the mount point is specified in start_sfs_fuse.sh
15 everything is done; try creating/reading/writing files in $SFS_SOURCE_HOME/filesystem
all logs are written to $SFS_SOURCE_HOME/log

FAQ:

How is data consistency maintained among multiple data machines?
The content of every file on the data servers is CRC-checksummed, and the CRC records are stored in the metadata servers. Furthermore, when a client writes new data to a file, the new data is CRC-checksummed on the client machine as well, and the client-computed CRC is also stored in the metadata server. Therefore, for a specific file, the metadata server can tell which data servers hold the right replica (the ones whose CRC matches the client-computed CRC record).
How does replication work?
When a file's metadata is queried, the metadata server checks how many correct replicas exist across all data servers; if that number is less than the configured replication factor, the metadata server orders one of the data servers to copy the file's content from a server holding a correct replica.
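
A tiny sketch of that check, with hypothetical types and a single CRC per replica for brevity (the real metadata server works on its JSON records and the mysql-configured settings):

#include <stdio.h>

/* Hypothetical in-memory view of one DATA_SVR_MD_RECORDS entry. */
struct replica {
    int           data_svr_id;
    unsigned long crc;        /* CRC reported by that data server */
};

/* Count replicas whose CRC matches the client-computed CRC and report
 * whether a repair copy is needed to reach the replication factor. */
static int needs_repair(const struct replica *r, int n,
                        unsigned long client_crc, int replication_factor)
{
    int correct = 0;
    for (int i = 0; i < n; i++)
        if (r[i].crc == client_crc)
            correct++;
    return correct < replication_factor;
}

int main(void)
{
    struct replica replicas[] = { { 1, 111111UL }, { 2, 999999UL } };
    unsigned long client_crc = 111111UL;

    if (needs_repair(replicas, 2, client_crc, 2))
        /* This is where the metadata server would order a data server to
         * copy the file from one holding a correct replica. */
        printf("only some replicas are correct: schedule a copy\n");
    return 0;
}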

from http://code.google.com/p/shadow-file-system/
