Monday, May 17, 2010

Amazon's S3

S3 stands for Simple Storage Service. It is a cloud storage service that acts as a big online network storage 'disk'. The concept behind S3 is similar to that of a distributed file system. Google's counterpart is GFS (Google File System), though the two differ in some respects.

The definition of S3

Amazon S3 is a web service that enables you to store data in the cloud. You can then download the data or use the data with other AWS services, such as Amazon Elastic Compute Cloud (EC2).


This definition shows that S3 serves not only as that big online network storage 'disk' but also as a data source for the other Amazon web services.

The components of the S3 concept
1. Bucket (think of it as a directory in a file system)
A bucket is a container for the objects stored in Amazon S3. Every object is contained in a bucket. For example, if an object named photo/bmw.jpg is stored in the davidliang bucket, it is addressable via the URL http://davidliang.s3.amazonaws.com/photo/bmw.jpg

Buckets cannot be nested; that is, a bucket cannot contain sub-folders (other buckets). Officially, a bucket defines a namespace and is the unit of access control. In practice, though, it still behaves like a folder.
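
As a minimal sketch of the addressing scheme in the davidliang example above, the following Python function builds the URL from a bucket name and an object name. The function name object_url is made up for illustration, and percent-encoding the object name is an assumption added here, not something the post specifies.

```python
from urllib.parse import quote


def object_url(bucket: str, key: str) -> str:
    """Build the bucket-in-hostname URL for an object (illustrative helper)."""
    # Percent-encode the object name but keep '/' so it still reads like a path.
    # The encoding step is an assumption added for safety with odd characters.
    return f"http://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"


if __name__ == "__main__":
    print(object_url("davidliang", "photo/bmw.jpg"))
    # prints http://davidliang.s3.amazonaws.com/photo/bmw.jpg
```

Note that 'photo/' here is only part of the object name; it is not a nested bucket.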

2. Object (think of it as a file in a file system)
An object has an object name and object properties. The maximum size of an object is 5 GB.

The 'Versioning' property of an object is the result of evolution: in the early days, objects had no such property. I think it is aimed at end users who need version control.

3. Keys (think of a key as the file name in a file system)
A key identifies an object within its bucket, and the object is addressed by a URL that combines the two. Just remember: Amazon's services are all accessed through Web Services or REST. Take the URL http://doc.s3.amazonaws.com/2006-03-01/AmazonS3.wsdl as an example: 'doc' is the bucket and '2006-03-01/AmazonS3.wsdl' is the key. With the introduction of versioning, the combination 'bucket + key + version ID' identifies a file uniquely.
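
To make the bucket/key split concrete, here is a small Python sketch that takes a URL of the form shown above and returns its bucket and key. The helper name split_s3_url is hypothetical, and the sketch assumes the bucket always appears as the first part of the host name, as in the example.

```python
from typing import Tuple
from urllib.parse import unquote, urlparse

# Host name suffix used in the example URL above.
S3_SUFFIX = ".s3.amazonaws.com"


def split_s3_url(url: str) -> Tuple[str, str]:
    """Split a bucket-in-hostname URL into (bucket, key). Illustrative only."""
    parts = urlparse(url)
    host = parts.hostname or ""
    if not host.endswith(S3_SUFFIX):
        raise ValueError(f"not a bucket-style S3 URL: {url}")
    bucket = host[: -len(S3_SUFFIX)]
    # The key is everything after the first '/', with percent-escapes decoded.
    key = unquote(parts.path[1:])
    return bucket, key


if __name__ == "__main__":
    print(split_s3_url("http://doc.s3.amazonaws.com/2006-03-01/AmazonS3.wsdl"))
    # prints ('doc', '2006-03-01/AmazonS3.wsdl')
```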

4. Versioning (think of it as one of the versions within CVS)
With versioning enabled, when we create a new file that shares its name with an existing one, it does not overwrite the existing file but creates a new file with a new version number.

Deletion works the same way: the object is not really removed; instead, a 'Delete Marker' is added. When an object marked with a 'Delete Marker' is requested via a 'GET' operation, a 404 error is returned. Of course, you can delete an object permanently by specifying its version ID.
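
The behaviour described above can be sketched with a toy in-memory model in Python. This is not the S3 API: the class VersionedBucket, its method names, and the 'v1', 'v2', ... version IDs are all invented for illustration, and the model simply answers 404 for anything it cannot serve.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class VersionedBucket:
    """Toy in-memory model of a bucket with versioning (not the real S3 API)."""

    def __init__(self) -> None:
        # key -> list of (version_id, data); data of None marks a delete marker.
        self._versions: Dict[str, List[Tuple[str, Optional[bytes]]]] = {}
        self._counter = itertools.count(1)

    def _append(self, key: str, data: Optional[bytes]) -> str:
        version_id = f"v{next(self._counter)}"  # made-up ID format
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def put(self, key: str, data: bytes) -> str:
        """Store a new version; older versions are kept, not overwritten."""
        return self._append(key, data)

    def delete(self, key: str, version_id: Optional[str] = None) -> str:
        """Without a version ID add a delete marker; with one, remove it for good."""
        if version_id is None:
            return self._append(key, None)
        kept = [v for v in self._versions.get(key, []) if v[0] != version_id]
        self._versions[key] = kept
        return version_id

    def get(self, key: str, version_id: Optional[str] = None) -> Tuple[int, Optional[bytes]]:
        """Return (status, data); 404 if missing or hidden by a delete marker."""
        versions = self._versions.get(key, [])
        if version_id is not None:
            for vid, data in versions:
                if vid == version_id and data is not None:
                    return 200, data
            return 404, None
        if not versions or versions[-1][1] is None:
            return 404, None
        return 200, versions[-1][1]


if __name__ == "__main__":
    bucket = VersionedBucket()
    first = bucket.put("photo/bmw.jpg", b"old")
    bucket.put("photo/bmw.jpg", b"new")
    print(bucket.get("photo/bmw.jpg"))         # latest version
    bucket.delete("photo/bmw.jpg")             # adds a delete marker
    print(bucket.get("photo/bmw.jpg"))         # now a 404
    print(bucket.get("photo/bmw.jpg", first))  # old version still reachable
```

In this sketch a plain delete only hides the object behind a marker, while a delete with a version ID removes that version permanently, which mirrors the two kinds of deletion described above.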

5. Regions (think of a region as the geographical location where the file is stored)
This concept has no counterpart in traditional file systems, and many people argue about it. Some say it is a good idea, while others say it should stay behind the scenes of application deployment. Currently the servers are located in the US, Ireland, Singapore, etc.


[Thanks to appleleaf for contributing to this post]

References:
http://aws.amazon.com/s3/#functionality
http://docs.amazonwebservices.com/AmazonS3/2006-03-01/
http://developer.amazonwebservices.com/connect/forum.jspa?forumID=24
