S3 API

    Alluxio supports a RESTful API that is compatible with the basic operations of the Amazon S3 API.

    The Alluxio S3 API should be used by applications designed to communicate with S3-like storage that would also benefit from the other features provided by Alluxio, such as data caching, data sharing with file-system-based applications, and storage system abstraction (e.g., using Ceph instead of S3 as the backing store). For example, a simple application that downloads reports generated by analytics tasks can use the S3 API instead of the more complex file system API.

    There are performance implications of using the S3 API. Because requests are served by the Alluxio proxy, each one makes an extra network hop. For optimal performance, it is recommended to run a proxy server and an Alluxio worker on each compute node, and to put all the proxy servers behind a load balancer.

    The following table describes the support status for current Amazon S3 functional features:

    The Alluxio S3 API can be used from clients in various programming languages, such as C++, Java, Python, Golang, and Ruby. This documentation uses curl REST calls and the Python S3 client as usage examples.

    Create a bucket
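    The create-bucket request itself is missing from this transcript. Based on the endpoint pattern used throughout this page (requests against the bucket path on the local proxy), it should look like the following sketch; the bucket name `testbucket` matches the later examples, and a running proxy on `localhost:39999` is assumed:

    ```shell
    # Sketch: create a bucket with a PUT on the bucket path
    # (assumes an Alluxio proxy listening on localhost:39999).
    curl -i -X PUT http://localhost:39999/api/v1/s3/testbucket
    ```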

    Get the bucket (listing objects)

    $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
    HTTP/1.1 200 OK
    Date: Tue, 18 Jun 2019 21:23:56 GMT
    Content-Type: application/xml
    Content-Length: 191
    Server: Jetty(9.2.z-SNAPSHOT)
    <ListBucketResult><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken/><KeyCount>0</KeyCount><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated></ListBucketResult>

    Put an object

    Assuming there is an existing file on the local file system called LICENSE:

    $ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/testobject
    HTTP/1.1 100 Continue
    HTTP/1.1 200 OK
    Date: Tue, 18 Jun 2019 21:24:32 GMT
    ETag: "911df44b7ff57801ca8d74568e4ebfbe"
    Content-Length: 0
    Server: Jetty(9.2.z-SNAPSHOT)

    Get the object:

    $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket/testobject
    HTTP/1.1 200 OK
    Date: Tue, 18 Jun 2019 21:24:57 GMT
    Last-Modified: Tue, 18 Jun 2019 21:24:33 GMT
    Content-Type: application/xml
    Content-Length: 27040
    Server: Jetty(9.2.z-SNAPSHOT)
    .................. Content of the test file ...................

    Listing a bucket with one object

    $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket
    HTTP/1.1 200 OK
    Date: Tue, 18 Jun 2019 21:25:27 GMT
    Content-Type: application/xml
    Content-Length: 354
    Server: Jetty(9.2.z-SNAPSHOT)
    <ListBucketResult><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken/><KeyCount>1</KeyCount><MaxKeys>1000</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>testobject</Key><LastModified>2019-06-18T14:24:33.029Z</LastModified><ETag></ETag><Size>27040</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>

    Listing a bucket with multiple objects

    You can upload more files and use the max-keys and continuation-token query parameters on the GET bucket request. For example:

    $ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key1
    HTTP/1.1 100 Continue
    HTTP/1.1 200 OK
    Date: Tue, 18 Jun 2019 21:26:05 GMT
    ETag: "911df44b7ff57801ca8d74568e4ebfbe"
    Server: Jetty(9.2.z-SNAPSHOT)
    $ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key2
    HTTP/1.1 100 Continue
    HTTP/1.1 200 OK
    Date: Tue, 18 Jun 2019 21:26:28 GMT
    ETag: "911df44b7ff57801ca8d74568e4ebfbe"
    Content-Length: 0
    Server: Jetty(9.2.z-SNAPSHOT)
    $ curl -i -X PUT -T "LICENSE" http://localhost:39999/api/v1/s3/testbucket/key3
    HTTP/1.1 100 Continue
    HTTP/1.1 200 OK
    Date: Tue, 18 Jun 2019 21:26:43 GMT
    ETag: "911df44b7ff57801ca8d74568e4ebfbe"
    Server: Jetty(9.2.z-SNAPSHOT)
    $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2
    HTTP/1.1 200 OK
    Date: Tue, 18 Jun 2019 21:26:57 GMT
    Content-Type: application/xml
    Content-Length: 528
    Server: Jetty(9.2.z-SNAPSHOT)
    <ListBucketResult><Name>/testbucket</Name><Prefix/><ContinuationToken/><NextContinuationToken>key3</NextContinuationToken><KeyCount>2</KeyCount><MaxKeys>2</MaxKeys><IsTruncated>true</IsTruncated><Contents><Key>key1</Key><LastModified>2019-06-18T14:26:05.694Z</LastModified><ETag></ETag><Size>27040</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>key2</Key><LastModified>2019-06-18T14:26:28.153Z</LastModified><ETag></ETag><Size>27040</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>
    $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket\?max-keys\=2\&continuation-token\=key3
    HTTP/1.1 200 OK
    Date: Tue, 18 Jun 2019 21:28:14 GMT
    Content-Type: application/xml
    Content-Length: 531
    Server: Jetty(9.2.z-SNAPSHOT)
    <ListBucketResult><Name>/testbucket</Name><Prefix/><ContinuationToken>key3</ContinuationToken><NextContinuationToken/><KeyCount>2</KeyCount><MaxKeys>2</MaxKeys><IsTruncated>false</IsTruncated><Contents><Key>key3</Key><LastModified>2019-06-18T14:26:43.081Z</LastModified><ETag></ETag><Size>27040</Size><StorageClass>STANDARD</StorageClass></Contents><Contents><Key>testobject</Key><LastModified>2019-06-18T14:24:33.029Z</LastModified><ETag></ETag><Size>27040</Size><StorageClass>STANDARD</StorageClass></Contents></ListBucketResult>

    You can also verify that those objects are represented as Alluxio files under the /testbucket directory.

    $ ./bin/alluxio fs ls -R /testbucket
    -rw-r--r-- alluxio staff 27040 PERSISTED 06-18-2019 14:26:05:694 100% /testbucket/key1
    -rw-r--r-- alluxio staff 27040 PERSISTED 06-18-2019 14:26:28:153 100% /testbucket/key2
    -rw-r--r-- alluxio staff 27040 PERSISTED 06-18-2019 14:26:43:081 100% /testbucket/key3
    -rw-r--r-- alluxio staff 27040 PERSISTED 06-18-2019 14:24:33:029 100% /testbucket/testobject
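    The pagination contract visible in the listings above (max-keys caps the number of keys per response; NextContinuationToken names the key the next request should start from) can be modeled in plain Python. This is an illustrative sketch of the protocol's behavior, not Alluxio code; `list_page` is a hypothetical helper:

    ```python
    def list_page(keys, max_keys=1000, continuation_token=None):
        """Model the GET-bucket pagination shown in the XML responses above."""
        keys = sorted(keys)
        start = keys.index(continuation_token) if continuation_token else 0
        page = keys[start:start + max_keys]
        truncated = start + max_keys < len(keys)
        next_token = keys[start + max_keys] if truncated else None
        return page, truncated, next_token

    keys = ['key1', 'key2', 'key3', 'testobject']
    # First page: two keys, truncated, next page starts at 'key3'.
    print(list_page(keys, max_keys=2))  # (['key1', 'key2'], True, 'key3')
    # Second page: the remaining keys, not truncated.
    print(list_page(keys, max_keys=2, continuation_token='key3'))
    ```

    The two calls reproduce the two XML responses above: the first is truncated with NextContinuationToken key3, the second returns key3 and testobject with IsTruncated false.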

    Delete objects

    $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key1
    HTTP/1.1 204 No Content
    Date: Tue, 18 Jun 2019 21:31:27 GMT
    Server: Jetty(9.2.z-SNAPSHOT)
    $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key2
    HTTP/1.1 204 No Content
    Date: Tue, 18 Jun 2019 21:31:44 GMT
    Server: Jetty(9.2.z-SNAPSHOT)
    $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/key3
    HTTP/1.1 204 No Content
    Date: Tue, 18 Jun 2019 21:31:58 GMT
    Server: Jetty(9.2.z-SNAPSHOT)
    $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/testobject
    HTTP/1.1 204 No Content
    Date: Tue, 18 Jun 2019 21:32:08 GMT
    Server: Jetty(9.2.z-SNAPSHOT)

    Initiate a multipart upload

    Since we deleted the testobject in the previous command, you have to create another testobject before initiating a multipart upload.
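    The initiate request itself is missing from this transcript. In the S3 protocol, a multipart upload is initiated with a POST to the object path carrying the uploads query parameter; the response body is an InitiateMultipartUploadResult containing the UploadId. A sketch, assuming the same local proxy as the other examples:

    ```shell
    # Sketch: initiate a multipart upload (requires a running proxy);
    # the response carries the UploadId used by the requests below.
    curl -i -X POST http://localhost:39999/api/v1/s3/testbucket/testobject?uploads
    ```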

    Note that the commands below related to multipart upload need the upload ID returned when the upload was initiated; it is not necessarily 3.

    Upload part

    $ curl -i -X PUT 'http://localhost:39999/api/v1/s3/testbucket/testobject?partNumber=1&uploadId=3'
    HTTP/1.1 200 OK
    Date: Tue, 18 Jun 2019 21:33:36 GMT
    ETag: "d41d8cd98f00b204e9800998ecf8427e"
    Server: Jetty(9.2.z-SNAPSHOT)

    List parts

    $ curl -i -X GET http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=3
    HTTP/1.1 200 OK
    Date: Tue, 18 Jun 2019 21:35:10 GMT
    Content-Type: application/xml
    Content-Length: 296
    Server: Jetty(9.2.z-SNAPSHOT)
    <ListPartsResult><Bucket>/testbucket</Bucket><Key>testobject</Key><UploadId>3</UploadId><StorageClass>STANDARD</StorageClass><IsTruncated>false</IsTruncated><Part><PartNumber>1</PartNumber><LastModified>2019-06-18T14:33:36.373Z</LastModified><ETag>""</ETag><Size>0</Size></Part></ListPartsResult>

    Complete a multipart upload

    $ curl -i -X POST http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=3
    HTTP/1.1 200 OK
    Date: Tue, 18 Jun 2019 21:35:47 GMT
    Content-Type: application/xml
    Content-Length: 201
    Server: Jetty(9.2.z-SNAPSHOT)
    <CompleteMultipartUploadResult><Location>/testbucket/testobject</Location><Bucket>testbucket</Bucket><Key>testobject</Key><ETag>"d41d8cd98f00b204e9800998ecf8427e"</ETag></CompleteMultipartUploadResult>

    Abort a multipart upload

    $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket/testobject?uploadId=3
    HTTP/1.1 204 No Content
    Date: Tue, 18 Jun 2019 21:37:27 GMT
    Server: Jetty(9.2.z-SNAPSHOT)

    Delete an empty bucket

    $ curl -i -X DELETE http://localhost:39999/api/v1/s3/testbucket
    HTTP/1.1 204 No Content
    Date: Tue, 18 Jun 2019 21:38:38 GMT
    Server: Jetty(9.2.z-SNAPSHOT)

    Python S3 Client

    Tested for Python 2.7.

    Create a connection:

    Please note that you have to install the boto package first:

    $ pip install boto

    import boto
    import boto.s3.connection
    conn = boto.connect_s3(
        aws_access_key_id = '',
        aws_secret_access_key = '',
        host = 'localhost',
        port = 39999,
        path = '/api/v1/s3',
        is_secure=False,
        calling_format = boto.s3.connection.OrdinaryCallingFormat(),
    )

    Create a bucket

    bucketName = 'bucket-for-testing'
    bucket = conn.create_bucket(bucketName)

    PUT a small object
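    The upload step is missing from this transcript; the assertion in the next section relies on the names smallObjectKey and smallObjectContent. A sketch consistent with boto's API follows; the key name and content string are illustrative, and the `bucket` object from the create-bucket step above is assumed:

    ```python
    smallObjectKey = 'small.txt'
    smallObjectContent = 'Hello World!'
    # Upload the string as an object (requires the bucket created above).
    key = bucket.new_key(smallObjectKey)
    key.set_contents_from_string(smallObjectContent)
    ```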

    Get the small object

    assert smallObjectContent == key.get_contents_as_string()

    Upload a large object

    Create an 8MB file on the local file system:

    $ dd if=/dev/zero of=8mb.data bs=1048576 count=8

    Then use the Python S3 client to upload this file as an object:

    largeObjectKey = 'large.txt'
    largeObjectFile = '8mb.data'
    key = bucket.new_key(largeObjectKey)
    with open(largeObjectFile, 'rb') as f:
        key.set_contents_from_file(f)
    with open(largeObjectFile, 'rb') as f:
        largeObject = f.read()

    Get the large object

    assert largeObject == key.get_contents_as_string()

    Delete the objects

    bucket.delete_key(smallObjectKey)
    bucket.delete_key(largeObjectKey)

    Initiate a multipart upload

    mp = bucket.initiate_multipart_upload(largeObjectKey)

    Upload parts

    import math, os
    from filechunkio import FileChunkIO
    # Use a chunk size of 1MB (feel free to change this)
    sourceSize = os.stat(largeObjectFile).st_size
    chunkSize = 1048576
    chunkCount = int(math.ceil(sourceSize / float(chunkSize)))
    for i in range(chunkCount):
        offset = chunkSize * i
        bytes = min(chunkSize, sourceSize - offset)
        with FileChunkIO(largeObjectFile, 'r', offset=offset, bytes=bytes) as fp:
            mp.upload_part_from_file(fp, part_num=i + 1)
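    The chunking arithmetic in the loop above can be checked without a server. The following self-contained sketch mirrors the offset and size computation; `chunk_ranges` is a hypothetical helper, not part of boto:

    ```python
    import math

    def chunk_ranges(source_size, chunk_size=1048576):
        """Return (offset, length) pairs covering source_size bytes,
        mirroring the upload loop's arithmetic."""
        count = int(math.ceil(source_size / float(chunk_size)))
        return [(chunk_size * i, min(chunk_size, source_size - chunk_size * i))
                for i in range(count)]

    # The 8MB file from the dd command splits into exactly 8 full 1MB parts.
    print(len(chunk_ranges(8 * 1048576)))  # 8
    ```

    Note that the final part is smaller than chunk_size whenever the file size is not an exact multiple, which is why the loop takes the min of chunkSize and the remaining bytes.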

    Complete the multipart upload

    mp.complete_upload()

    Abort the multipart upload

    Non-completed uploads can be aborted.
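    In boto this is done on the multipart-upload handle; a sketch using the mp object returned by initiate_multipart_upload above:

    ```python
    # Abort the upload; parts uploaded so far are discarded.
    mp.cancel_upload()
    ```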

    Delete the bucket

    conn.delete_bucket(bucketName)