The behavior described here is implemented in the "weed upload" command. For third-party clients, this page serves as the spec.

To support large files, SeaweedFS uses two kinds of files:

  • Chunk File. To SeaweedFS, each chunk file is just a normal file.
  • Chunk Manifest. A simple JSON file that lists all the chunks.

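The manifest is a JSON object holding the file's name, MIME type, and total size, plus a list of ChunkInfo entries, each with a fid, an offset, and a size.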
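
A minimal sketch of that structure in Python, inferred from the manifest.json example later on this page rather than copied from the SeaweedFS source. The JSON keys (name, mime, size, chunks, fid, offset, size) come from that example; the class names and the to_json helper are illustrative assumptions.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class ChunkInfo:
    fid: str     # file id of the chunk, e.g. "3,1c863b4563"
    offset: int  # byte offset of the chunk within the whole file
    size: int    # chunk length in bytes


@dataclass
class ChunkManifest:
    name: str    # original file name
    mime: str    # MIME type of the whole file
    size: int    # total size of the whole file in bytes
    chunks: List[ChunkInfo] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested ChunkInfo dataclasses.
        return json.dumps(asdict(self), indent=2)


manifest = ChunkManifest(
    name="sy.jpg",
    mime="image/jpeg",
    size=146866,
    chunks=[
        ChunkInfo(fid="6,0100711ab7", offset=0, size=73433),
        ChunkInfo(fid="3,1c863b4563", offset=73433, size=73433),
    ],
)
print(manifest.to_json())
```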

When a Chunk Manifest file is read, SeaweedFS finds the chunk files listed in its ChunkInfo entries and sends their data.
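
To illustrate how a list of ChunkInfo entries can be turned into data to send, the sketch below maps a requested byte range onto the chunks that cover it. It is an illustration of the idea only, not SeaweedFS's server code; the function name chunks_for_range and the tuple layout it returns are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def chunks_for_range(chunks: List[Dict], start: int, length: int) -> List[Tuple[str, int, int]]:
    """Return (fid, offset_inside_chunk, byte_count) for each chunk overlapping the range."""
    end = start + length
    plan = []
    for chunk in sorted(chunks, key=lambda c: c["offset"]):
        chunk_start = chunk["offset"]
        chunk_end = chunk_start + chunk["size"]
        lo = max(start, chunk_start)  # first requested byte inside this chunk
        hi = min(end, chunk_end)      # one past the last requested byte inside it
        if lo < hi:
            plan.append((chunk["fid"], lo - chunk_start, hi - lo))
    return plan


chunks = [
    {"fid": "6,0100711ab7", "offset": 0, "size": 73433},
    {"fid": "3,1c863b4563", "offset": 73433, "size": 73433},
]
# A 1000-byte read that straddles the boundary between the two chunks.
print(chunks_for_range(chunks, 73000, 1000))
```

Reading the whole file is the special case start=0, length=total size, which yields every chunk in offset order.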

SeaweedFS delegates the chunking work to the client side. The steps are:

  • Split the large file into chunks.
  • Upload each chunk as a normal file and save its info in a ChunkInfo struct. Chunks can be spread across different volumes, which may allow faster parallel access.
  • Upload the manifest file with MIME type "application/json" and the URL parameter "cm=true". The FileId under which the manifest is stored is the entry point of the large file.
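
The first two steps can be sketched as follows. The actual HTTP upload is left to a caller-supplied function, here called upload_chunk, which is a hypothetical hook and not a SeaweedFS API; fake_upload and its made-up fids exist only so the example runs without a server. The final step, posting the resulting JSON with "cm=true", is not shown.

```python
import io
import json
from typing import BinaryIO, Callable, Dict, Iterator


def read_chunks(stream: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Yield successive pieces of at most chunk_size bytes, one at a time."""
    while True:
        piece = stream.read(chunk_size)
        if not piece:
            return
        yield piece


def build_manifest(stream: BinaryIO, name: str, mime: str, chunk_size: int,
                   upload_chunk: Callable[[bytes], str]) -> Dict:
    """Upload every chunk via upload_chunk and collect the ChunkInfo entries."""
    chunks, offset = [], 0
    for piece in read_chunks(stream, chunk_size):
        fid = upload_chunk(piece)  # hypothetical hook: returns the fid of the stored chunk
        chunks.append({"fid": fid, "offset": offset, "size": len(piece)})
        offset += len(piece)
    return {"name": name, "mime": mime, "size": offset, "chunks": chunks}


# Stand-in uploader that keeps chunks in a dict and invents fids.
stored: Dict[str, bytes] = {}


def fake_upload(data: bytes) -> str:
    fid = f"1,{len(stored):08x}"
    stored[fid] = data
    return fid


manifest = build_manifest(io.BytesIO(b"abcdefghij"), "demo.bin",
                          "application/octet-stream", 4, fake_upload)
print(json.dumps(manifest, indent=2))
```

Because chunks are read one at a time, memory use stays bounded by the chunk size; a real client could also upload chunks in parallel, since each is an independent file.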

Usually large files are only appended to. Updating a specific chunk of a file works almost the same way:

  • Upload the new file chunks as usual and save their info in ChunkInfo structs.
  • Upload the updated manifest file with MIME type "application/json" and the URL parameter "cm=true".
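
On the client, updating a chunk amounts to editing the manifest before re-uploading it. The sketch below swaps one entry for a newly uploaded chunk. It assumes chunks tile the file contiguously from offset 0, so later offsets and the total size are recomputed; the helper name replace_chunk and the replacement fid are hypothetical.

```python
import copy
from typing import Dict


def replace_chunk(manifest: Dict, offset: int, new_fid: str, new_size: int) -> Dict:
    """Return a copy of manifest whose chunk starting at offset points to new_fid."""
    updated = copy.deepcopy(manifest)
    chunks = sorted(updated["chunks"], key=lambda c: c["offset"])
    for chunk in chunks:
        if chunk["offset"] == offset:
            chunk["fid"], chunk["size"] = new_fid, new_size
            break
    else:
        raise ValueError(f"no chunk starts at offset {offset}")
    # Assumption: chunks are contiguous, so offsets follow from the sizes.
    position = 0
    for chunk in chunks:
        chunk["offset"] = position
        position += chunk["size"]
    updated["chunks"] = chunks
    updated["size"] = position
    return updated


manifest = {
    "name": "sy.jpg",
    "mime": "image/jpeg",
    "size": 146866,
    "chunks": [
        {"fid": "6,0100711ab7", "offset": 0, "size": 73433},
        {"fid": "3,1c863b4563", "offset": 73433, "size": 73433},
    ],
}
# "4,2a5b7c9d01" is a made-up fid standing in for a freshly uploaded chunk.
updated = replace_chunk(manifest, 73433, "4,2a5b7c9d01", 80000)
print(updated["size"], updated["chunks"][1])
```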

The following session uploads two chunks and a manifest, then downloads the assembled file:

    # curl -X POST -F "file=@xaa" http://localhost:9333/submit?pretty=yes
    {
      "eTag": "809b2add",
      "fid": "6,1b70e99bcd",
      "fileName": "xaa",
      "fileUrl": "10.34.254.62:8080/6,1b70e99bcd",
      "size": 73433
    }
    # curl -X POST -F "file=@xab" http://localhost:9333/submit?pretty=yes
    {
      "eTag": "9c6ca661",
      "fid": "3,1c863b4563",
      "fileName": "xab",
      "fileUrl": "10.34.254.62:8080/3,1c863b4563",
      "size": 73433
    }
    // get one fid for manifest file
    # curl "10.34.254.62:9333/dir/assign?pretty=yes"
    {
      "fid": "5,1ea9c7d93e",
      "url": "10.34.254.62:8080",
      "publicUrl": "10.34.254.62:8080",
    }
    # cat manifest.json
    {
      "name": "sy.jpg",
      "mime": "image/jpeg",
      "size": 146866,
      "chunks": [{
        "fid": "6,0100711ab7",
        "offset": 0,
        "size": 73433
      }, {
        "fid": "3,1c863b4563",
        "offset": 73433,
        "size": 73433
      }]
    }
    // upload the manifest file
    # curl -v -F "file=@manifest.json" "http://10.34.254.62:8080/5,1ea9c7d93e?cm=true&pretty=yes"
    * Trying 10.34.254.62...
    * Connected to 10.34.254.62 (10.34.254.62) port 8080 (#0)
    > POST /5,1ea9c7d93e?cm=true&pretty=yes HTTP/1.1
    > Host: 10.34.254.62:8080
    > User-Agent: curl/7.47.0
    > Accept: */*
    > Content-Length: 418
    > Expect: 100-continue
    > Content-Type: multipart/form-data; boundary=------------------------a872064c8f40903c
    >
    < HTTP/1.1 100 Continue
    < HTTP/1.1 201 Created
    < Content-Type: application/json
    < Etag: "2229f9b4"
    < Date: Wed, 15 Jan 2020 03:12:18 GMT
    < Content-Length: 66
    <
    {
      "name": "manifest.json",
      "size": 213,
      "eTag": "2229f9b4"
    }
    * Connection #0 to host 10.34.254.62 left intact
    // download the full file
    # curl -v "http://10.34.254.62:8080/5,1ff0fb46c9" -o out.data
    * Trying 10.34.254.62...
    % Total % Received % Xferd Average Speed Time Time Time Current
    Dload Upload Total Spent Left Speed
    0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Connected to 10.34.254.62 (10.34.254.62) port 8080 (#0)
    > GET /5,1ff0fb46c9 HTTP/1.1
    > Host: 10.34.254.62:8080
    > User-Agent: curl/7.47.0
    > Accept: */*
    >
    < HTTP/1.1 200 OK
    < Accept-Ranges: bytes
    < Content-Disposition: inline; filename="sy.jpg"
    < Content-Length: 146866
    < Content-Type: image/jpeg
    < Etag: "3e8ef528"
    < Last-Modified: Wed, 15 Jan 2020 03:32:47 GMT
    < X-File-Store: chunked
    < Date: Wed, 15 Jan 2020 03:33:04 GMT
    <
    { [7929 bytes data]
    100 143k 100 143k 0 0 47.2M 0 --:--:-- --:--:-- --:--:-- 70.0M
    * Connection #0 to host 10.34.254.62 left intact
    // check md5 of the downloaded files
    # md5sum out.data
    836eababc392419580641a7b65370e82 out.data
    # md5sum sy.jpg
    836eababc392419580641a7b65370e82 sy.jpg

There is no particular limit on chunk file size, and chunks do not need to be the same size, even within the same file. As a rule of thumb, keep each chunk small enough to hold entirely in memory, but avoid having too many small chunk files.
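
Since chunk sizes may vary, a client may want to check a manifest's layout before uploading it. The sketch below assumes that chunks are meant to tile the file with no gaps or overlaps, as in this page's example; the text does not state that requirement outright. The function name manifest_problems and the fids are made up for illustration.

```python
from typing import Dict, List


def manifest_problems(manifest: Dict) -> List[str]:
    """Return readable layout problems; an empty list means the layout is consistent."""
    problems = []
    expected_offset = 0
    for chunk in sorted(manifest["chunks"], key=lambda c: c["offset"]):
        if chunk["offset"] != expected_offset:
            problems.append(
                f"chunk {chunk['fid']} starts at {chunk['offset']}, expected {expected_offset}"
            )
        expected_offset = chunk["offset"] + chunk["size"]
    if expected_offset != manifest["size"]:
        problems.append(f"chunks end at {expected_offset}, but size is {manifest['size']}")
    return problems


# Three chunks of different sizes that still cover the file exactly.
uneven = {
    "name": "demo.bin",
    "mime": "application/octet-stream",
    "size": 147,
    "chunks": [
        {"fid": "1,0000000a", "offset": 0, "size": 100},
        {"fid": "2,0000000b", "offset": 100, "size": 40},
        {"fid": "3,0000000c", "offset": 140, "size": 7},
    ],
}
print(manifest_problems(uneven))
```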

The filer server, and the FUSE implementation built on it, automatically split large files into smaller chunks.

The list of chunks is stored in filer storage and managed by the filer or the weed mount client.