
How can I get the size of an Amazon S3 bucket? – Server Fault

The AWS CLI now supports the --query parameter, which takes a JMESPath expression.

This means you can sum the Size values returned by list-objects with sum(Contents[].Size) and count the objects with length(Contents[]).

This can be run with the official AWS CLI as shown below; the feature was introduced in February 2014.

aws s3api list-objects --bucket BUCKETNAME --output json --query "[sum(Contents[].Size), length(Contents[])]"
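If you want the result in a more readable form, the command can be wrapped in a small shell function. This is a minimal sketch, not part of the original answer: the function name `bucket_size` is hypothetical, it assumes the AWS CLI is installed and configured, and it relies on `--output text` printing the two query results (total bytes, then object count) tab-separated on one line. It has not been checked against an empty bucket, where `Contents` may be missing and the query may fail.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): report the object count
# and total size of one bucket, using the same --query expression as above.
# Assumes a configured AWS CLI; the bucket name is the first argument.
bucket_size() {
  bucket=$1
  # With --output text the query result is printed as "TOTAL_BYTES<TAB>COUNT".
  aws s3api list-objects --bucket "$bucket" --output text \
      --query "[sum(Contents[].Size), length(Contents[])]" |
    awk -v b="$bucket" '{ printf "%s: %d objects, %.3f GB\n", b, $2, $1 / 1e9 }'
}
```

Usage: `bucket_size BUCKETNAME`. GB here means 10^9 bytes, the same convention as the awk calculation further down.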

Source: How can I get the size of an Amazon S3 bucket? – Server Fault

Use s3cmd to Download Requester Pays Buckets on S3

List the pdf directory:

$ s3cmd ls --requester-pays s3://arxiv/pdf
                       DIR   s3://arxiv/pdf/

List files under pdf:

$ s3cmd ls --requester-pays s3://arxiv/pdf/\*
2010-07-29 19:56 526202880   s3://arxiv/pdf/arXiv_pdf_0001_001.tar
2010-07-29 20:08 138854400   s3://arxiv/pdf/arXiv_pdf_0001_002.tar
2010-07-29 20:14 525742080   s3://arxiv/pdf/arXiv_pdf_0002_001.tar
2010-07-29 20:33 156743680   s3://arxiv/pdf/arXiv_pdf_0002_002.tar
2010-07-29 20:38 525731840   s3://arxiv/pdf/arXiv_pdf_0003_001.tar
2010-07-29 20:52 187607040   s3://arxiv/pdf/arXiv_pdf_0003_002.tar
2010-07-29 20:58 525731840   s3://arxiv/pdf/arXiv_pdf_0004_001.tar
2010-07-29 21:11  44851200   s3://arxiv/pdf/arXiv_pdf_0004_002.tar
2010-07-29 21:14 526305280   s3://arxiv/pdf/arXiv_pdf_0005_001.tar
2010-07-29 21:27 234711040   s3://arxiv/pdf/arXiv_pdf_0005_002.tar
...

Get all files under pdf:

$ s3cmd get --requester-pays s3://arxiv/pdf/\*

Save a listing of all files under src to a text file:

$ s3cmd ls --requester-pays s3://arxiv/src/\* > all_files.txt
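Once a listing is saved, the files it names can also be fetched one object at a time, which makes a long download easier to restart. This is an illustrative sketch rather than the original author's method: `get_from_listing` is a hypothetical name, it assumes file lines in the listing have the object URI in column 4 (as in the output above) and that object keys contain no spaces, and it uses s3cmd's `--skip-existing` option so objects already present locally are not downloaded again.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): download every file named
# in a saved `s3cmd ls` listing, one object at a time.
# Reads the listing on stdin; file lines have the object URI in column 4.
get_from_listing() {
  # Keep only lines whose third column is a byte count, then print the URI.
  awk '$3 ~ /^[0-9]+$/ { print $4 }' |
    while IFS= read -r uri; do
      # --skip-existing: do not re-download objects already present locally.
      if ! s3cmd get --requester-pays --skip-existing "$uri"; then
        echo "failed: $uri" >&2
      fi
    done
}
```

Usage: `get_from_listing < all_files.txt`. As far as I know, `--skip-existing` only checks whether a local file with the same name exists, so a file left half-written by an interrupted run should be deleted before re-running.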

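Calculate the total and average file size (column 3 of the listing is the size in bytes; the total is printed in GB of 10^9 bytes, the average in bytes):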

$ awk '{s += $3} END { print "sum is", s/1000000000, "GB, average is", s/NR }' all_files.txt
sum is 844.626 GB, average is 4.80447e+08
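The same calculation can be packaged as a reusable function that skips `DIR` lines and reports both decimal gigabytes (10^9 bytes, as in the command above) and binary gibibytes (2^30 bytes). This is a sketch; the function name `s3_listing_summary` and its output format are my own, and it assumes the `DATE TIME SIZE URI` line layout shown in the listings above. Using `%.0f` instead of `%d` for the byte totals avoids integer overflow in some awk implementations.

```shell
#!/bin/sh
# Hypothetical helper: summarize an `s3cmd ls` listing read from stdin.
s3_listing_summary() {
  awk '
    # File lines: $1 date, $2 time, $3 size in bytes, $4 URI.
    # DIR lines have no numeric third field, so they are ignored.
    $3 ~ /^[0-9]+$/ { total += $3; n++ }
    END {
      if (n == 0) { print "no files"; exit }
      printf "files=%d bytes=%.0f GB=%.3f GiB=%.3f avg_bytes=%.0f\n",
             n, total, total / 1e9, total / 1073741824, total / n
    }'
}
```

Usage: `s3_listing_summary < all_files.txt`. Fed only the ten pdf lines listed earlier, it prints `files=10 bytes=3392481280 GB=3.392 GiB=3.159 avg_bytes=339248128`.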