Download entire public S3 bucket without credentials

Published: Wednesday, Oct 9, 2013 Last modified: Monday, Dec 9, 2024

Say for example you have a S3 bucket named s3://mr2011/

If you have made your bucket public (Hint: s3cmd -P) it should reachable upon http://mr2011.s3.amazonaws.com, which may again redirect you to http://mr2011.s3-ap-southeast-1.amazonaws.com/.

Now you should see an XML listing like:

<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Name>mr2011</Name>
<Prefix/>
<Marker/>
<MaxKeys>1000</MaxKeys>
<IsTruncated>false</IsTruncated>
<Contents>
<Key>foo.txt</Key>
<LastModified>2013-10-09T09:12:53.000Z</LastModified>
<ETag>"59ca0efa9f5633cb0371bbc0355478d8"</ETag>
<Size>13</Size>
<StorageClass>REDUCED_REDUNDANCY</StorageClass>
</Contents>
<Contents>
<Key>test.jpg</Key>
<LastModified>2013-10-09T09:12:53.000Z</LastModified>
<ETag>"c9af9358e016984c1445a6102b4c35fc"</ETag>
<Size>89761</Size>
<StorageClass>REDUCED_REDUNDANCY</StorageClass>
</Contents>
</ListBucketResult>

If not, perhaps you are missing the aws s3api put-bucket-acl --bucket archpi.dabase.com --acl public-read listing permission.

You’re not limited to 1k objects. Attach ?max-keys=2147483647 to get more. If <IsTruncated> is still True, one strategy is to break results down by specifying first letter of path by attaching ?prefix=a, ?prefix=b…. etc.

Fetch first give objects on the day 2015-10-12 providing that’s how you store your objects (HINT: prefix with YYYY-MM-DD is a good idea)

curl -s 'http://c.prazefarm.co.uk.s3.amazonaws.com?max-keys=5&prefix=2015-10-12' | xml sel -N w="http://s3.amazonaws.com/doc/2006-03-01/" -T -t -m "//w:Key" -o "${s3url%/}/" -v . -n

You can download all the resources with a bit of xmlstarlet magic like so: https://github.com/kaihendry/s3listing/blob/master/listing.sh

Alternatively to xmlstarlet, xml2 is a nice way of working xml.