Merge pull request waynehoover#23 from ericboehs/rake_cleaner
Rake task to clean old uploads
waynehoover committed Dec 13, 2012
2 parents 6e8e8a2 + 9b45601 commit f5d30e3
Showing 4 changed files with 88 additions and 4 deletions.
27 changes: 27 additions & 0 deletions README.md
@@ -180,6 +180,33 @@ $(document).bind 's3_uploads_complete', ->
alert("All Uploads completed")
```

## Cleaning old uploads on S3
You may be processing files after they are uploaded and re-uploading them to
another bucket or directory. If so, you can remove the original files by
running a rake task.

First, add the fog gem to your `Gemfile` and run `bundle`:
```ruby
gem 'fog'
```

Then, run the rake task to delete uploads older than 2 days:
```
$ rake s3_direct_upload:clean_remote_uploads
Deleted file with key: "uploads/20121210T2139Z_03846cb0329b6a8eba481ec689135701/06 - PCR_RYA014-25.jpg"
Deleted file with key: "uploads/20121210T2139Z_03846cb0329b6a8eba481ec689135701/05 - PCR_RYA014-24.jpg"
$
```

Optionally customize the prefix used for cleaning (default is `uploads/#{2.days.ago.strftime('%Y%m%d')}`):
**config/initializers/s3_direct_upload.rb**
```ruby
S3DirectUpload.config do |c|
# ...
c.prefix_to_clean = "my_path/#{1.week.ago.strftime('%y%m%d')}"
end
```
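The default prefix narrows the S3 listing to keys whose timestamp segment starts with a given date. A minimal sketch of the same date arithmetic in plain Ruby (inside Rails, ActiveSupport's `2.days.ago` plays this role; time-zone handling aside, they are equivalent):

```ruby
# Plain-Ruby equivalent of the default cleaning prefix,
# "uploads/#{2.days.ago.strftime('%Y%m%d')}".
two_days_ago = Time.now.utc - 2 * 24 * 60 * 60
default_prefix = "uploads/#{two_days_ago.strftime('%Y%m%d')}"

# Keys generated by the form helper start with "uploads/<UTC timestamp>_...",
# so this prefix matches only the uploads made on that day.
puts default_prefix  # e.g. "uploads/20121211"
```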

## Contributing / TODO
This is a simple gem that really only provides some JavaScript and a form helper.
The gem could go in all sorts of directions based on what people want and how people contribute.
4 changes: 2 additions & 2 deletions lib/s3_direct_upload/config_aws.rb
@@ -4,7 +4,7 @@ module S3DirectUpload
class Config
include Singleton

-    ATTRIBUTES = [:access_key_id, :secret_access_key, :bucket]
+    ATTRIBUTES = [:access_key_id, :secret_access_key, :bucket, :prefix_to_clean]

attr_accessor *ATTRIBUTES
end
@@ -15,4 +15,4 @@ def self.config
end
Config.instance
end
-end
\ No newline at end of file
+end
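The `Config` class is a `Singleton` whose accessors are generated from `ATTRIBUTES`, so adding `:prefix_to_clean` to the list is all it takes to make the new setting configurable. A self-contained sketch of how the configuration block flows through (the module name `S3DirectUploadSketch` is a hypothetical stand-in to avoid clashing with the real gem):

```ruby
require 'singleton'

# Hypothetical stand-in for the S3DirectUpload module.
module S3DirectUploadSketch
  class Config
    include Singleton

    ATTRIBUTES = [:access_key_id, :secret_access_key, :bucket, :prefix_to_clean]

    # One reader/writer pair per configurable attribute.
    attr_accessor(*ATTRIBUTES)
  end

  # Yields the singleton when given a block; always returns it.
  def self.config
    yield Config.instance if block_given?
    Config.instance
  end
end

S3DirectUploadSketch.config do |c|
  c.bucket = "my-bucket"
  c.prefix_to_clean = "my_path/"
end

puts S3DirectUploadSketch.config.bucket  # => my-bucket
```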
4 changes: 2 additions & 2 deletions lib/s3_direct_upload/form_helper.rb
@@ -49,7 +49,7 @@ def fields
end

def key
-      @key ||= "uploads/#{SecureRandom.hex}/$(unknown)"
+      @key ||= "uploads/#{DateTime.now.utc.strftime("%Y%m%dT%H%MZ")}_#{SecureRandom.hex}/$(unknown)"
end

def url
@@ -85,4 +85,4 @@ def signature
end
end
end
-end
\ No newline at end of file
+end
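The change above prepends a UTC timestamp to each upload key; that timestamp is what the date-based cleaning prefix matches against. A plain-Ruby sketch of the key-prefix format (the gem itself uses `DateTime.now.utc`, available under Rails via ActiveSupport):

```ruby
require 'securerandom'

# UTC timestamp first, then a random hex token -- e.g. the keys in the
# README sample: "uploads/20121210T2139Z_03846cb0329b6a8eba481ec689135701/..."
timestamp = Time.now.utc.strftime("%Y%m%dT%H%MZ")
key_prefix = "uploads/#{timestamp}_#{SecureRandom.hex}/"
puts key_prefix
```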
57 changes: 57 additions & 0 deletions lib/tasks/s3_direct_upload.rake
@@ -0,0 +1,57 @@
namespace :s3_direct_upload do
  desc "Removes old uploads from the configured S3 bucket/prefix -- useful when uploads are processed into another directory"
  task clean_remote_uploads: :environment do
    require 'thread'
    require 'fog'

    s3 = Fog::Storage::AWS.new(aws_access_key_id: S3DirectUpload.config.access_key_id, aws_secret_access_key: S3DirectUpload.config.secret_access_key)
    bucket = S3DirectUpload.config.bucket
    prefix = S3DirectUpload.config.prefix_to_clean || "uploads/#{2.days.ago.strftime('%Y%m%d')}"

    queue = Queue.new
    semaphore = Mutex.new
    threads = []
    thread_count = 20
    total_listed = 0
    total_deleted = 0

    threads << Thread.new do
      Thread.current[:name] = "get files"
      # List every file under the prefix. Fog handles pagination internally.
      s3.directories.get(bucket).files.all(prefix: prefix).each do |file|
        queue.enq(file)
        total_listed += 1
      end
      # Enqueue one :EOF sentinel per worker so every deletion thread exits.
      thread_count.times { queue.enq(:EOF) }
    end

    # Delete the queued files with N worker threads, stopping at :EOF.
    thread_count.times do |count|
      threads << Thread.new(count) do |number|
        Thread.current[:name] = "delete files(#{number})"
        # Dequeue until the :EOF sentinel. (deq blocks until a file arrives.)
        while (file = queue.deq) != :EOF
          file.destroy
          puts %Q{Deleted file with key: "#{file.key}"}
          # Increment the shared counter under the mutex (only for real files,
          # so the :EOF sentinel is not counted as a deletion).
          semaphore.synchronize { total_deleted += 1 }
        end
      end
    end

    # Wait for all threads to finish.
    threads.each do |t|
      begin
        t.join
      rescue RuntimeError => e
        puts "Failure on thread #{t[:name]}: #{e.message}"
      end
    end
  end
end
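The rake task is a classic producer/consumer setup: one thread lists files into a queue, N workers drain it, and one `:EOF` sentinel per worker guarantees a clean shutdown. A minimal, S3-free sketch of the same pattern (the file names and `deleted` array are illustrative stand-ins for fog file objects and `file.destroy`):

```ruby
# Queue and Mutex are core classes in modern Ruby; no require needed.
queue = Queue.new
mutex = Mutex.new
deleted = []
worker_count = 4

# Producer: enqueue the work items, then one :EOF sentinel per worker
# so every worker wakes up and exits cleanly.
producer = Thread.new do
  (1..10).each { |i| queue.enq("file-#{i}") }
  worker_count.times { queue.enq(:EOF) }
end

# Workers: dequeue until the sentinel; the mutex guards the shared array.
workers = Array.new(worker_count) do
  Thread.new do
    while (item = queue.deq) != :EOF
      mutex.synchronize { deleted << item }  # stands in for file.destroy
    end
  end
end

([producer] + workers).each(&:join)
puts deleted.size  # => 10
```

Because `Queue#deq` blocks, the workers simply wait for the producer instead of polling, and the per-worker sentinel avoids the shutdown race a single sentinel would cause.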
