Scanning S3 with bucketAV

Gauravkumar
2 min read · Sep 28, 2022

One day a customer asked us to deliver data in an S3 bucket, with the requirement that the bucket be scanned by antivirus (AV) software.

Initially, I tried to install ClamAV, but the request was urgent and I couldn't wait for the IT team to grant the right access. In the meantime, I came across bucketAV.

bucketAV is free for the first 15 days and then costs 0.16 dollars per hour of use. Here is the link to bucketAV on AWS Marketplace: https://aws.amazon.com/marketplace/pp/prodview-sykoblbsdgw2o Setup is also very easy; you can follow the setup guide at https://bucketav.com/help/setup-guide/.

After installing it, I found that it can only scan files of up to 5 GB, and we had around 25 GB of data. The problem, then, was to break the data into blocks smaller than 5 GB and scan those. Since I use an EMR cluster at my workplace, I did the work there, and I took a few steps to keep the cost down as well.

STEP 1: Start a small EMR cluster with uniform instance groups so that you can keep your AWS cost down. Since we only need to perform a few file operations, we don't need a large cluster.

STEP 2: Copy the file from our bucket to the EC2 instance using the following command

aws s3 cp "s3 bucket link" "path of the folder on the instance you want to copy to"

STEP 3: Now unzip the archive using unzip "file.zip"

STEP 4: After unzipping, we need to split the large 25 GB file into blocks of 3.5 GB (choose the block size to suit your needs, as long as it stays under the 5 GB limit). We use the following command to do this

split "/home/folder name/Record_A.csv" --additional-suffix=.csv -b 3584m

Here 3584m means 3584 MiB, which is 3.5 GiB (3.5 × 1024 = 3584). The split command names the output files xaa.csv, xab.csv, xac.csv, and so on automatically.
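To make the block-size arithmetic and the splitting behaviour concrete, here is a minimal sketch that runs the same split flags on a tiny stand-in file. The sample data, the 16k block size, and the assumption that the real file is exactly 25 GiB are all illustrative; the GNU coreutils version of split is assumed.

```shell
#!/usr/bin/env bash
set -eu

# Block size used in the article: 3584 MiB expressed in bytes.
block_bytes=$((3584 * 1024 * 1024))
echo "block size in bytes: $block_bytes"

# Rough number of blocks for a 25 GiB file (rounded up).
total_mib=$((25 * 1024))
chunks_needed=$(( (total_mib + 3584 - 1) / 3584 ))
echo "blocks needed for 25 GiB: $chunks_needed"

# Work in a throwaway directory so the demo leaves nothing behind.
workdir="$(mktemp -d)"
cd "$workdir"

# Hypothetical sample data standing in for Record_A.csv.
seq 1 10000 | sed 's/$/,sample/' > Record_A.csv

# Same flags as the command above, with a 16 KiB block instead of 3584m.
split Record_A.csv --additional-suffix=.csv -b 16k

# Reassemble the pieces in name order and compare with the original.
cat x??.csv > rejoined.csv
cmp Record_A.csv rejoined.csv && echo "chunks reassemble to the original"

chunk_count="$(ls x??.csv | wc -l)"
echo "chunks written: $chunk_count"
```

Note that -b cuts purely by byte count, so a CSV row can end up divided between two neighbouring files. If each block must hold only whole rows, GNU split also offers -C (for example -C 3584m), which keeps lines intact while staying under the size limit.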

STEP 5: Next we compress each split file into its own zip archive. Why do I do this step? Because the customer asked for it. To do so I used

zip -j "/folder name (where I want to zip)/xaa.zip" "/folder name (where I stored the split result)/xaa.csv"
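With several blocks, running the zip command by hand for each one gets tedious. The following is a small sketch of a loop that does the same thing for every block; the function name zip_chunks and the directory layout are hypothetical, not something from the original setup.

```shell
#!/usr/bin/env bash
set -eu

# zip_chunks SRC_DIR DEST_DIR
# Creates DEST_DIR/<name>.zip for every SRC_DIR/x*.csv block.
zip_chunks() {
  local src_dir="$1" dest_dir="$2" chunk name
  mkdir -p "$dest_dir"
  for chunk in "$src_dir"/x*.csv; do
    [ -e "$chunk" ] || continue          # pattern matched nothing
    name="$(basename "$chunk" .csv)"
    # -j stores only the file name, not its directory path; -q suppresses output.
    zip -j -q "$dest_dir/$name.zip" "$chunk"
  done
}

# Example call with hypothetical directories:
# zip_chunks "/home/folder name/split" "/home/folder name/zipped"
```

Each archive holds exactly one block, so xaa.csv ends up in xaa.zip, xab.csv in xab.zip, and so on.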

STEP 6: Upload each small zip file to the S3 bucket.

aws s3 cp "path on the local instance (EMR) where our zip file is stored" "s3 bucket where we want to store it"
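The upload can be looped in the same way. This is only a sketch: the function name upload_zips, the local directory, and the bucket name are placeholders, and it assumes the AWS CLI is installed and has write access to the target bucket.

```shell
#!/usr/bin/env bash
set -eu

# upload_zips ZIP_DIR S3_PREFIX
# Copies every ZIP_DIR/*.zip to S3_PREFIX (for example s3://example-bucket/chunks/).
upload_zips() {
  local zip_dir="$1" s3_prefix="${2%/}/" archive   # force exactly one trailing slash
  for archive in "$zip_dir"/*.zip; do
    [ -e "$archive" ] || continue                  # pattern matched nothing
    aws s3 cp "$archive" "$s3_prefix"
  done
}

# Example call with a hypothetical directory and bucket:
# upload_zips "/home/folder name/zipped" "s3://example-bucket/chunks/"
```

Uploading one archive per call keeps every object under the 5 GB scan limit and makes it easy to retry a single block if a transfer fails.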

STEP 7: Files stored in the S3 bucket will start getting tagged as AV CLEAN, which means the file has been scanned and no viruses were found in it. I am attaching a screenshot of this.
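Besides looking in the console, the tag can also be read from the command line. The sketch below uses the AWS CLI's s3api get-object-tagging call; the function name scan_status is hypothetical, and the default tag key bucketav is an assumption, so check the tags on one of your own objects and pass the key you actually see.

```shell
#!/usr/bin/env bash
set -eu

# scan_status BUCKET KEY [TAG_KEY]
# Prints the value of the scan-result tag on one S3 object.
scan_status() {
  local bucket="$1" key="$2"
  local tag_key="${3:-bucketav}"   # assumed tag key; override if yours differs
  aws s3api get-object-tagging \
    --bucket "$bucket" --key "$key" \
    --query "TagSet[?Key=='${tag_key}'].Value" \
    --output text
}

# Example call with a hypothetical bucket and object key:
# scan_status "example-bucket" "chunks/xaa.zip"
```

Running this for each uploaded archive gives a quick way to confirm that every block has been scanned before handing the bucket over to the customer.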

STEP 8: After each file has been tagged as clean, a log file is generated that records the tagging information for each file.
