Virendra Rajput (BkVirendra)


Directory Downloader in Python

By Virendra Rajput

Recently, while browsing for some Facebook Timeline covers for my profile, I came across hundreds of covers that I would love to have on my hard disk as part of my Timeline Cover collection (yeah, I have a Timeline Cover collection).

And then I came across a few websites that allowed directory browsing, so I started saving the images manually from the index. But those directories had thousands of images, and downloading them by hand would suck (being a #hacker, you always want everything to be automated).

So I started hacking up a script that would carry out this task for me. In just 15 minutes I cracked it, and had fun downloading entire webserver directories in minutes.

You can use this Python script to download entire directories (if the webserver has indexes open).

This script also makes use of BeautifulSoup, which you can install with either of the following commands:

pip install beautifulsoup4   # if you have pip installed
easy_install BeautifulSoup4  # if you have easy_install

To use the script, pass the directory URL as a command-line argument. For example, to download the directory at http://www.namecovers.com/asset/thumb/:

$ python downloader.py http://www.namecovers.com/asset/thumb/

The code:

import urllib2
import sys
import os

from bs4 import BeautifulSoup
from urlparse import urlparse

def downloader(urls, grab_url, foldername):
    # Create the output folder (named after the site's domain) if it doesn't exist yet
    if not os.path.exists(foldername):
        print "\"" + foldername + "\" does not exist!"
        print "Creating \"" + foldername + "\"..."
        os.makedirs(foldername)
    # Fetch each file from the index and write it into the output folder
    for cover in urls:
        try:
            print "Downloading item " + cover + "..."
            print grab_url + cover
            img = urllib2.urlopen(grab_url + cover)
            output = open(foldername + "/" + cover, 'wb')
            output.write(img.read())
            output.close()
            print cover + "... downloaded!!"
        except Exception:
            # Skip anything that fails (e.g. sub-directories or broken links)
            pass
    return

def main(url):
    urls = []
    print "Fetching the page..."
    page = urllib2.urlopen(url).read()
    print "Fetching completed!"
    soup = BeautifulSoup(page)
    print "Grabbing the objects of the page..."
    # The index is assumed to list every file as an <li> wrapping a link
    lis = soup.find_all("li")
    for item in lis:
        urls.append(item.a["href"])
    # Name the download folder after the site's domain
    domain = urlparse(url)
    downloader(urls, url, domain.netloc)
    print "All files have been successfully downloaded!"
    return

if __name__ == "__main__":
    main(sys.argv[1])

You can also fork it on GitHub here.