by Mohd Shibli
Posted on 24 January 2018
In this article we are going to create a simple web scraper program in Python that fetches the source code of a particular webpage. We are going to use Python's urllib module, which is mainly used to fetch data across the World Wide Web.
So let's start by importing the urllib module. In Python 3 the module is divided into several parts, including urllib.request, urllib.parse and urllib.error; in our code, we are going to use urllib.request.
import sys
import urllib.request
import argparse
Now let's write a Python function to fetch the webpage source code. Let's name the function getCode.
def getCode(url):
    raw = urllib.request.urlopen(url).read()
    code = raw.decode()
    sys.stdout.write(code)
If you are wondering "that's it?", then you are totally correct: these few lines of code are all we need to fetch the HTML source code of a webpage (Python's swag).
In the above code we simply open a connection between our machine and the host URL, read the webpage data as a stream of bytes, and then decode that data into a readable string using the decode() method, which defaults to UTF-8.
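Calling decode() with no arguments assumes the page is encoded as UTF-8, which is not true of every site. As a minimal sketch (the function name fetch_source, the UTF-8 fallback and the errors="replace" setting are assumptions for illustration, not part of the program in this article), the charset announced in the response headers can be used instead, and a with block closes the connection once the data is read:

```python
import urllib.request


def fetch_source(url, fallback="utf-8"):
    """Return the body at `url` as text (hypothetical helper)."""
    # The with block closes the connection after reading
    with urllib.request.urlopen(url) as response:
        raw = response.read()
        # Charset from the Content-Type header, or None if absent
        charset = response.headers.get_content_charset()
    # errors="replace" is an assumption: keep going on undecodable bytes
    return raw.decode(charset or fallback, errors="replace")
```

This version returns the text instead of printing it, so the caller can decide whether to write it to the screen or to a file.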
Now let's see the complete program for fetching the source code of a webpage in Python.
import sys
import urllib.request as ul
import argparse

def getCode(url):
    # Fetch the page as bytes, decode it to text and print it
    raw = ul.urlopen(url).read()
    code = raw.decode()
    sys.stdout.write(code)

if __name__ == '__main__':
    # Read the target URL from the --url command line option
    parser = argparse.ArgumentParser(description="Hostname")
    parser.add_argument("--url", action="store", dest="url", required=True)
    args = parser.parse_args()
    url = args.url
    getCode(url)
To run the above program you need to pass an extra parameter '--url', as shown below:
python filename.py --url=http://python.org
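If the URL cannot be reached, the program above ends with a traceback. As a sketch of one way to handle that with the urllib.error part of the module (the function name get_code_safely and the choice to return None on failure are assumptions for illustration), the request can be wrapped in a try block:

```python
import sys
import urllib.error
import urllib.request


def get_code_safely(url):
    """Return the page source as text, or None on failure (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read().decode()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404
        sys.stderr.write("HTTP error %d for %s\n" % (err.code, url))
    except urllib.error.URLError as err:
        # No usable response: bad host, refused connection, unknown scheme
        sys.stderr.write("Could not open %s: %s\n" % (url, err.reason))
    return None
```

HTTPError is caught first because it is a subclass of URLError. A URL with no scheme at all, such as python.org without the http:// prefix, raises ValueError, which this sketch does not catch.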