Beautiful Soup
Uses of Beautiful Soup:
Beautiful Soup is a Python library used for web scraping, meaning the extraction of data from web pages (HTML and XML documents). Using a parser of your choice (explained later), you can pull out the elements you want based on tags, IDs, classes, and CSS selectors. Do not mix this library up with Selenium if you already know it. Beautiful Soup cannot be used to interact with websites, only to pull information out of them.
Installation and Initialization:
To install Beautiful Soup, you will need pip, Python's package installer. To see how to install pip, check out https://pip.pypa.io/en/stable/installation/. You might already have pip installed, so before you install it again by accident, try installing Beautiful Soup and see what happens. The usual command is pip install beautifulsoup4 (beautifulsoup4 is the package's name on PyPI). The following code snippet shows you how to install Beautiful Soup in a terminal.
[Image: Beautiful Soup Installation]
Wait for Beautiful Soup to install and then read the rest of this post. To use Beautiful Soup, you need to import it and create the "soup". Take a look at the snippet below to see the code to create your soup and a few examples to get you started.
The snippet shows the code needed to create your soup, so let's go over it line by line.
Lines 2 and 3: Import Beautiful Soup and requests (a library for making HTTP requests).
Line 6: Request the page you want to scrape via its URL.
Line 7: Optional. Save all the HTML of that page into a variable.
Line 9: Create the soup. Initialize the BeautifulSoup() class with two parameters. First is the HTML code for the entire page (the value of the variable from line 7). Second is the parser, which can be "html.parser" (built into Python) or "lxml". If "html.parser" gives an error, try the "lxml" parser. lxml is a separate library, so if Beautiful Soup reports that it cannot find that parser, install it in your terminal with the following code: pip install lxml. You do not need to import lxml yourself; passing "lxml" as the parser name is enough.
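Here is a minimal sketch of the steps just described, written out as text. It is not a copy of the original snippet, so its line numbers will not match the ones listed above. The URL is a placeholder, and the helper names fetch_html and make_soup are made up for this example.

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL (an assumption for illustration); swap in the page you want.
URL = "https://example.com"


def fetch_html(url: str) -> str:
    """Request the page and return its HTML as text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    return response.text


def make_soup(html: str, parser: str = "html.parser") -> BeautifulSoup:
    """Create the soup from HTML text using the chosen parser."""
    return BeautifulSoup(html, parser)
```

Calling make_soup(fetch_html(URL)) gives you a soup for the live page. Passing "lxml" as the second argument to make_soup switches parsers, provided lxml is installed.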
That is all the code needed to create your soup and you're ready to start scraping. The image above still has more lines of code and those are examples for you to get started.
Line 11: Returns the title of the page as the whole element, tags and text together (<title>Hello There</title>).
Line 13: Returns the first <p></p> element found in the content, tags and text together.
Line 15: Returns a list of all the <p></p> elements found in the content, each with its tags and text.
Line 17: Returns a list of all the <p></p> elements found in the content with the class name download_buttons, each with its tags and text.
Line 19: Returns the first <p></p> element found in the content, tags and text together.
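The lookups described above might look something like the sketch below. The exact calls in the original snippet are not shown here, so treat these as one common way to get the same results. The HTML is a small made-up page so the example runs without a network connection.

```python
from bs4 import BeautifulSoup

# Small inline page (made up for illustration) so the example runs offline.
html = """
<html>
  <head><title>Hello There</title></head>
  <body>
    <p>First paragraph</p>
    <p class="download_buttons">Download A</p>
    <p class="download_buttons">Download B</p>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

title = soup.title            # the <title> element
first_p = soup.find("p")      # the first <p> element
all_p = soup.find_all("p")    # a list of every <p> element
# class_ has a trailing underscore because "class" is a Python keyword
buttons = soup.find_all("p", class_="download_buttons")

print(title)                     # <title>Hello There</title>
print(title.string)              # Hello There
print(first_p)                   # <p>First paragraph</p>
print(len(all_p), len(buttons))  # 3 2
```

Printing an element shows its tags and text together. Use .string or .get_text() when you only want the text inside.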
Use what you have learned to scrape websites, but be sure to check which pages a site allows you to scrape. You can check by going to the root of the site's URL and adding /robots.txt (for example, https://example.com/robots.txt). Check out my other posts for more Python tutorials!
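If you would rather do the robots.txt check in code, Python's standard library includes urllib.robotparser. The sketch below parses a made-up set of rules so it runs offline. The rules and the example.com URLs are assumptions for illustration, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Example rules (made up for illustration); a real site serves its own
# rules at <site root>/robots.txt.
rules = """
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# can_fetch(user_agent, url) reports whether the rules allow that URL
allowed = parser.can_fetch("*", "https://example.com/blog/post")
blocked = parser.can_fetch("*", "https://example.com/private/data")
print(allowed, blocked)  # True False
```

For a live site, you would call parser.set_url() with the site's robots.txt address and then parser.read() instead of parse().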