
Web scraping using Python

by technova

Web scraping with Python means extracting data from websites programmatically. It’s commonly done with libraries such as requests and BeautifulSoup, plus Selenium for pages that build their content with JavaScript. Here’s a no-fluff overview to get you started:


🛠️ Basic Tools You’ll Need

  1. requests – to fetch the web page.
  2. BeautifulSoup – to parse HTML and extract data.
  3. pandas (optional) – to structure and store the data.
  4. Selenium – when you need to interact with JavaScript-heavy sites.

✅ Example: Scrape Quotes from http://quotes.toscrape.com

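import requests
from bs4 import BeautifulSoup

url = "http://quotes.toscrape.com"

# Fetch the page; give up on a slow server and raise on an HTTP error status
response = requests.get(url, timeout=10)
response.raise_for_status()

# Parse the raw HTML with Python's built-in parser
soup = BeautifulSoup(response.text, 'html.parser')

# Each quote on the page sits in a <div class="quote"> element
quotes = soup.find_all('div', class_='quote')

for quote in quotes:
    text = quote.find('span', class_='text').get_text()
    author = quote.find('small', class_='author').get_text()
    print(f"{text} - {author}")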

🧠 Key Concepts

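  • response.text holds the page’s HTML as a decoded string.
  • BeautifulSoup(html, 'html.parser') parses that string into a searchable tree.
  • .find() returns the first matching element (or None if nothing matches); .find_all() returns a list of every match.
  • .get_text() extracts an element’s readable text without the tags.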
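
To try these calls without touching the network, here is a minimal sketch that runs them against an inline HTML string. The SAMPLE_HTML snippet and the extract_quotes helper are made up for illustration; the snippet only reuses the class names from the example above, and the quote texts are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, reusing the class names from the example above.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">First sample quote.</span>
  <small class="author">Author One</small>
</div>
<div class="quote">
  <span class="text">Second sample quote.</span>
  <small class="author">Author Two</small>
</div>
"""


def extract_quotes(html):
    """Return one dict per quote block found in the HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for quote in soup.find_all("div", class_="quote"):
        text = quote.find("span", class_="text")
        author = quote.find("small", class_="author")
        # .find() returns None when nothing matches, so skip incomplete blocks
        if text is None or author is None:
            continue
        rows.append({
            "text": text.get_text(strip=True),
            "author": author.get_text(strip=True),
        })
    return rows


rows = extract_quotes(SAMPLE_HTML)
for row in rows:
    print(f"{row['text']} - {row['author']}")
```

Collecting the results as a list of dicts keeps them easy to reuse: if pandas is installed, pandas.DataFrame(rows) turns the list into a table.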

⚠️ Tips & Ethics

  • Always check the site’s robots.txt (e.g., example.com/robots.txt) to see what’s allowed.
  • Don’t overload servers – pause between requests with time.sleep().
  • Send a browser-like User-Agent header, since some sites reject the default one that requests sends:
    headers = {'User-Agent': 'Mozilla/5.0'}
    response = requests.get(url, headers=headers)
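
The robots.txt check and the polite delay can be combined using only the standard library. This is a rough sketch, not a complete crawler: the ROBOTS_TXT content, the "my-scraper" user agent name, the example.com URLs, and the crawl helper are all assumptions for illustration, and the fetch step is a stand-in that makes no real request.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def crawl(urls, fetch, delay_seconds):
    """Call fetch(url) for each URL, sleeping between consecutive requests."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)  # be polite: pause before the next request
        results.append(fetch(url))
    return results


parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

candidates = [
    "https://example.com/page/1/",
    "https://example.com/private/data",
]

# Keep only the URLs the rules allow for our (assumed) user agent
to_fetch = [u for u in candidates if parser.can_fetch("my-scraper", u)]

# Use the site's Crawl-delay if it declares one, otherwise wait 1 second
delay = parser.crawl_delay("my-scraper") or 1

# Stand-in fetch that just echoes the URL; swap in requests.get for real use
fetched = crawl(to_fetch, fetch=lambda url: url, delay_seconds=delay)
print(to_fetch, delay)
```

Against a live site, you would point the parser at the real file with parser.set_url(...) followed by parser.read() instead of parsing an inline string.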
    

🧪 If You Need to Interact (JavaScript-Driven Sites)

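from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Requires a local Chrome installation
driver = webdriver.Chrome()
try:
    driver.get("http://quotes.toscrape.com/js")

    # The quotes are rendered by JavaScript, so wait up to 10 s for them to appear
    quotes = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.CLASS_NAME, "quote"))
    )

    for quote in quotes:
        print(quote.text)
finally:
    # Always close the browser, even if something above fails
    driver.quit()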

Want to scrape a specific site or need help structuring the data? Drop the URL or your goal, and I’ll guide you.


© 2010-2025 Mahasun.site. All rights reserved.