# web-crawling

Here are 182 public repositories matching this topic...

apify / crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

python crawler scraper automation web-crawler headless scraping crawling pip web-scraping beautifulsoup web-crawling hacktoberfest headless-chrome apify parsel playwright

Updated Feb 24, 2026
Python

omkarcloud / botasaurus

The All in One Framework to Build Undefeatable Scrapers

Updated Feb 21, 2026
Python

cxcscmu / Craw4LLM

Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

crawler web-crawler crawling web-crawling pre-training pretraining large-language-models llm

Updated Feb 24, 2025
Python

scrapehero-code / amazon-scraper

A simple web scraper to extract Product Data and Pricing from Amazon

web-scraping web-crawling page-scraper web-scraping-tutorials amazon-scraper scrape-products

Updated Jun 13, 2023
Python

spyboy-productions / omnisci3nt

Omnisci3nt is an open-source web reconnaissance and intelligence tool for extracting deep technical insights from domains, including subdomains, SSL certificates, exposed services, archived content, and configuration data. — Omnisci3nt gives you the full picture in seconds.

Updated Jan 6, 2026
Python

scrapinghub / scrapy-training

Scrapy Training companion code

python training web-scraping scrapy web-crawling

Updated Jan 30, 2019
Python

MaxValue / Terpene-Profile-Parser-for-Cannabis-Strains

Parser and database to index the terpene profile of different strains of Cannabis from online databases

Updated Apr 28, 2023
Python

leogregianin / bancocentralbrasil

💵 💰 🇧🇷 Information on the official daily rates for inflation, Selic, Poupança (savings), dollar, PTAX dollar, euro, and PTAX euro from the Banco Central do Brasil website

money brazil web-scraping brasil web-crawling banco-central

Updated Nov 30, 2021
Python

my8100 / scrapyd-cluster-on-heroku

Set up a free and scalable Scrapyd cluster for distributed web crawling with just a few clicks.

python heroku cluster web-scraping scrapy web-crawling scrapyd scrapydweb logparser

Updated Apr 4, 2020
Python

alyakhtar / Katastrophe

Command Line Tool to download torrents

python screenshot torrent bittorrent command-line kickass-torrents deluge web-crawling

Updated Feb 3, 2017
Python

spyboy-productions / PhantomCrawler

PhantomCrawler is a Python-based web testing and research tool that simulates website interactions from multiple proxy IP addresses to analyze traffic behavior, access controls, and response patterns under different network conditions.

proxy proxy-configuration web-crawling web-scrapping website-analytics ddos-attack-tools proxy-rotation website-hits

Updated Jan 6, 2026
Python

sushantPatrikar / Amazon-Flipkart-Price-Comparison-Engine

Compares the price of a user-entered product across the e-commerce sites Amazon and Flipkart 💰 📊

python amazon python3 tkinter python-3 web-crawling flipkart web-crawler-python ecommerce-sites-amazon corresponding-prices

Updated Dec 8, 2022
Python

GoTrained / Scrapy-Craigslist

Web Scraping Craigslist's Engineering Jobs in NY with Scrapy

python scrapy-spider web-scraper craigslist web-scraping scrapy web-crawling scrapy-crawler scrapy-tutorial

Updated Aug 5, 2017
Python

dongweiming / daenerys

Scraping and Web Crawling Framework For Zhihu Live

scraping zhihu web-crawling zhihulive

Updated Oct 10, 2017
Python

MohamedHmini / tweetsOLAPing

Implements an end-to-end ETL and analysis pipeline for tweets.

tweets analysis twitter-api multithreading api-client datawarehousing datawarehouse web-crawling ssis google-api-client etl-pipeline tweets-classification cube-analysis powerbi-report ssas-multidimensional multi-dimensional-analysis tweets-scraper

Updated Dec 8, 2022
Python

sadiuysal / crawl4ai-mcp-server

🕷️ A lightweight Model Context Protocol (MCP) server that exposes Crawl4AI web scraping and crawling capabilities as tools for AI agents. Similar to Firecrawl's API but self-hosted and free. Perfect for integrating web scraping into your AI workflows with OpenAI Agents SDK, Cursor, Claude Code, and other MCP-compatible tools.

mcp web-scraping cursor web-crawling ai-agents crawl4ai model-context-protocol claude-code openai-agents openai-agents-sdk firecrawl-alternative

Updated Feb 6, 2026
Python

mike-gee / webtranspose

Web scraping API for building AI applications.

python scraping crawling web-scraping chatbots web-crawling scraping-python crawling-python web-scraping-python

Updated Jan 24, 2024
Python

ScrapingAnt / zoominfo_scraper

ZoomInfo scraper using rotating proxies and headless Chrome from ScrapingAnt

python scraper web-crawler scraping scraping-websites web-crawling datamining zoominfo-client web-crawler-python leadgen leadgeneration scraping-api scraping-tool scraping-data web-harvesting

Updated Apr 26, 2021
Python

HRN-Projects / amazon-captcha-solver

A TensorFlow (Deep Learning - CNN) based solution for tackling captcha when collecting data from Amazon.

python api tensorflow captcha keras python3 web-scraping flask-api web-crawling captcha-solving captcha-solver open-cv captcha-images amazon-captcha hrn-projects web-scraping-solution amazon-captcha-solver amazon-captcha-solving captcha-solver-api

Updated Apr 8, 2025
Python

kapilkchaurasia / Data-mining-python-script

Contains various scripts for web crawling and data mining of the social web (RSS, Facebook, Twitter, LinkedIn)

python rss data-mining facebook twitter linkedin web-crawling

Updated Sep 19, 2014
Python
