🔥 The Web Data API for AI - Power AI agents with clean web data
Updated Apr 11, 2026 - TypeScript
AI-based web scraper written in Python
Open-source, production-grade web scraping engine built for LLMs. Scrape and crawl the entire web, clean markdown, ready for your agents.
Use LLMs to robustly extract web data
Give your AI the power to browse, scrape, and extract structured data from complex websites — with faster execution, lower cost, and more reliable results.
Web Data Scraper - no-code internet scraping. Extract and export to CSV, Excel, JSON, Google Sheets, and Webhook.
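The export step described above can be sketched with Python's standard library; the record fields here are hypothetical, not the tool's actual schema.

```python
import csv
import io
import json

def export_records(records, fmt="json"):
    """Serialize scraped records to JSON or CSV text (hypothetical field names)."""
    if fmt == "json":
        return json.dumps(records, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")

records = [
    {"title": "Example Domain", "url": "https://example.com"},
    {"title": "IANA", "url": "https://iana.org"},
]
```

Exporting to Google Sheets or a webhook would follow the same pattern, with the serialized payload posted to the target API instead of returned as text.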
AI-based web wrapper for web content extraction
qcrawl - fast async web crawling & scraping framework for Python.
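The core idea of an async crawler like this can be sketched with `asyncio`; this is not qcrawl's API, and the in-memory link graph stands in for real HTTP fetches.

```python
import asyncio

# A toy link graph standing in for real pages (assumption: no network access here).
LINKS = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b"],
    "https://example.com/b": [],
}

async def fetch_links(url):
    await asyncio.sleep(0)  # stand-in for an async HTTP request
    return LINKS.get(url, [])

async def crawl(start):
    """Breadth-first crawl: fetch each frontier batch concurrently."""
    seen = {start}
    queue = [start]
    while queue:
        batch, queue = queue, []
        results = await asyncio.gather(*(fetch_links(u) for u in batch))
        for links in results:
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
    return seen

visited = asyncio.run(crawl("https://example.com/"))
```

The `gather` call is what makes the crawl concurrent: every URL in the current frontier is fetched at once rather than one by one.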
The this.url class fetches and parses URL data, returning an object of structured information that can be stored in a database (or other storage) or fed to machine-learning algorithms.
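The parse-into-a-structured-object idea can be sketched with Python's `urllib.parse`; the field names in the returned dict are an assumption, not this.url's actual output shape.

```python
from urllib.parse import urlparse, parse_qs

def parse_url(url):
    """Return a structured dict for a URL, ready for storage or ML features."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "path": parts.path,
        "query": parse_qs(parts.query),
    }

info = parse_url("https://example.com/search?q=web+data&page=2")
```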
A quick guide, with code examples, on how to use Java for web scraping
GNewsScraper is a TypeScript package that scrapes article data from Google News based on a keyword or phrase. It returns the results as an array of JSON objects, making the scraped information convenient to access and use.
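The feed-to-array-of-objects pattern can be sketched in stdlib Python; this is not GNewsScraper's API, and the inline RSS sample is fabricated for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS snippet standing in for a fetched news feed.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <item><title>Web scraping in 2026</title>
    <link>https://news.example.com/1</link>
    <pubDate>Fri, 10 Apr 2026 09:00:00 GMT</pubDate></item>
  <item><title>Clean data for LLM agents</title>
    <link>https://news.example.com/2</link>
    <pubDate>Sat, 11 Apr 2026 12:00:00 GMT</pubDate></item>
</channel></rss>"""

def parse_feed(xml_text):
    """Turn an RSS document into a list of article dicts."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "published": item.findtext("pubDate"),
        }
        for item in root.iter("item")
    ]

articles = parse_feed(SAMPLE_RSS)
```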
An API wrapper for Scrappey.com written in Node.js (Cloudflare bypass & solver)
AI-based web extractor
High-performance HTML to Markdown converter with full GitHub Flavored Markdown support. Written in Rust, available for Node.js and as a native Rust crate.
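The conversion the Rust crate performs can be illustrated with a toy stdlib sketch handling only a few tags; a real GFM converter covers far more (tables, code fences, nesting).

```python
from html.parser import HTMLParser

class TinyMarkdown(HTMLParser):
    """Convert a tiny subset of HTML (h1, p, strong, em) to Markdown."""
    WRAP = {"strong": "**", "em": "*"}

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.out.append("# ")
        elif tag in self.WRAP:
            self.out.append(self.WRAP[tag])

    def handle_endtag(self, tag):
        if tag in self.WRAP:
            self.out.append(self.WRAP[tag])
        elif tag in ("h1", "p"):
            self.out.append("\n\n")  # block elements end with a blank line

    def handle_data(self, data):
        self.out.append(data)

def to_markdown(html):
    parser = TinyMarkdown()
    parser.feed(html)
    return "".join(parser.out).strip()

md = to_markdown("<h1>Hello</h1><p>Some <strong>bold</strong> text.</p>")
```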
A pipeline to scrape, extract, and analyze book data, turning raw web pages into insights.
Java Framework which is used by the Web Data Commons project to extract Microdata, Microformats and RDFa data, Web graphs, and HTML tables from the web crawls provided by the Common Crawl Foundation.
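The Microdata part of that extraction can be sketched in stdlib Python (the framework itself is Java); this simplified version assumes itemprop values are plain text, not nested items.

```python
from html.parser import HTMLParser

class MicrodataExtractor(HTMLParser):
    """Collect itemprop name/text pairs from schema.org Microdata markup."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemprop" in attrs:
            self._current = attrs["itemprop"]

    def handle_data(self, data):
        if self._current:
            self.props[self._current] = data.strip()
            self._current = None

# Hypothetical page fragment with schema.org Person markup.
HTML = """<div itemscope itemtype="https://schema.org/Person">
  <span itemprop="name">Ada Lovelace</span>
  <span itemprop="jobTitle">Mathematician</span>
</div>"""

extractor = MicrodataExtractor()
extractor.feed(HTML)
```

RDFa and Microformats extraction follow the same shape: walk the DOM, read the vocabulary-specific attributes, and emit property/value pairs.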
Open-source web crawler
HTML to Markdown with CSS selector and XPath annotations
The Tableau Web Data Connector for Facebook Insights API
RealShotPDF is a Chrome extension designed to simplify the process of creating PDF documents from web content. The extension allows users to navigate through selected webpages, parse and display links in a tree view, and generate PDFs for the chosen pages. It operates locally without sending any data to external servers.