🔥 The Web Data API for AI - Power AI agents with clean web data
Updated Apr 11, 2026 - TypeScript
AI-based web scraper written in Python
Open-source, production-grade web scraping engine built for LLMs. Scrape and crawl the entire web, clean markdown, ready for your agents.
Use LLMs to robustly extract web data
Give your AI the power to browse, scrape, and extract structured data from complex websites — with faster execution, lower cost, and more reliable results.
Web Data Scraper - no-code internet scraping. Extract and export to CSV, Excel, JSON, Google Sheets, and Webhook.
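The export step described above can be sketched with Python's standard library; the record fields here are hypothetical, not the tool's actual schema.

```python
import csv
import io
import json

def export_records(records, fmt="json"):
    """Serialize scraped records to JSON or CSV text (hypothetical field names)."""
    if fmt == "json":
        return json.dumps(records, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")

records = [
    {"title": "Example Domain", "url": "https://example.com"},
    {"title": "IANA", "url": "https://iana.org"},
]
```

Exporting to Google Sheets or a webhook would follow the same pattern, with the serialized payload posted to the target API instead of returned as text.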
AI-based web wrapper for web content extraction
qcrawl - fast async web crawling & scraping framework for Python.
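The core idea of an async crawler like this can be sketched with `asyncio`; this is not qcrawl's API, and the in-memory link graph stands in for real HTTP fetches.

```python
import asyncio

# A toy link graph standing in for real pages (assumption: no network access here).
LINKS = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b"],
    "https://example.com/b": [],
}

async def fetch_links(url):
    await asyncio.sleep(0)  # stand-in for an async HTTP request
    return LINKS.get(url, [])

async def crawl(start):
    """Breadth-first crawl: fetch each frontier batch concurrently."""
    seen = {start}
    queue = [start]
    while queue:
        batch, queue = queue, []
        results = await asyncio.gather(*(fetch_links(u) for u in batch))
        for links in results:
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
    return seen

visited = asyncio.run(crawl("https://example.com/"))
```

The `gather` call is what makes the crawl concurrent: every URL in the current frontier is fetched at once rather than one by one.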
The this.url class fetches and parses URL data, returning an object of structured information that can be stored in a database (or other storage) or fed to machine-learning algorithms.
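The parse-into-a-structured-object idea can be sketched with Python's `urllib.parse`; the field names in the returned dict are an assumption, not this.url's actual output shape.

```python
from urllib.parse import urlparse, parse_qs

def parse_url(url):
    """Return a structured dict for a URL, ready for storage or ML features."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "path": parts.path,
        "query": parse_qs(parts.query),
    }

info = parse_url("https://example.com/search?q=web+data&page=2")
```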
A quick guide, with code examples, on how to use Java for web scraping
GNewsScraper is a TypeScript package that scrapes article data from Google News based on a keyword or phrase. It returns the results as an array of JSON objects, making the scraped information convenient to access and use.
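The feed-to-array-of-objects pattern can be sketched in stdlib Python; this is not GNewsScraper's API, and the inline RSS sample is fabricated for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS snippet standing in for a fetched news feed.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <item><title>Web scraping in 2026</title>
    <link>https://news.example.com/1</link>
    <pubDate>Fri, 10 Apr 2026 09:00:00 GMT</pubDate></item>
  <item><title>Clean data for LLM agents</title>
    <link>https://news.example.com/2</link>
    <pubDate>Sat, 11 Apr 2026 12:00:00 GMT</pubDate></item>
</channel></rss>"""

def parse_feed(xml_text):
    """Turn an RSS document into a list of article dicts."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "published": item.findtext("pubDate"),
        }
        for item in root.iter("item")
    ]

articles = parse_feed(SAMPLE_RSS)
```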
An API wrapper for Scrappey.com written in Node.js (Cloudflare bypass & solver)
AI-based web extractor
High-performance HTML to Markdown converter with full GitHub Flavored Markdown support. Written in Rust, available for Node.js and as a native Rust crate.
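The conversion the Rust crate performs can be illustrated with a toy stdlib sketch handling only a few tags; a real GFM converter covers far more (tables, code fences, nesting).

```python
from html.parser import HTMLParser

class TinyMarkdown(HTMLParser):
    """Convert a tiny subset of HTML (h1, p, strong, em) to Markdown."""
    WRAP = {"strong": "**", "em": "*"}

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.out.append("# ")
        elif tag in self.WRAP:
            self.out.append(self.WRAP[tag])

    def handle_endtag(self, tag):
        if tag in self.WRAP:
            self.out.append(self.WRAP[tag])
        elif tag in ("h1", "p"):
            self.out.append("\n\n")  # block elements end with a blank line

    def handle_data(self, data):
        self.out.append(data)

def to_markdown(html):
    parser = TinyMarkdown()
    parser.feed(html)
    return "".join(parser.out).strip()

md = to_markdown("<h1>Hello</h1><p>Some <strong>bold</strong> text.</p>")
```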
A pipeline to scrape, extract, and analyze book data, turning raw web pages into insights.
Java Framework which is used by the Web Data Commons project to extract Microdata, Microformats and RDFa data, Web graphs, and HTML tables from the web crawls provided by the Common Crawl Foundation.
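The Microdata part of that extraction can be sketched in stdlib Python (the framework itself is Java); this simplified version assumes itemprop values are plain text, not nested items.

```python
from html.parser import HTMLParser

class MicrodataExtractor(HTMLParser):
    """Collect itemprop name/text pairs from schema.org Microdata markup."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemprop" in attrs:
            self._current = attrs["itemprop"]

    def handle_data(self, data):
        if self._current:
            self.props[self._current] = data.strip()
            self._current = None

# Hypothetical page fragment with schema.org Person markup.
HTML = """<div itemscope itemtype="https://schema.org/Person">
  <span itemprop="name">Ada Lovelace</span>
  <span itemprop="jobTitle">Mathematician</span>
</div>"""

extractor = MicrodataExtractor()
extractor.feed(HTML)
```

RDFa and Microformats extraction follow the same shape: walk the DOM, read the vocabulary-specific attributes, and emit property/value pairs.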
Open-source web crawler
HTML to Markdown with CSS selector and XPath annotations
The Tableau Web Data Connector for Facebook Insights API
RealShotPDF is a Chrome extension designed to simplify the process of creating PDF documents from web content. The extension allows users to navigate through selected webpages, parse and display links in a tree view, and generate PDFs for the chosen pages. It operates locally without sending any data to external servers.