feat: v2 SDK rewrite with Pydantic + httpx #84
Open
FrancescoSaverioZuppichini wants to merge 24 commits into main
- Delete .agent/ documentation folder (unused)
- Simplify CLAUDE.md from 370 to ~90 lines
- Remove stale docs (HEALTHCHECK.md, IMPLEMENTATION_SUMMARY.md, TOON_INTEGRATION_SUMMARY.md)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

BREAKING CHANGE: Complete project restructure

- Remove nested scrapegraph-py/ folder
- Initialize as uv library with src/ layout
- Clean slate for v2 API rewrite

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add ScrapeGraphAI sync client with httpx
- Add AsyncScrapeGraphAI async client
- Add Pydantic models for all request/response types
- Add nested resources: crawl, monitor, history
- Return ApiResult wrapper (never raises)
- Support SGAI_API_KEY, SGAI_DEBUG, SGAI_TIMEOUT_S env vars

API surface:

- client.scrape(ScrapeRequest)
- client.extract(ExtractRequest)
- client.search(SearchRequest)
- client.credits()
- client.health()
- client.crawl.start/get/stop/resume/delete
- client.monitor.create/list/get/update/delete/pause/resume
- client.history.list/get

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
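The "never raises" contract in the commit above can be illustrated with a minimal standard-library sketch. This is not the SDK's implementation (the commit describes Pydantic models and httpx). The field names `data` and `error` and the helper `safe_call` are hypothetical; only the two `status` values are taken from the PR summary.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Literal, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ApiResult(Generic[T]):
    """Hypothetical result wrapper: callers branch on `status` instead of catching."""

    status: Literal["success", "error"]
    data: Optional[T] = None  # assumed field name
    error: Optional[str] = None  # assumed field name


def safe_call(fn: Callable[[], T]) -> ApiResult[T]:
    """Run `fn` and convert any exception into an error result (hypothetical helper)."""
    try:
        return ApiResult(status="success", data=fn())
    except Exception as exc:  # deliberately broad: the wrapper must never raise
        return ApiResult(status="error", error=f"{type(exc).__name__}: {exc}")


ok = safe_call(lambda: {"remaining": 10})
failed = safe_call(lambda: 1 / 0)
print(ok.status, ok.data)
print(failed.status, failed.error)
```

The design trade-off is that callers must check `status` explicitly; in exchange, network and validation failures cannot escape as unhandled exceptions.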
- scrape: basic, json extraction, pdf, multi-format, fetchconfig
- extract: basic, with schema
- search: basic, with extraction
- crawl: basic, with formats
- monitor: basic, with webhook
- utilities: credits, health, history

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Delete types.py, everything in schemas.py
- Remove Api prefix from response models
- Pre-compile server timing regex
- Fix json field shadowing with aliases

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
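Pre-compiling a regex means calling `re.compile` once at import time rather than on every response. The sketch below shows the idea under an assumption: that the timing value arrives as the `dur=` parameter of a `Server-Timing` header entry. The pattern and the function name `parse_server_timing_ms` are hypothetical, not taken from the SDK.

```python
import re
from typing import Optional

# Compiled once at import time instead of on every response.
# Assumed pattern: the `dur=<milliseconds>` parameter of a Server-Timing
# entry, e.g. "app;dur=12.5".
_SERVER_TIMING_DUR = re.compile(r"dur=(\d+(?:\.\d+)?)")


def parse_server_timing_ms(header: Optional[str]) -> Optional[float]:
    """Return the first `dur` value in milliseconds, or None if absent."""
    if not header:
        return None
    match = _SERVER_TIMING_DUR.search(header)
    return float(match.group(1)) if match else None
```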
Follows Pydantic v2 best practices for type safety

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Test credits, scrape, extract, search, history, crawl
- Fix HttpUrl serialization (mode="json" in model_dump)
- Add python-dotenv for loading .env

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
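The HttpUrl fix can be reproduced in a few lines. In Pydantic v2, `model_dump()` keeps a URL field as a URL object, which `json.dumps` cannot encode, while `model_dump(mode="json")` converts it to a plain string. The model name `ScrapeRequestSketch` is a stand-in for illustration, not the SDK's `ScrapeRequest`.

```python
import json

from pydantic import BaseModel, HttpUrl


class ScrapeRequestSketch(BaseModel):
    """Stand-in model with a single validated URL field."""

    url: HttpUrl


request = ScrapeRequestSketch(url="https://example.com/page")

# Python mode keeps the URL as a pydantic URL object, not a str.
python_payload = request.model_dump()

# JSON mode converts every field to a JSON-compatible type.
json_payload = request.model_dump(mode="json")
body = json.dumps(json_payload)
```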
- Replace manual _to_camel with Pydantic's built-in alias_generator
- CamelModel base class handles snake_case -> camelCase conversion
- Simplify _serialize to single model_dump call
- Add async versions of all 16 examples
- Update README with expanded async client docs and examples table
- Add banner from JS SDK

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
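A minimal sketch of the alias-generator approach, assuming Pydantic v2's `pydantic.alias_generators.to_camel`. The name `CamelModel` comes from the commit. The request model, its fields, and the `exclude_none` choice in `serialize` are assumptions for illustration.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel


class CamelModel(BaseModel):
    """Base class: snake_case attributes in Python, camelCase keys on the wire."""

    model_config = ConfigDict(alias_generator=to_camel, populate_by_name=True)


class CrawlRequestSketch(CamelModel):  # hypothetical model and fields
    max_depth: int
    webhook_url: Optional[str] = None


def serialize(model: BaseModel) -> dict:
    """Single model_dump call; dropping None values is an assumption."""
    return model.model_dump(by_alias=True, exclude_none=True, mode="json")


payload = serialize(CrawlRequestSketch(max_depth=2, webhook_url="https://example.com/hook"))
parsed = CrawlRequestSketch.model_validate({"maxDepth": 3})
```

With `populate_by_name=True`, the model accepts both `max_depth` (when constructed in Python) and `maxDepth` (when parsing an API payload).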
- Add .pytest_cache/, .ruff_cache/, .mypy_cache/ to gitignore
- Add common Python build/test artifacts
- Remove obsolete update-requirements.yml workflow

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Remove obsolete pylint.yml and test.yml (referenced old structure)
- Add ci.yml with simple lint + test jobs using uv
- Update release.yml for root-level project
- Update python-publish.yml for uv build

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Test request construction, response parsing, error handling
- Mock httpx.Client.request instead of hitting real API
- Test all endpoints: scrape, extract, search, crawl, monitor, history
- Test HTTP errors (401, 402, 429), timeouts
- Test camelCase serialization
- Update CI to run test_client.py

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Dependency Review
The following issues were found:
- Snapshot Warnings: Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.
- License Issues: uv.lock
- Run ruff format on src/
- Add ruff config to pyproject.toml (line-length=100, ignore E501)
- Fix import ordering

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Format test files with ruff
- Add per-file ignores for tests (F841, E402)
- Update CI to check src/ tests/ only

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add ResponseModel base class with camelCase alias generator
- Change all response models to inherit from ResponseModel
- Use TypeAdapter for proper generic type parsing
- Update all examples to use attribute access (res.data.results)
- Fix all test mocks with complete required fields

This follows industry-standard SDK patterns where typed objects are returned for IDE autocompletion and type safety.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
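A sketch of how `TypeAdapter` can parse a parametrized generic model so that nested items come back as typed objects rather than dicts. `ResponseModel` is named in the commit. The `Page` envelope, `SearchHit`, and all field names are hypothetical.

```python
from typing import Generic, List, TypeVar

from pydantic import BaseModel, ConfigDict, TypeAdapter
from pydantic.alias_generators import to_camel

T = TypeVar("T")


class ResponseModel(BaseModel):
    """Base for responses: accepts camelCase keys, exposes snake_case attributes."""

    model_config = ConfigDict(alias_generator=to_camel, populate_by_name=True)


class SearchHit(ResponseModel):  # hypothetical item type
    page_title: str
    url: str


class Page(ResponseModel, Generic[T]):  # hypothetical generic envelope
    results: List[T]
    total_count: int


# TypeAdapter validates against the fully parametrized type Page[SearchHit].
adapter = TypeAdapter(Page[SearchHit])
page = adapter.validate_python(
    {
        "results": [{"pageTitle": "Example", "url": "https://example.com"}],
        "totalCount": 1,
    }
)
first = page.results[0]
```

Attribute access such as `page.results[0].page_title` is what gives editors autocompletion, in contrast to dict-style lookups.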
Test minimum supported version and latest stable.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Convert dict-style access to Pydantic attribute access in all examples
- Add polling loop to crawl examples (matches JS SDK behavior)
- Add dotenv loading to all examples for easier local testing
- Fix health endpoint to use /health instead of /healthz
- Update CLAUDE.md with pre-commit checklist using ruff

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
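A polling loop of the kind added to the crawl examples might look like the sketch below. It is written against a generic `get_status` callable rather than the SDK client, and the terminal status names, interval, and timeout are assumptions, since the PR does not list them.

```python
import time
from typing import Callable

# Assumed terminal states; the real API's status names may differ.
TERMINAL_STATUSES = frozenset({"completed", "failed"})


def poll_until_done(
    get_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call `get_status` until it reports a terminal status or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting. In an example script, `get_status` would wrap a call such as `client.crawl.get(...)`.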
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Default URL: https://api.scrapegraphai.com/api/v2
- Env var: SGAI_TIMEOUT_S -> SGAI_TIMEOUT

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
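A minimal sketch of reading this configuration from the environment, using only the standard library. The variable names and the default URL come from the commit messages. The default timeout value, the truthy values accepted for `SGAI_DEBUG`, and the `Settings` and `load_settings` names are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://api.scrapegraphai.com/api/v2"  # from the commit message
DEFAULT_TIMEOUT_S = 120.0  # assumed; the PR does not state the SDK default


@dataclass(frozen=True)
class Settings:
    api_key: Optional[str]
    base_url: str
    timeout_s: float
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from a mapping (defaults to the process environment)."""
    raw_timeout = env.get("SGAI_TIMEOUT")
    return Settings(
        api_key=env.get("SGAI_API_KEY"),
        base_url=DEFAULT_BASE_URL,
        timeout_s=float(raw_timeout) if raw_timeout else DEFAULT_TIMEOUT_S,
        # Assumed convention: "1" or "true" (case-insensitive) enables debug.
        debug=env.get("SGAI_DEBUG", "").lower() in {"1", "true"},
    )
```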
VinciGit00 added a commit to ScrapeGraphAI/scrapegraph-mcp that referenced this pull request on Apr 15, 2026
Rebase base URL, env vars, and auth header onto the new scrapegraph-py v2 SDK contract (ScrapeGraphAI/scrapegraph-py#84):

- Base URL: /api/v2 -> /v2 (default https://api.scrapegraphai.com/v2)
- Env: SGAI_API_URL (SCRAPEGRAPH_API_BASE_URL kept as legacy alias)
- Env: SGAI_TIMEOUT_S for httpx timeout (default 120s)
- Drop Authorization: Bearer; keep SGAI-APIKEY only (matches SDK)
- Update docstrings, resources, README, server.json, .agent docs to reference #84 and the /v2 base URL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add MonitorActivityRequest, MonitorActivityResponse, MonitorTickEntry schemas
- Add activity() method to MonitorResource (sync and async)
- Update monitor examples to use activity() and show diffs nicely
- Delete monitor on Ctrl+C cleanup

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add seen_ids deduplication
- Cleanup in signal handler directly
- Show "(no diffs data)" when changed but no diffs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
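The deduplication and fallback message from this commit can be sketched with plain dicts. The keys `id`, `changed`, and `diffs` are assumed field names for a monitor tick, and `new_ticks` and `describe` are hypothetical helpers. The PR names a `MonitorTickEntry` schema but does not show its fields.

```python
from typing import Any, Dict, Iterable, List, Set

Tick = Dict[str, Any]


def new_ticks(ticks: Iterable[Tick], seen_ids: Set[str]) -> List[Tick]:
    """Return ticks not yet seen and record their ids in `seen_ids` (mutated in place)."""
    fresh = []
    for tick in ticks:
        if tick["id"] in seen_ids:
            continue
        seen_ids.add(tick["id"])
        fresh.append(tick)
    return fresh


def describe(tick: Tick) -> str:
    """Render one tick; fall back to a marker when a change carries no diffs."""
    if not tick.get("changed"):
        return "no change"
    diffs = tick.get("diffs")
    return str(diffs) if diffs else "(no diffs data)"
```

Because each poll of the activity feed can return entries that were already printed, keeping `seen_ids` across iterations ensures each tick is shown once.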
Summary
Complete SDK rewrite matching the JS SDK 1:1:
- status: "success" | "error"
- sgai.crawl.start(), sgai.monitor.create(), sgai.history.list()
- src/ layout

Changes
- scrapegraph-py/ to root-level uv library
- schemas.py with CamelModel base class
- Sync client (ScrapeGraphAI) and async client (AsyncScrapeGraphAI)

API Surface

Test plan
- uv run pytest tests/test_client.py -v
- uv run ruff check .

🤖 Generated with Claude Code