Video Scraping API for AI Datasets

Turn videos from YouTube, TikTok, and Vimeo into structured training data. Direct S3 delivery, metadata extraction, and audio processing at scale.

Batch Download Video/Audio to S3
Fetch Rich Metadata (Views, Likes, Comments)
Audio Extraction & Transcoding
Supports YouTube, TikTok, Vimeo & 1,000+ Sites

One API for All Your Video Data Needs

Powered by yt-dlp

Get the power of yt-dlp and ffmpeg without managing infrastructure. We handle proxies, updates, and scaling.

Metadata Extraction

Get rich JSON metadata including title, description, views, likes, comments, and subtitles.
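Because the service is powered by yt-dlp, a reasonable guess is that the metadata JSON uses yt-dlp's info-dict key names (`title`, `view_count`, `like_count`, `comment_count`, `tags`, `subtitles`). This page does not confirm that schema, so treat the sketch below as an assumption: it reduces one metadata file to the fields a training pipeline typically keeps. `summarize_metadata` is a hypothetical helper, not part of any SDK.

```python
import json
from typing import Any


def summarize_metadata(raw: str) -> dict[str, Any]:
    """Reduce one metadata JSON document to training-relevant fields.

    ASSUMPTION: keys follow yt-dlp's info-dict naming. Missing keys
    become None (or an empty list) instead of raising KeyError.
    """
    info = json.loads(raw)
    return {
        "title": info.get("title"),
        "description": info.get("description"),
        "views": info.get("view_count"),
        "likes": info.get("like_count"),
        "comments": info.get("comment_count"),
        "tags": info.get("tags") or [],
        # In yt-dlp, "subtitles" maps language codes to lists of tracks.
        "subtitle_languages": sorted((info.get("subtitles") or {}).keys()),
    }
```

Defaulting missing fields to `None` keeps records from different platforms in one uniform shape, since not every site exposes likes or comment counts.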

Direct S3 Delivery

We push files directly to your S3 or GCS bucket, so media never passes through your own servers. This saves bandwidth and simplifies your pipeline.

Scalable Infrastructure

Petabyte-scale downloader clusters built to handle millions of video jobs concurrently.

Easy Integration (Python / Curl)


# Example: Batch Download to S3
curl -X POST "https://api.123proxy.com/v1/video/extract" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "output_format": "mp4",
    "s3_config": {
        "bucket": "my-training-data",
        "region": "us-east-1",
        "access_key": "YOUR_KEY",
        "secret_key": "YOUR_SECRET"
    },
    "fetch_metadata": true
  }'

The API processes the video asynchronously, handles network failures, and uploads the final cleaned file and its metadata JSON to your bucket. You receive a webhook when the job completes.
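The same request as the curl example, sketched in Python with only the standard library. The endpoint and JSON fields are copied from the curl call; the function names are hypothetical, and the shape of the JSON reply is an assumption because this page does not document it. Building the request is kept separate from sending it so the payload can be inspected or logged first.

```python
import json
import urllib.request

API_URL = "https://api.123proxy.com/v1/video/extract"


def build_extract_request(api_key: str, video_url: str, bucket: str, region: str,
                          access_key: str, secret_key: str,
                          output_format: str = "mp4",
                          fetch_metadata: bool = True) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "url": video_url,
        "output_format": output_format,
        "s3_config": {
            "bucket": bucket,
            "region": region,
            "access_key": access_key,
            "secret_key": secret_key,
        },
        "fetch_metadata": fetch_metadata,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the reply.

    ASSUMPTION: the endpoint answers with a JSON body (for example a job ID).
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `submit(build_extract_request(...))` performs the network call; because processing is asynchronous, the finished file and metadata arrive in the bucket later, signalled by the webhook.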

Why Build on Our Video Scraping API?

High Cleanliness

Built-in cleaning filters for noise-free data.

Pricing

Pay-as-you-go: you are charged only for successful requests.

Developer First

Comprehensive documentation, SDKs for Python/Node.js, and webhooks for integration.

99.99% Uptime SLA
Sandboxed Testing Environment
Real-time Dashboard Metrics
Success Rate

Industry-leading 99.2% success rate on difficult targets.

Cost Effective

You pay only for valid 200 OK responses.

Technical Specifications

Built for high-scale AI data ingestion.

Supported Platforms: YouTube, TikTok, Vimeo, Instagram, Twitter, Bilibili, and 1,000+ others
Output Formats: MP4, WEBM, MKV, MP3, WAV, M4A, JSON (metadata)
Delivery Methods: Direct S3 upload (AWS), GCS, webhook callback
Metadata Fields: Title, description, views, likes, user info, comments, subtitles (VTT/SRT)
Concurrency: Unlimited concurrent async jobs (auto-scaling)
Proxy Network: 72M+ residential IPs and ISP proxies, for a 99.9% success rate

FAQs

Does the API support YouTube Shorts and Playlists?

Yes. You can submit a playlist URL or a Shorts URL. For a playlist, we can expand it and automatically schedule a job for each video in it.
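Playlist expansion happens on the API side, but a pipeline may still want to know up front whether a URL will produce one job or many (for budgeting or tracking). The hypothetical helper below classifies YouTube URLs using YouTube's public URL patterns (`/playlist?list=<id>`, `/shorts/<id>`, `/watch?v=<id>`, `youtu.be/<id>`); it is a client-side convenience, not part of the API.

```python
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = {"www.youtube.com", "youtube.com", "m.youtube.com", "youtu.be"}


def classify_youtube_url(url: str) -> str:
    """Return 'playlist', 'short', 'video', or 'other' for a URL."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host not in YOUTUBE_HOSTS:
        return "other"
    query = parse_qs(parsed.query)
    if parsed.path == "/playlist" and "list" in query:
        return "playlist"
    if parsed.path.startswith("/shorts/"):
        return "short"
    # A /watch URL that also carries list= is ambiguous; treated as one video here.
    if parsed.path == "/watch" and "v" in query:
        return "video"
    if host == "youtu.be" and len(parsed.path) > 1:
        return "video"
    return "other"
```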

Can I extract Audio (MP3/WAV) from videos?

Yes. Set `output_format` to `"mp3"` or `"wav"` in your request. We extract the audio track and convert it to the requested format before uploading it to S3.
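An audio job uses the same payload shape as the curl example; only `output_format` changes. The sketch validates the format client-side before submitting. `build_audio_payload` is a hypothetical helper, and accepting M4A here is an assumption based on the spec table (the FAQ itself names only MP3 and WAV).

```python
# ASSUMPTION: MP3 and WAV come from the FAQ; M4A is taken from the spec table.
AUDIO_FORMATS = {"mp3", "wav", "m4a"}


def build_audio_payload(video_url: str, s3_config: dict, fmt: str = "mp3") -> dict:
    """Payload asking the API to extract and transcode only the audio track."""
    fmt = fmt.lower()
    if fmt not in AUDIO_FORMATS:
        raise ValueError(f"unsupported audio format: {fmt!r}")
    return {
        "url": video_url,
        "output_format": fmt,
        "s3_config": s3_config,  # same structure as in the curl example
    }
```

Rejecting unknown formats locally avoids spending a request on a job that would fail anyway.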

Which video platforms does the API support?

We support any platform supported by yt-dlp, which includes YouTube, TikTok, Vimeo, Bilibili, Instagram, Twitter, and over a thousand others.

Can I get metadata without downloading the video?

Yes. Set `metadata_only: true` to fetch only the JSON metadata (views, likes, description, tags, and subtitles) without downloading or processing the media file.
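Metadata-only jobs are handy for surveying a large URL list before committing to full downloads. The sketch turns a raw list into one `metadata_only` payload per unique URL. `build_metadata_jobs` is hypothetical, and including `s3_config` assumes the metadata JSON is still delivered to your bucket, which this page does not state explicitly.

```python
def build_metadata_jobs(urls: list[str], s3_config: dict) -> list[dict]:
    """One metadata-only payload per unique URL, preserving input order."""
    seen: set[str] = set()
    jobs: list[dict] = []
    for url in urls:
        url = url.strip()
        if not url or url in seen:  # skip blanks and duplicates
            continue
        seen.add(url)
        jobs.append({
            "url": url,
            "metadata_only": True,   # documented flag: skip the media file
            "s3_config": s3_config,  # ASSUMPTION: JSON still lands in the bucket
        })
    return jobs
```

Deduplicating before submission matters under per-request pricing: each duplicate URL would otherwise be a separate billable job.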

What are the API rate limits and concurrency?

Concurrent async jobs are not capped, and the infrastructure auto-scales. Because jobs are asynchronous, you can submit thousands of them at once, and we process them in parallel across our large proxy network.
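Since each job is asynchronous, the client only waits for a submission reply, not for the download itself, so a modest thread pool can push thousands of jobs quickly. The sketch below fans submissions out with `concurrent.futures` and keeps going when individual submissions fail. `send` is any function that POSTs one payload and returns the decoded reply (for example, a wrapper around `urllib.request`); `submit_all` and its parameters are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def submit_all(payloads: list[dict], send: Callable[[dict], dict],
               max_workers: int = 16) -> tuple[list[dict], list[tuple[dict, Exception]]]:
    """Submit jobs in parallel; return (replies, [(payload, error), ...])."""
    accepted: list[dict] = []
    failed: list[tuple[dict, Exception]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its payload so failures can be retried later.
        futures = {pool.submit(send, p): p for p in payloads}
        for fut in as_completed(futures):
            try:
                accepted.append(fut.result())
            except Exception as exc:  # record and continue with the rest
                failed.append((futures[fut], exc))
    return accepted, failed
```

`max_workers` bounds only the client's own threads; the server-side parallelism is handled by the service's auto-scaling.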