High-Bandwidth Proxies for Image Downloads

Customizable 1-100Gbps+ bandwidth
One-line proxy setting for img2dataset / gallery-dl
Supports Flickr, Shutterstock, and more

Typical Use Cases

img2dataset

Use img2dataset to batch-download billions of images from URL lists and build visual/multimodal training corpora.

gallery-dl / aria2

Batch-download original and multi-size images from Flickr, Shutterstock, and community sites with gallery-dl / aria2.
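The same proxy can be passed to gallery-dl and aria2 directly. Below is a minimal sketch, assuming the placeholder credentials and gateway from the img2dataset example further down, a hypothetical `urls.txt` (one URL per line), and a hypothetical `images/` output directory. The `run` helper and the `DRY_RUN` switch are illustrative additions: by default the script only prints the commands it would execute.

```shell
#!/usr/bin/env bash
# Sketch: route gallery-dl and aria2 through the HTTP proxy gateway.
# PROXY uses the placeholder format from the img2dataset example; replace
# USERNAME_sessionId_time and PASSWORD with real credentials.
set -euo pipefail

PROXY="${PROXY:-http://USERNAME_sessionId_time:PASSWORD@gateway.123proxy.cn:31000}"
URL_LIST="${URL_LIST:-urls.txt}"   # hypothetical: one URL per line
OUT_DIR="${OUT_DIR:-images}"       # hypothetical output directory
DRY_RUN="${DRY_RUN:-1}"            # 1 = print commands only

# run: print the command when DRY_RUN=1, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# aria2: -i reads URLs from a file, -d sets the output directory,
# -j caps parallel downloads, -x caps connections per server,
# --all-proxy applies the proxy to every protocol.
aria2_download() {
  run aria2c --all-proxy="$PROXY" -i "$URL_LIST" -d "$OUT_DIR" -j 16 -x 4
}

# gallery-dl: --proxy sets the proxy, -d sets the base directory,
# -i reads gallery/page URLs from a file.
gallery_download() {
  run gallery-dl --proxy "$PROXY" -d "$OUT_DIR" -i "$URL_LIST"
}
```

After sourcing the script, `DRY_RUN=0 aria2_download` would start the real download. The `-j`/`-x` values are illustrative and should be tuned to the purchased bandwidth and to what the target site tolerates.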

E-commerce Images

Collect product images, user-uploaded images, and similar content for training multimodal large models and recommendation systems.

Multi-region Coverage

Multi-region IP coverage (US/EU/JP/BR) for building multi-language, multi-region, multi-scene image datasets.

High-Bandwidth Proxy for Image Downloads

High Bandwidth

Dedicated 1Gbps to 200Gbps+ (Customizable)

Pricing

Fixed pricing by bandwidth rather than by traffic, so costs are predictable.

High-Bandwidth Proxy

123Proxy provides a high-bandwidth proxy pool service built specifically for AI training data collection: fixed bandwidth billing (1Gbps to 200Gbps+), unlimited total traffic, and unlimited concurrent requests.

1-200Gbps+ dedicated
Unlimited concurrent requests
Priced by bandwidth (per Gbps)
Contact Sales
Service Guarantee

Automatic monitoring of bot detection on target sites, to ensure that target websites do not block access.

Cost-Effective

Ultra-high cost-performance for large-scale data scraping.

Flickr Integration Example


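# Example: run img2dataset through the 123Proxy high-bandwidth proxy.
# Replace USERNAME_sessionId_time and PASSWORD with your own credentials.
export http_proxy="http://USERNAME_sessionId_time:PASSWORD@gateway.123proxy.cn:31000"
export https_proxy="$http_proxy"

# urls.txt: one image URL per line. Raise --processes_count / --thread_count
# to open more connections and fill the bandwidth (values are illustrative).
img2dataset \
  --url_list urls.txt \
  --input_format txt \
  --output_format webdataset \
  --output_folder output \
  --processes_count 16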

Assign a different sessionId to each machine or task to rotate exit IPs automatically. Appending _sessionId to the username binds each task to its own session, so different tasks use different exit IPs.
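A minimal sketch of per-task session naming, assuming the `USERNAME_sessionId_time` placeholder in the example above stands for a user name, a session ID, and a time field joined by underscores (confirm the exact format with 123Proxy). `make_proxy_url` and `session_for_task` are hypothetical helpers, not part of any 123Proxy tool.

```shell
#!/usr/bin/env bash
# Sketch: build a per-task proxy URL so each machine/task gets its own
# session and therefore its own exit IP.
# ASSUMPTION: the username follows the USERNAME_sessionId_time placeholder;
# confirm the exact fields with 123Proxy.
set -euo pipefail

GATEWAY="gateway.123proxy.cn:31000"

# make_proxy_url USER SESSION_ID TIME_FIELD PASSWORD -> proxy URL on stdout
make_proxy_url() {
  local user="$1" session="$2" time_field="$3" pass="$4"
  printf 'http://%s_%s_%s:%s@%s\n' "$user" "$session" "$time_field" "$pass" "$GATEWAY"
}

# session_for_task HOST TASK_INDEX -> a session ID unique per host and task.
# Hypothetical naming scheme: non-alphanumeric characters are stripped so the
# ID cannot add extra "_" separators to the username.
session_for_task() {
  local host="${1//[!a-zA-Z0-9]/}"
  printf '%st%s\n' "$host" "$2"
}
```

On each machine, export `http_proxy` and `https_proxy` from `make_proxy_url` with a different task index (for example `session_for_task "$(hostname)" 0`) before launching img2dataset as in the example above.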

Technical Specifications

Optimized for billion-scale image ingestion.

| Category | Specification |
|---|---|
| Supported Tools | img2dataset, gallery-dl, aria2, wget |
| Target Platforms | Flickr, Shutterstock, Pinterest, Instagram, E-commerce |
| Bandwidth | 1Gbps to 100Gbps+, dedicated |
| Concurrency | Unlimited (optimized for high RPS) |
| IP Rotation | Automatic (session-based or per-request) |

FAQs

How do I fix 403 / 429 errors from Flickr / Shutterstock?

Solution:
- Route traffic through the 123Proxy high-bandwidth dedicated proxy, so that requests are spread automatically across a large IP pool.
- Split tasks across different `sessionId` values to distribute concurrency over multiple proxy links.
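Splitting tasks by `sessionId` can be sketched as follows: deal the URL list round-robin into N shards with GNU `split`, then start one img2dataset process per shard, each with its own proxy session. The session IDs `shard0`, `shard1`, ... and the output folder names are hypothetical, and the username layout assumes the `USERNAME_sessionId_time` placeholder from the example above.

```shell
#!/usr/bin/env bash
# Sketch: split one URL list into N shards and run one img2dataset process
# per shard, each bound to its own proxy session (sessionId) and exit IP.
# ASSUMPTIONS: GNU split; session IDs "shard0", "shard1", ... are accepted;
# the username format mirrors the USERNAME_sessionId_time placeholder.
set -euo pipefail

GATEWAY="gateway.123proxy.cn:31000"

# shard_urls URL_LIST N PREFIX: deal lines round-robin into PREFIX00, PREFIX01, ...
shard_urls() {
  split -d -n "r/$2" "$1" "$3"
}

# launch_shards PREFIX USER PASS TIME_FIELD: one background img2dataset per shard.
# With DRY_RUN=1 (default) it only prints which session each shard would use.
launch_shards() {
  local prefix="$1" user="$2" pass="$3" time_field="$4"
  local i=0 shard proxy
  for shard in "$prefix"*; do
    proxy="http://${user}_shard${i}_${time_field}:${pass}@${GATEWAY}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "shard=$shard session=shard${i}"
    else
      # Separate output folders keep parallel shards from colliding.
      http_proxy="$proxy" https_proxy="$proxy" img2dataset \
        --url_list "$shard" --input_format txt \
        --output_format webdataset --output_folder "out_shard${i}" &
    fi
    i=$((i + 1))
  done
  wait  # block until every background shard has finished
}
```

For example, `shard_urls urls.txt 8 part_` followed by `DRY_RUN=0 launch_shards part_ USERNAME PASSWORD time` would start eight processes on eight sessions. Round-robin dealing keeps shard sizes within one line of each other.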

How do I handle corrupted images or placeholders?

Some sites return placeholder or blank images to IPs they flag as abnormal. Recommended:
- Use 123Proxy automatic IP rotation so that no single IP repeatedly triggers anti-scraping logic.
- Add an image integrity check in post-processing to remove corrupted files automatically.
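The integrity check can start as a simple file-signature test. This is a minimal sketch, not a full decoder: it flags files whose first bytes are not a JPEG, PNG, GIF, or WebP signature (for example an HTML block page saved as `.jpg`) and JPEGs missing the FF D9 end-of-image marker, a common sign of a truncated download. It does not detect placeholders that are themselves valid images. The `DELETE` switch is a hypothetical addition, and the end-marker test is a heuristic that can also flag valid JPEGs carrying trailing bytes.

```shell
#!/usr/bin/env bash
# Sketch: post-download integrity check based on file signatures.
# Catches empty, truncated, or non-image files; it does not decode images.
# Deletion is opt-in via DELETE=1.
set -euo pipefail

# image_kind FILE -> prints jpeg/png/gif/webp, or "bad"
image_kind() {
  local f="$1" hex
  hex="$(head -c 12 "$f" | od -An -tx1 | tr -d ' \n')"
  case "$hex" in
    ffd8ff*)                      echo jpeg ;;
    89504e470d0a1a0a*)            echo png ;;
    474946383761*|474946383961*)  echo gif ;;   # GIF87a / GIF89a
    52494646????????57454250)     echo webp ;;  # "RIFF" + size + "WEBP"
    *)                            echo bad ;;
  esac
}

# jpeg_complete FILE -> success if the file ends with the EOI marker FF D9.
# Heuristic: truncated downloads usually lack it.
jpeg_complete() {
  [ "$(tail -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "ffd9" ]
}

# check_dir DIR: print (and, with DELETE=1, remove) files failing the checks.
check_dir() {
  local dir="$1" f kind
  find "$dir" -type f \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' \
      -o -iname '*.gif' -o -iname '*.webp' \) -print0 |
    while IFS= read -r -d '' f; do
      kind="$(image_kind "$f")"
      if [ "$kind" = bad ] || { [ "$kind" = jpeg ] && ! jpeg_complete "$f"; }; then
        echo "corrupt: $f"
        if [ "${DELETE:-0}" = "1" ]; then rm -f -- "$f"; fi
      fi
    done
}
```

Running `check_dir images` first gives a report to review; `DELETE=1 check_dir images` then removes the flagged files.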

What throughput can I expect at 1Gbps, 10Gbps, and 100Gbps?

- 1Gbps: suitable for datasets of tens of millions of images, with dozens of concurrent tasks running stably.
- 10Gbps: suitable for collection tasks of hundreds of millions of images, run in parallel on a multi-machine cluster.
- 100Gbps: suitable for long-running, continuous, massive-scale image crawling shared across multiple teams.
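These tiers can be sanity-checked with simple arithmetic: images per second is roughly the bandwidth in bytes per second divided by the average image size. This minimal sketch assumes full link utilization and an average image size you supply; HTTP/TLS overhead, latency, and target-site rate limits are ignored, so real throughput will be lower.

```shell
#!/usr/bin/env bash
# Sketch: back-of-envelope image throughput for a given bandwidth.
# ASSUMPTIONS: link fully utilized; protocol overhead, latency, and
# rate limits ignored. Decimal units: 1 Gbps = 1e9 bit/s, 1 KB = 1000 bytes.
set -euo pipefail

# images_per_second GBPS AVG_IMAGE_KB
images_per_second() {
  awk -v g="$1" -v kb="$2" 'BEGIN { printf "%.0f\n", (g * 1e9 / 8) / (kb * 1000) }'
}

# images_per_day GBPS AVG_IMAGE_KB (86400 seconds per day)
images_per_day() {
  awk -v g="$1" -v kb="$2" 'BEGIN { printf "%.0f\n", (g * 1e9 / 8) / (kb * 1000) * 86400 }'
}
```

With a hypothetical 100 KB average image, `images_per_second 1 100` gives 1250 and `images_per_day 1 100` gives 108000000, about 108 million images per day at 1Gbps; 10Gbps and 100Gbps scale these figures by 10x and 100x.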

How do I fix unstable download speeds?

- Increase concurrency (process count / thread count) so that more connections are open to fill the available bandwidth.
- Route metadata requests and image file downloads through different proxy exits so they do not interfere with each other.
- If speeds remain unstable, contact 123Proxy technical support to check the link and the target site's status.
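One way to apply the first two suggestions is sketched here. It assumes the session-based username format from the img2dataset example (placeholders `USERNAME`, `PASSWORD`, `time`) and hypothetical session names `meta` and `img`. Metadata requests go through one session with curl, image bytes through another with aria2, and aria2's `-j` (parallel downloads) and `-x` (connections per server) raise concurrency. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: metadata and image downloads use different proxy sessions (exits),
# and aria2 concurrency is raised to fill the link.
# ASSUMPTIONS: USERNAME_sessionId_time username layout; session names
# "meta" and "img" and the default -j/-x values are hypothetical.
set -euo pipefail

GATEWAY="gateway.123proxy.cn:31000"
PROXY_USER="${PROXY_USER:-USERNAME}"   # placeholder credentials
PROXY_PASS="${PROXY_PASS:-PASSWORD}"
PROXY_TIME="${PROXY_TIME:-time}"

# proxy_for SESSION_ID -> proxy URL bound to that session
proxy_for() {
  printf 'http://%s_%s_%s:%s@%s\n' "$PROXY_USER" "$1" "$PROXY_TIME" "$PROXY_PASS" "$GATEWAY"
}

META_PROXY="$(proxy_for meta)"   # exit used only for listing/API pages
IMG_PROXY="$(proxy_for img)"     # exit used only for image bytes

# run: print the command when DRY_RUN=1 (default), execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# fetch_metadata URL OUT_FILE: one metadata request through the "meta" exit.
fetch_metadata() {
  run curl --fail --silent --show-error --proxy "$META_PROXY" -o "$2" "$1"
}

# download_images URL_LIST OUT_DIR: bulk image download through the "img" exit.
# -j = parallel downloads, -x = connections per server; raise both to fill
# the bandwidth, within what the target site tolerates.
download_images() {
  run aria2c --all-proxy="$IMG_PROXY" -i "$1" -d "$2" -j "${JOBS:-32}" -x "${CONNS:-4}"
}
```

Keeping the two exits separate means a slow or throttled metadata endpoint does not share connections with the bulk image stream, and `JOBS`/`CONNS` can be raised step by step while watching link utilization.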

Is using proxies for image scraping compliant?

Using a proxy is itself legal, but collecting, storing, and using image data must comply with the target site's terms of use and with copyright law. Confirm the scope of authorization for your specific business scenario, and use the data to train models only where this is legally authorized or otherwise compliant.