High-Bandwidth Proxies for Image Downloads

Customize 1-100Gbps+ bandwidth

One-line parameter setup for img2dataset / gallery-dl

Support Flickr, Shutterstock and more

Enjoy a Free Trial at 1 Gbps+ Speeds

Typical Usage Scenes

img2dataset

Use img2dataset to batch-download billions of images from URL lists and build visual/multimodal training corpora.

gallery-dl / aria2

Batch-pull original and multi-size images from Flickr, Shutterstock, and community sites via gallery-dl / aria2.

E-commerce Images

Collect e-commerce product images, user-uploaded images, and similar content for training multimodal large models and recommendation systems.

Multi-region Coverage

Multi-region IP coverage (US/EU/JP/BR and more) for building multi-language, multi-region, multi-scene image datasets.

High-Bandwidth proxy for image download

High Bandwidth

Dedicated 1Gbps to 200Gbps+ (Customizable)

Pricing

Fixed pricing by bandwidth rather than by traffic, for predictable costs

High-Bandwidth Proxy

123Proxy provides a high-bandwidth proxy pool service built specifically for AI training data collection: fixed bandwidth billing (1Gbps to 200Gbps+), unlimited total traffic, and unlimited concurrent requests.

1-200Gbps+ dedicated

Unlimited concurrent requests

Priced by bandwidth (per Gbps)

Contact Sales

Service Guarantee

Automatic monitoring of target-site bot detection, to help keep access to the target website from being blocked

Cost Effective

Highly cost-effective for large-scale data scraping.

Flickr Integration Example


# Example: Use 123Proxy High Bandwidth Proxy IP to run img2dataset
export http_proxy="http://USERNAME_sessionId_time:PASSWORD@gateway.123proxy.cn:31000"
export https_proxy="$http_proxy"

img2dataset \
  --url_list urls.txt \
  --output_format webdataset \
  --input_format txt

Assign a different sessionId to each machine or task. Appending _sessionId to the username binds each task to its own session, so different tasks use different exit IPs (automatic IP rotation).
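The per-task session binding described above could be scripted roughly as follows. This is a minimal sketch, not an official 123Proxy client: the helper name `proxy_url_for_session` is hypothetical, the gateway host and port are copied from the example above, and the `time` field of the username is passed through unchanged because its meaning is not specified here.

```shell
# Build a proxy URL whose username carries a per-task session ID.
# Assumption: the username layout is USERNAME_sessionId_time, as in the
# example above; the "time" field is treated as an opaque value.
proxy_url_for_session() {
  local user="$1" session_id="$2" time_field="$3" password="$4"
  printf 'http://%s_%s_%s:%s@gateway.123proxy.cn:31000\n' \
    "$user" "$session_id" "$time_field" "$password"
}

# Usage on each machine/task (all values are placeholders):
#   export http_proxy="$(proxy_url_for_session USERNAME task01 time PASSWORD)"
#   export https_proxy="$http_proxy"
```

Giving each machine or job its own session ID string (for example a hostname or job number) keeps the tasks on separate sessions.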

Technical Specifications

Optimized for billion-scale image ingestion.

Category	Specification
Supported Tools	img2dataset, gallery-dl, aria2, wget
Target Platforms	Flickr, Shutterstock, Pinterest, Instagram, E-commerce
Bandwidth	1Gbps - 100Gbps+ Dedicated
Concurrency	Unlimited (Optimized for High RPS)
IP Rotation	Automatic (Session-based or Per-Request)

FAQs

How do I resolve Flickr / Shutterstock 403 / 429 errors?

Solution:
- Enable the 123Proxy high-bandwidth dedicated proxy and use its large-scale IP pool to spread request pressure automatically.
- Split tasks across different `sessionId` values to distribute concurrency over multiple proxy links.
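One way to split a job across sessions is to shard the URL list first and run each shard under its own `sessionId`. The sketch below is an assumption-laden illustration: `shard_urls` and the shard file naming are hypothetical, and the round-robin split is just one reasonable choice.

```shell
# Round-robin split of a URL list into N shard files:
# PREFIX_0.txt ... PREFIX_(N-1).txt
shard_urls() {
  local url_list="$1" shards="$2" prefix="$3"
  awk -v n="$shards" -v p="$prefix" \
    '{ f = p "_" ((NR - 1) % n) ".txt"; print > f }' "$url_list"
}

# Usage (placeholders; one session ID per shard):
#   shard_urls urls.txt 4 shard
#   for i in 0 1 2 3; do
#     http_proxy="http://USERNAME_task${i}_time:PASSWORD@gateway.123proxy.cn:31000" \
#     https_proxy="http://USERNAME_task${i}_time:PASSWORD@gateway.123proxy.cn:31000" \
#       img2dataset --url_list "shard_${i}.txt" --input_format txt \
#         --output_format webdataset --output_folder "out_${i}" &
#   done
#   wait
```

Round-robin assignment keeps the shards close to equal in size, so no single session carries most of the load.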

How do I handle corrupted images or placeholders?

Some sites return placeholders or blank images to IPs they consider abnormal. Recommended:
- Use 123Proxy automatic IP rotation so that requests do not keep triggering the same anti-crawling logic.
- Add an image integrity check to the post-processing stage to remove corrupted files automatically.
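A cheap first-pass integrity check can compare each file's leading bytes against known image signatures. The sketch below assumes images are stored as loose files with common extensions (not inside webdataset tar shards), and the function names are hypothetical. It only catches empty files and non-image payloads such as HTML error pages; truncated files and valid placeholder images would need a real decoder or a size/hash filter.

```shell
# Return 0 if the file is non-empty and starts with a known image signature.
looks_like_image() {
  local path="$1" magic
  [ -s "$path" ] || return 1
  magic="$(od -An -tx1 -N4 "$path" | tr -d ' \n')"
  case "$magic" in
    ffd8ff*)  return 0 ;;  # JPEG
    89504e47) return 0 ;;  # PNG
    47494638) return 0 ;;  # GIF ("GIF8")
    52494646) return 0 ;;  # RIFF container (used by WebP)
    *)        return 1 ;;
  esac
}

# Delete image-named files under a directory that fail the signature check.
remove_corrupted_images() {
  local dir="$1" f
  find "$dir" -type f \( -name '*.jpg' -o -name '*.jpeg' -o -name '*.png' \
      -o -name '*.gif' -o -name '*.webp' \) |
  while IFS= read -r f; do
    looks_like_image "$f" || rm -f -- "$f"
  done
}
```

Running `remove_corrupted_images ./downloads` after a crawl deletes failures in place; replacing `rm -f` with `echo` gives a dry run first.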

What is the typical throughput for 1Gbps vs 10Gbps?

- 1Gbps: Suitable for datasets of tens of millions of images, with dozens of concurrent tasks running stably.
- 10Gbps: Suitable for collection tasks of hundreds of millions of images, running in parallel across a multi-machine cluster.
- 100Gbps: Suitable for long-term, continuous, massive image crawling shared by multiple teams.
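These tiers can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is an upper bound under stated assumptions: the link is fully utilized around the clock, 1 Gbps is taken as 125,000,000 bytes per second, and the average image size of 200 KB is an illustrative assumption, not a figure from 123Proxy. Real throughput will be lower because of protocol overhead, retries, and target-site limits.

```shell
# Estimate images per day at full link utilization.
# Arguments: bandwidth in Gbps, average image size in KB (1 KB = 1000 bytes).
# Integer arithmetic, so results are rounded down.
images_per_day() {
  local gbps="$1" avg_kb="$2"
  echo $(( gbps * 125000000 / (avg_kb * 1000) * 86400 ))
}

images_per_day 1 200    # prints 54000000
images_per_day 10 200   # prints 540000000
```

At an assumed 200 KB per image, 1Gbps works out to roughly 54 million images per day and 10Gbps to roughly 540 million, which is in line with the tiers listed above.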

How do I fix unstable download speeds?

- Increase concurrency (process count / thread count) as appropriate, so that more connections fill the available bandwidth.
- Route metadata requests and image file downloads through different proxy exits so they do not interfere with each other.
- If speeds are still unstable, contact 123Proxy technical support to check the link and the target site's status.
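For img2dataset, concurrency is controlled by its `--processes_count` and `--thread_count` options. The wrapper below is a minimal sketch: the function name and the `DRY_RUN` switch are hypothetical conveniences, and suitable values depend on CPU cores and available bandwidth.

```shell
# Run img2dataset with explicit process and thread counts.
# Set DRY_RUN=1 to print the command instead of executing it.
run_img2dataset() {
  local url_list="$1" processes="$2" threads="$3"
  set -- img2dataset \
    --url_list "$url_list" \
    --input_format txt \
    --output_format webdataset \
    --processes_count "$processes" \
    --thread_count "$threads"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage (illustrative values): 16 processes, 256 threads each
#   run_img2dataset urls.txt 16 256
```

A common approach is to raise the thread count step by step while watching link utilization and the error rate, and to stop once bandwidth is saturated or 403 / 429 responses start to climb.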

Is using proxies for image scraping compliant?

Using a proxy is legal in itself, but collecting, storing, and using image data must comply with the target site's terms of use and with copyright law. Confirm the scope of authorization for your specific business scenario, and use data to train models only where you have legal authorization or an otherwise compliant basis.