High-Bandwidth Proxies for Image Downloads

Customizable 1-100Gbps+ bandwidth
One-line parameter integration with img2dataset / gallery-dl
Supports Flickr, Shutterstock, and more

Typical Use Cases

img2dataset

Use img2dataset to batch-download billions of images from URL lists and build visual/multimodal training corpora.

gallery-dl / aria2

Batch-pull original and multi-size images from Flickr, Shutterstock, and community sites via gallery-dl / aria2.
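As a sketch, both tools accept a proxy on the command line; the gateway address, credentials, and session suffix below are placeholders (the same ones used in the img2dataset example later on this page), and the target URLs are illustrative:

```shell
# Sketch: route gallery-dl and aria2c through the proxy gateway.
# USERNAME/PASSWORD and the session suffix are placeholders.
PROXY="http://USERNAME_session1:PASSWORD@gateway.123proxy.cn:31000"

# gallery-dl takes a proxy URL via --proxy:
# gallery-dl --proxy "$PROXY" "https://www.flickr.com/photos/flickr/"

# aria2c applies one proxy to all downloads via --all-proxy:
# aria2c --all-proxy="$PROXY" -i urls.txt
```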

E-commerce Images

Collect e-commerce product images, user-uploaded images, and similar material for training multimodal large models and recommendation systems.

Multi-region Coverage

IP coverage across multiple regions (US/EU/JP/BR…) for building multi-language, multi-region, multi-scenario image datasets.

High-Bandwidth Proxy for Image Downloads

High Bandwidth

Dedicated 1Gbps to 200Gbps+ (Customizable)

Pricing

Fixed pricing by bandwidth rather than by traffic, so costs are predictable

High-Bandwidth Proxy

123Proxy provides a high-bandwidth proxy pool service built for AI training data collection: fixed bandwidth-based billing (1Gbps-100Gbps+), unlimited total traffic, and unlimited concurrent requests.

1-200Gbps+ dedicated bandwidth
Unlimited concurrent requests
Priced by bandwidth (per Gbps)
Contact Sales
Service Guarantee

Automatic monitoring of target-site anti-bot responses, ensuring the target website does not block your traffic

Cost Effective

Extremely cost-effective for large-scale data scraping.

Flickr Integration Example


# Example: run img2dataset through a 123Proxy high-bandwidth proxy
# (replace USERNAME/PASSWORD with your account credentials)
export http_proxy="http://USERNAME_sessionId_time:PASSWORD@gateway.123proxy.cn:31000"
export https_proxy="$http_proxy"

img2dataset \
  --url_list urls.txt \
  --output_format webdataset \
  --input_format txt

Assign a different sessionId to each machine or task for automatic IP rotation. Appending _sessionId to the username binds a distinct session to each task, so each task goes out through a different exit IP.
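A minimal sketch of that pattern: a helper builds a per-task proxy URL, and each URL shard runs under its own session. The shard filenames (`urls_part_N.txt`) and output folders are hypothetical, and the credentials are placeholders as in the example above.

```shell
# Hypothetical helper: build a proxy URL with a distinct sessionId per task.
session_proxy() {
  printf 'http://USERNAME_session%s:PASSWORD@gateway.123proxy.cn:31000' "$1"
}

# One img2dataset shard per session, each on its own exit IP
# (commented out: requires img2dataset and real credentials):
# for i in 1 2 3 4; do
#   http_proxy="$(session_proxy "$i")" https_proxy="$(session_proxy "$i")" \
#     img2dataset --url_list "urls_part_${i}.txt" --output_folder "out_${i}" \
#     --output_format webdataset --input_format txt &
# done
# wait
```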

FAQs

Q1: Getting many 403 / 429 errors or timeouts when bulk-downloading from Flickr / Shutterstock. What should I do?

Solution:
- Enable a 123Proxy high-bandwidth dedicated proxy and let the large IP pool spread the request load automatically.
- Split tasks across different `sessionId`s to distribute concurrency over multiple proxy links.

Q2: Why are there many corrupted images or placeholders in the collected data?

Some sites return placeholder or blank images to suspicious IPs. Recommendations:
- Use 123Proxy automatic IP rotation to avoid repeatedly triggering anti-crawling logic.
- Add an image-integrity check in the post-processing stage to automatically remove corrupted files.
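A minimal shell sketch of such an integrity check, assuming a flat directory of JPEGs (the directory name is illustrative, and PNG/WebP files would need their own magic-byte checks):

```shell
# Sketch: delete files that do not start with the JPEG magic bytes (ff d8 ff).
# Placeholders and truncated downloads that are not valid JPEG streams
# typically fail this check.
clean_jpegs() {
  for f in "$1"/*.jpg; do
    [ -f "$f" ] || continue
    magic=$(head -c 3 "$f" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "ffd8ff" ] || rm -f "$f"
  done
}

# Usage: clean_jpegs images
```

A magic-byte check catches non-image responses cheaply; a full decode pass (e.g. with an image library) catches truncated files as well.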

Q3: How much download volume can 1Gbps, 10Gbps, or 100Gbps sustain in image scenarios?

- 1Gbps: suited to datasets of tens of millions of images, with dozens of concurrent tasks running stably.
- 10Gbps: suited to collection tasks of hundreds of millions of images, with multi-machine clusters running in parallel.
- 100Gbps: suited to long-running, massive-scale image crawling shared across multiple teams.
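These figures can be sanity-checked with a back-of-envelope calculation; the 200 KB average image size below is an assumption, not a measurement:

```shell
# Images per day at full utilization of a 1Gbps link, assuming ~200 KB/image.
GBPS=1
AVG_KB=200
BYTES_PER_SEC=$(( GBPS * 1000000000 / 8 ))          # 1 Gbps = 125 MB/s
IMGS_PER_DAY=$(( BYTES_PER_SEC * 86400 / (AVG_KB * 1000) ))
echo "$IMGS_PER_DAY images/day"   # 54000000, i.e. ~54 million
```

In practice, per-request overhead and target-site rate limits push real throughput well below line rate, but even a fraction of this keeps 1Gbps in the "tens of millions of images" range.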

Q4: Download speed fluctuates wildly. What should I do?

- Increase concurrency (process/thread counts) so more connections can fill the bandwidth.
- Route metadata requests and image-file downloads through separate proxy exits to avoid interference.
- If throughput is still unstable, contact 123Proxy technical support to check the link and the target site's status.

Q5: Is collecting images compliant?

Using a proxy is itself legal, but collecting, storing, and using image data must comply with the target site's terms of use and applicable copyright law. Confirm the scope of authorization for your specific business scenario, and use the data for model training only under lawful authorization or on a compliant basis.