High-Bandwidth Proxies for Code Download

Customizable bandwidth, 1-100Gbps+
One-click proxy setup for git and the GitHub API
Supports GitHub, GitLab, and more

Typical Use Cases

git clone

Batch-clone popular repositories whose star count is above a chosen threshold to build a Code LLM training corpus.

GitHub API

Use the GitHub REST or GraphQL API to collect issues, pull requests, commit history, and other metadata.

Repo Sync

Regularly synchronize repository updates to build a continuously updated code knowledge base.

Enterprise Mirror

Provide a dedicated-line static proxy for enterprise GitHub / GitHub Enterprise deployments, used for mirroring and backup synchronization.

High-Bandwidth proxy for code data download

High Bandwidth

Dedicated 1Gbps to 200Gbps+ (Customizable)

Pricing

Fixed pricing by bandwidth rather than by traffic, so costs are predictable.

High-Bandwidth Proxy

123Proxy provides a high-bandwidth proxy pool service built specifically for AI training data collection: fixed billing by bandwidth (1Gbps to 200Gbps+), unlimited total traffic, and unlimited concurrent requests.

1-200Gbps+ dedicated
Unlimited concurrent requests
Priced by bandwidth (per Gbps)
Contact Sales
Service Guarantee

Automatic monitoring of target-site bot detection, so that access to the target website is not blocked.

Cost Effective

Highly cost-effective for large-scale data scraping.

git / GitHub API Integration Example


# Example 1: Use 123Proxy High Bandwidth Proxy IP for git clone
export https_proxy="http://USERNAME_sessionId_time:PASSWORD@gateway.123proxy.cn:31000"
export http_proxy="$https_proxy"

git clone https://github.com/owner/repo.git

Different sessionIds can be bound to different exit IPs for concurrent cloning of large numbers of repositories.
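
As a rough illustration of the note above, the following Python sketch builds one proxy URL per session and runs several `git clone` commands in parallel. The username layout `USERNAME_sessionId_time` and the gateway address are copied from Example 1; how the gateway interprets the `sessionId` and `time` fields is an assumption here, and the helper names (`proxy_url`, `clone_via_session`, `clone_many`) are hypothetical.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote

GATEWAY = "gateway.123proxy.cn:31000"  # host and port taken from Example 1


def proxy_url(username, password, session_id, time_field):
    """Build a proxy URL following the USERNAME_sessionId_time layout.

    Assumption: session_id and time_field are placed in the username exactly
    as in Example 1; check the provider's documentation for their meaning.
    """
    user = quote(f"{username}_{session_id}_{time_field}", safe="")
    # Percent-encode the password so characters like "@" or ":" stay valid.
    return f"http://{user}:{quote(password, safe='')}@{GATEWAY}"


def clone_via_session(repo_url, dest, proxy):
    """Run `git clone` with the proxy set only for this child process."""
    env = dict(os.environ, https_proxy=proxy, http_proxy=proxy)
    return subprocess.run(["git", "clone", repo_url, dest], env=env).returncode


def clone_many(repos, username, password, time_field, workers=4):
    """Clone each (repo_url, dest) pair through its own session ID."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(
                clone_via_session, url, dest,
                proxy_url(username, password, f"s{i}", time_field),
            )
            for i, (url, dest) in enumerate(repos)
        ]
        # One git exit code per repository, in input order.
        return [f.result() for f in futures]
```

Setting the proxy in each child's environment, rather than exporting it globally, lets every clone use a different session at the same time.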


# Example 2: Call GitHub API via 123Proxy
import requests

proxies = {
    "http": "http://USERNAME_sessionId_time:PASSWORD@gateway.123proxy.cn:31000",
    "https": "http://USERNAME_sessionId_time:PASSWORD@gateway.123proxy.cn:31000",
}

headers = {
    "Authorization": "Bearer YOUR_GITHUB_TOKEN",
    "Accept": "application/vnd.github+json",
}

resp = requests.get(
    "https://api.github.com/repos/owner/repo",
    proxies=proxies,
    headers=headers,
    timeout=30,
)
print(resp.status_code, resp.json().get("full_name"))

Technical Specifications

Optimized for large-scale code repository ingestion.

| Category | Specification |
| --- | --- |
| Supported Tools | git, gh CLI, curl, wget, rsync |
| Target Platforms | GitHub, GitLab, Bitbucket, SourceForge, Hugging Face |
| Bandwidth | 1Gbps - 100Gbps+ Dedicated |
| Features | Token Rotation Support, Enterprise Mirroring, Sticky Sessions |
| Protocols | HTTP(S), SOCKS5 (via proxychains) |

FAQs

How to fix git clone RPC failed / curl 56 / EOF errors?

Solution:
- Enable the 123Proxy high-bandwidth dedicated proxy to improve link quality to GitHub.
- Adjust `http.postBuffer` as appropriate and add retry logic for large repositories.
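
The retry logic mentioned above could look roughly like the following Python sketch. It is a minimal example under stated assumptions: the function name `clone_with_retry`, the attempt count, and the backoff delays are illustrative choices, not values recommended by 123Proxy or git.

```python
import subprocess
import time


def clone_with_retry(repo_url, dest, attempts=3, base_delay=2.0,
                     run=subprocess.run, sleep=time.sleep):
    """Try `git clone` up to `attempts` times with exponential backoff.

    `run` and `sleep` are injectable so the function can be exercised
    without touching the network. Returns the 1-based number of the attempt
    that succeeded, or raises RuntimeError after the last failure.
    """
    for attempt in range(1, attempts + 1):
        result = run(["git", "clone", repo_url, dest])
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"git clone failed after {attempts} attempts: {repo_url}")
```

The proxy is picked up from the `https_proxy` / `http_proxy` environment variables, so the function works unchanged once those are exported as in Example 1.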

How to avoid GitHub API "rate limit exceeded" errors?

For large-scale API calls:
- Always send a Token with each API call, and divide requests sensibly across multiple Tokens.
- Spread requests across different `sessionId` values / exit IPs instead of concentrating them on a single IP.
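
Whichever Token and exit IP a request uses, the client still has to notice when its quota runs out. The Python sketch below reads GitHub's `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers and works out how long to pause; the helper name `seconds_until_reset` is hypothetical, and sleeping until the reset time is only one possible policy.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how many seconds to wait before the next API call.

    Reads the `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers that
    GitHub sends with API responses (reset is an epoch timestamp in seconds).
    Returns 0.0 when quota remains or when the headers are absent.
    """
    now = time.time() if now is None else now
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")
    if remaining is None or reset is None:
        return 0.0
    if int(remaining) > 0:
        return 0.0
    # Quota exhausted: wait until the reset time, never a negative amount.
    return max(0.0, float(reset) - now)
```

With the `resp` object from Example 2, a caller could run `time.sleep(seconds_until_reset(resp.headers))` before issuing the next request.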

What is the typical throughput for 1Gbps vs 10Gbps?

- 1Gbps: Suitable for small and medium-scale collection tasks with dozens to hundreds of concurrent clones.
- 10Gbps: Supports hundreds to thousands of concurrent clones; used for TB-level dataset construction.
- 100Gbps: Suitable for continuous full synchronization and enterprise-level Code LLM training projects.
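
To put these tiers in rough numbers, the Python sketch below computes the ideal time to move a dataset over a link of a given speed. It is plain arithmetic, not a measurement of the service: it treats 1 Gbps as 10^9 bits per second, and the `utilization` parameter is an assumed knob for protocol overhead and idle time.

```python
def transfer_seconds(size_bytes, link_gbps, utilization=1.0):
    """Ideal time in seconds to move `size_bytes` over a `link_gbps` link.

    1 Gbps is treated as 10**9 bits per second. `utilization` (0 < u <= 1)
    scales the link for overhead and is an assumption to tune.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9 * utilization)


TB = 10**12  # decimal terabyte

for gbps in (1, 10, 100):
    hours = transfer_seconds(1 * TB, gbps) / 3600
    print(f"{gbps:>3} Gbps: 1 TB in about {hours:.2f} h at full utilization")
```

At full utilization, 1 TB takes 8,000 seconds (a little over two hours) on a 1Gbps link and one tenth of that on 10Gbps; real clone workloads will be slower because of per-repository overhead.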

How to prevent git clone connection interruptions?

- Use stable lines and high-bandwidth proxies to reduce disconnections caused by network jitter.
- For very large repositories, split the task into multiple incremental clones instead of one brute-force pull.
- Add failure-retry and resume-from-breakpoint logic in the task scheduling layer.
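
One way to split a large clone into increments, sketched in Python below, is to start with a shallow clone and deepen the history in stages using git's `--depth`, `--deepen`, and `--unshallow` options. The function name and the `step` and `rounds` values are illustrative assumptions; suitable values depend on the repository.

```python
def incremental_clone_commands(repo_url, dest, step=1000, rounds=3):
    """Return git commands for a shallow clone that is deepened in stages.

    Starts with `git clone --depth 1`, then runs `git fetch --deepen=<step>`
    `rounds` times, and ends with `git fetch --unshallow` to fetch whatever
    history is still missing. `step` and `rounds` are illustrative values.
    """
    cmds = [["git", "clone", "--depth", "1", repo_url, dest]]
    cmds += [["git", "-C", dest, "fetch", f"--deepen={step}"] for _ in range(rounds)]
    cmds.append(["git", "-C", dest, "fetch", "--unshallow"])
    return cmds


for cmd in incremental_clone_commands("https://github.com/owner/repo.git", "repo"):
    print(" ".join(cmd))
```

Each command is a separate, smaller transfer, so an interruption only costs the current step. Note that `git fetch --unshallow` reports an error if the repository is already complete, so a scheduler may want to check `git rev-parse --is-shallow-repository` before the final step.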

Is scraping GitHub public repositories compliant?

Open source code is released under its own license (MIT, Apache, GPL, etc.), and you should comply with the applicable license when using it for training or commercial purposes. 123Proxy only provides the network channel and does not take part in how the data is used; please consult your legal team, in light of your own business, to ensure that collection and use are legal and compliant.