High-Bandwidth Proxies for Code Download

Customizable bandwidth from 1 to 100Gbps+
One-click proxy setup for git and the GitHub API
Supports GitHub, GitLab, and more

Typical Use Cases

git clone

Batch-clone popular repositories whose star count exceeds a chosen threshold to build a training corpus for code LLMs.
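As a rough illustration of the star-threshold selection, the sketch below builds a query for GitHub's repository search endpoint and extracts clone URLs from a response. The helper names (`build_search_url`, `extract_clone_urls`, `fetch_clone_urls`) are hypothetical, and the proxy URL and token are supplied by the caller; treat it as one possible approach, not a prescribed workflow.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(min_stars: int, page: int = 1, per_page: int = 100) -> str:
    """Return a search URL for repositories with at least `min_stars` stars."""
    query = urllib.parse.urlencode({
        "q": f"stars:>={min_stars}",
        "sort": "stars",
        "order": "desc",
        "per_page": per_page,
        "page": page,
    })
    return f"{SEARCH_URL}?{query}"


def extract_clone_urls(search_response: dict) -> list[str]:
    """Pull the HTTPS clone URL out of each item in a search response."""
    return [item["clone_url"] for item in search_response.get("items", [])]


def fetch_clone_urls(min_stars: int, token: str, proxy_url: str, page: int = 1) -> list[str]:
    """Fetch one page of results through the proxy (performs a network call)."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    )
    request = urllib.request.Request(
        build_search_url(min_stars, page),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with opener.open(request, timeout=30) as response:
        return extract_clone_urls(json.load(response))
```

The search API returns at most 1,000 results per query, so a larger crawl would typically split the work into several star ranges and page through each one.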

GitHub API

Use the GitHub REST or GraphQL API to crawl issues, pull requests, commit history, and other metadata.

Repo Sync

Synchronize repository updates on a regular schedule to maintain a continuously updated code knowledge base.
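One way to approach periodic synchronization is to keep a bare mirror per repository and refresh it on each run. The sketch below assumes that layout; `sync_command` and `sync_repo` are hypothetical helper names, and the proxy URL is whatever your account uses.

```python
import os
import subprocess


def sync_command(repo_url: str, mirror_dir: str) -> list[str]:
    """Choose between an initial mirror clone and an incremental update."""
    if os.path.isdir(mirror_dir):
        # Mirror already exists: fetch new objects and drop deleted refs.
        return ["git", "-C", mirror_dir, "remote", "update", "--prune"]
    # First run: create a bare mirror of all refs.
    return ["git", "clone", "--mirror", repo_url, mirror_dir]


def sync_repo(repo_url: str, mirror_dir: str, proxy_url: str) -> int:
    """Run the sync through the proxy and return git's exit code."""
    env = dict(os.environ, https_proxy=proxy_url, http_proxy=proxy_url)
    return subprocess.run(sync_command(repo_url, mirror_dir), env=env).returncode
```

A scheduler such as cron could call `sync_repo` for each tracked repository; after the first run only new objects are transferred.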

Enterprise Mirror

Provide a dedicated-line static proxy for enterprise GitHub / GitHub Enterprise deployments, for mirroring and backup synchronization.

High-Bandwidth Proxy for Code Data Download

High Bandwidth

Dedicated 1Gbps to 200Gbps+ (Customizable)

Pricing

Fixed pricing by bandwidth rather than by traffic, so costs are predictable.

High-Bandwidth Proxy

123Proxy provides a high-bandwidth proxy pool service built for AI training data collection: fixed bandwidth billing (1Gbps–100Gbps+), unlimited total traffic, and unlimited concurrent requests.

1-200Gbps+ dedicated
Unlimited concurrent requests
Priced by bandwidth (per Gbps)
Contact Sales
Service Guarantee

Automatic monitoring of target-site anti-bot measures, to keep access to the target website from being blocked.

Cost Effective

Highly cost-effective for large-scale data scraping.

git / GitHub API Integration Example


# Example 1: Use 123Proxy High Bandwidth Proxy IP for git clone
# Replace USERNAME, sessionId, time, and PASSWORD with your own account values.
export https_proxy="http://USERNAME_sessionId_time:PASSWORD@gateway.123proxy.cn:31000"
export http_proxy="$https_proxy"

git clone https://github.com/owner/repo.git

Each sessionId can be bound to a different exit IP, so large numbers of repositories can be cloned concurrently.
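The per-session binding could be driven from a small script like the sketch below. It assumes the proxy user name is literally `USERNAME_sessionId_time` joined by underscores, as in Example 1, and passes the `time` field through unchanged because its meaning is not specified here. The function names and the `s0`, `s1`, ... session IDs are hypothetical.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote

GATEWAY = "gateway.123proxy.cn:31000"  # gateway host and port from Example 1


def session_proxy_url(username: str, password: str, session_id: str, time_value: str) -> str:
    """Build a proxy URL whose user part is USERNAME_sessionId_time (assumed format)."""
    user = quote(f"{username}_{session_id}_{time_value}", safe="")
    return f"http://{user}:{quote(password, safe='')}@{GATEWAY}"


def clone_via_session(repo_url: str, target_dir: str, proxy: str) -> int:
    """Run git clone with the proxy set only for this child process."""
    env = dict(os.environ, https_proxy=proxy, http_proxy=proxy)
    return subprocess.run(["git", "clone", repo_url, target_dir], env=env).returncode


def clone_all(jobs, username, password, time_value, workers=8):
    """Clone each (repo_url, target_dir) job using its own sessionId.

    Returns a dict mapping repo_url to git's exit code.
    """
    def run(indexed_job):
        index, (repo_url, target_dir) = indexed_job
        # Hypothetical session naming: one ID per job, so each job may get its own exit IP.
        proxy = session_proxy_url(username, password, f"s{index}", time_value)
        return repo_url, clone_via_session(repo_url, target_dir, proxy)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run, enumerate(jobs)))
```

Setting the proxy in each child process's environment, rather than exporting it globally, lets every clone use a different session without the workers interfering with one another.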


# Example 2: Call GitHub API via 123Proxy
import requests

proxies = {
    "http": "http://USERNAME_sessionId_time:PASSWORD@gateway.123proxy.cn:31000",
    "https": "http://USERNAME_sessionId_time:PASSWORD@gateway.123proxy.cn:31000",
}

headers = {
    "Authorization": "Bearer YOUR_GITHUB_TOKEN",
    "Accept": "application/vnd.github+json",
}

resp = requests.get(
    "https://api.github.com/repos/owner/repo",
    proxies=proxies,
    headers=headers,
    timeout=30,
)
resp.raise_for_status()  # fail loudly on 4xx/5xx instead of printing an error body
print(resp.status_code, resp.json().get("full_name"))

FAQs

Q1: git clone fails with RPC failed / curl 56 / EOF / SSL errors?

Solution:
- Use a 123Proxy high-bandwidth dedicated proxy to ensure link quality to GitHub.
- Tune `http.postBuffer` as appropriate and add retry logic for large repositories.
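The retry logic mentioned above might look like the following sketch, which reruns a command with exponential backoff. The delay values and the names `backoff_delays` and `run_with_retry` are illustrative assumptions, not settings recommended by 123Proxy.

```python
import subprocess
import time


def backoff_delays(attempts: int, base: float = 2.0, cap: float = 60.0) -> list[float]:
    """Delays between attempts: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(cap, base * 2 ** i) for i in range(attempts - 1)]


def run_with_retry(command, attempts: int = 4, runner=subprocess.run, sleep=time.sleep) -> bool:
    """Run `command` until it exits with code 0 or the attempts are used up."""
    delays = backoff_delays(attempts)
    for attempt in range(attempts):
        result = runner(command)
        if result.returncode == 0:
            return True
        if attempt < len(delays):
            sleep(delays[attempt])  # wait before the next attempt
    return False
```

For example, `run_with_retry(["git", "clone", "https://github.com/owner/repo.git"])` would try the clone up to four times. If an interrupted attempt leaves a partial target directory behind, it needs to be removed before the next attempt.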

Q2: The GitHub API returns "rate limit exceeded"?

For large-scale API calls:
- Always authenticate with a token, and distribute requests sensibly across multiple tokens.
- Spread requests across different `sessionId` values / exit IPs instead of concentrating them on a single IP.
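Distributing requests across tokens can be sketched as a small rotating pool that skips a token until its limit resets. The example relies on GitHub's `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers; the `TokenPool` class itself is a hypothetical helper, and pairing each token with its own `sessionId` is left to the caller.

```python
import time
from collections import deque


class TokenPool:
    """Round-robin over tokens, skipping any that are rate-limited."""

    def __init__(self, tokens):
        self._queue = deque(tokens)
        self._reset_at = {token: 0.0 for token in tokens}

    def record(self, token, headers):
        """Note when a token is exhausted, based on the response headers.

        `headers` is a mapping with GitHub's header names as keys
        (requests' resp.headers works and is case-insensitive).
        """
        if int(headers.get("X-RateLimit-Remaining", "1")) == 0:
            # X-RateLimit-Reset is a UTC epoch timestamp in seconds.
            self._reset_at[token] = float(headers.get("X-RateLimit-Reset", "0"))

    def pick(self, now=None):
        """Return the next usable token, or None if all are exhausted."""
        now = time.time() if now is None else now
        for _ in range(len(self._queue)):
            token = self._queue[0]
            self._queue.rotate(-1)
            if self._reset_at[token] <= now:
                return token
        return None
```

A caller would use `pick()` before each request and `record(token, resp.headers)` afterwards, sleeping until the earliest reset time when `pick()` returns `None`.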

Q3: What difference do 1Gbps, 10Gbps, and 100Gbps make for git clone workloads?

- 1Gbps: Suitable for small to medium collection tasks with dozens to hundreds of concurrent clones.
- 10Gbps: Supports hundreds to thousands of concurrent clones, suited to building TB-scale datasets.
- 100Gbps: Suitable for continuous full synchronization and enterprise-level code LLM training projects.
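To put these tiers in perspective, the back-of-the-envelope sketch below estimates ideal transfer time and per-clone throughput. It assumes 1 TB = 10^12 bytes and a fully utilized link unless an `efficiency` factor is given; real throughput also depends on the remote server and protocol overhead, so the results are upper bounds. The function names are illustrative.

```python
def transfer_hours(dataset_tb: float, bandwidth_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `dataset_tb` terabytes over a `bandwidth_gbps` link."""
    bits = dataset_tb * 1e12 * 8
    seconds = bits / (bandwidth_gbps * 1e9 * efficiency)
    return seconds / 3600


def per_clone_mbps(bandwidth_gbps: float, concurrent_clones: int) -> float:
    """Average bandwidth share per clone when the link is split evenly."""
    return bandwidth_gbps * 1000 / concurrent_clones
```

Under these assumptions, 1 TB over 1Gbps takes 8,000 seconds (about 2.2 hours) and the same data over 10Gbps takes 800 seconds, while 100 concurrent clones sharing 1Gbps average 10 Mbps each.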

Q4: How can I avoid clones being interrupted partway through?

- Use stable lines and high-bandwidth proxies to reduce disconnections caused by network jitter.
- For very large repositories, split the task: fetch the history in several incremental steps instead of pulling everything in one go.
- Add failure retry and resume logic in the task scheduling layer.
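The incremental approach and scheduler-level resume described above can be sketched as follows: start from a shallow clone, deepen the history in steps, and keep a checkpoint file of finished repositories so a restarted job skips them. The step size, the JSON checkpoint format, and all function names are assumptions made for illustration.

```python
import json
import os


def incremental_clone_plan(repo_url, target_dir, deepen_step=1000, rounds=3):
    """Return git commands: shallow clone, several deepen steps, then full history."""
    plan = [["git", "clone", "--depth", "1", repo_url, target_dir]]
    plan += [
        ["git", "-C", target_dir, "fetch", f"--deepen={deepen_step}"]
        for _ in range(rounds)
    ]
    # If the deepen steps already fetched everything, --unshallow reports an
    # error because the repository is complete; that can be treated as non-fatal.
    plan.append(["git", "-C", target_dir, "fetch", "--unshallow"])
    return plan


def load_done(checkpoint_path):
    """Read the set of repository URLs that finished in earlier runs."""
    if not os.path.exists(checkpoint_path):
        return set()
    with open(checkpoint_path, encoding="utf-8") as handle:
        return set(json.load(handle))


def mark_done(checkpoint_path, repo_url):
    """Record a finished repository, writing the file atomically."""
    done = load_done(checkpoint_path)
    done.add(repo_url)
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(sorted(done), handle)
    os.replace(tmp_path, checkpoint_path)


def pending(repo_urls, checkpoint_path):
    """Repositories still to be cloned after a restart."""
    done = load_done(checkpoint_path)
    return [url for url in repo_urls if url not in done]
```

Each command in the plan is a short, separate transfer, so a dropped connection costs one step rather than the whole clone, and the checkpoint keeps a restarted batch from redoing completed repositories.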

Q5: Is collecting GitHub code compliant?

Open source code is released under its own license (MIT, Apache, GPL, etc.), and you should comply with the applicable license when using the code for training or commercial purposes. 123Proxy only provides the network channel and does not take part in how the data is used; please consult your legal team, in light of your own business, to ensure that collection and use are legal and compliant.