About Open Source Software Mirror Alliance (Technical Part)

Please read the non-technical part first: About Open Source Software Mirror Alliance (Non-Technical Part). Thank you.

I. DNS or 301?

Update: I had misunderstood how DNS CNAME records work; this has now been corrected. Reference: https://tools.ietf.org/html/rfc3568

DNS solution (corrections are welcome if I have anything wrong):

1. Give each mirror its own subdomain so it can be scheduled separately, and delegate that subdomain to the main site with an NS record.
2. The user asks the ISP's DNS resolver for the subdomain's IP address.
3. The main site, acting as the authoritative server, returns the IP addresses of one or more mirror nodes, chosen by the source IP of the query. That source IP is the ISP resolver's, not the user's.
4. The ISP resolver passes these addresses to the user and caches them for the record's TTL.
5. The user connects to a mirror node and downloads.
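
A minimal sketch of step 3, the authoritative answer, in Python. The resolver prefixes, region names, mirror addresses, and the 300-second TTL are all made up for illustration (they use documentation address ranges); a real deployment would need a proper IP-to-region database and an authoritative DNS server to host this logic. What it shows is that the only input available is the resolver's address.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# Hypothetical: which ISP resolvers belong to which region (documentation ranges).
RESOLVER_REGIONS: Dict[str, List[ipaddress.IPv4Network]] = {
    "campus-net": [ipaddress.ip_network("192.0.2.0/25")],
    "telecom": [ipaddress.ip_network("192.0.2.128/25")],
}

# Hypothetical mirror nodes serving each region.
MIRRORS_BY_REGION: Dict[str, List[str]] = {
    "campus-net": ["198.51.100.10", "198.51.100.11"],
    "telecom": ["198.51.100.20"],
}
DEFAULT_MIRRORS = ["203.0.113.30"]
TTL_SECONDS = 300  # hypothetical: how long ISP resolvers may cache the answer


def region_of(resolver_ip: str) -> Optional[str]:
    """Map the querying resolver (not the end user) to a region."""
    addr = ipaddress.ip_address(resolver_ip)
    for region, networks in RESOLVER_REGIONS.items():
        # Membership is simply False when the IPv4/IPv6 versions differ.
        if any(addr in net for net in networks):
            return region
    return None


def answer_a_query(resolver_ip: str) -> Tuple[List[str], int]:
    """Return the A records and TTL the authoritative server would send."""
    region = region_of(resolver_ip)
    if region is None:
        return DEFAULT_MIRRORS, TTL_SECONDS
    return MIRRORS_BY_REGION[region], TTL_SECONDS
```

Editing these tables changes the strategy, but a resolver that already cached an answer keeps using it for up to `TTL_SECONDS`.
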

301 solution:

1. The domain's A record resolves to the main site.
2. The user connects to the main site and sends an HTTP request.
3. Based on the user's source IP, the main site returns an HTTP 301 response that redirects the user to a mirror node.
4. The user connects to that mirror node and downloads.
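
A minimal sketch of step 3 of the 301 solution, kept as a pure function so it can be checked without a web server. The client prefixes and mirror URLs are hypothetical. The second mirror keeps its files under `/pub`, which works only because the redirect rewrites the whole URL. One caveat: a 301 (Moved Permanently) response is cacheable by default, so browsers may keep going straight to the old mirror after the strategy changes; the sketch therefore defaults to 302 (Found), a temporary redirect, and takes the status code as a parameter.

```python
import ipaddress
from typing import Dict, Tuple

# Hypothetical client prefixes -> mirror base URLs. The main site sees the
# user's own address, so a prefix can be as narrow as one campus network.
MIRROR_BY_PREFIX = [
    (ipaddress.ip_network("192.0.2.0/24"), "http://mirror-a.example.edu"),
    (ipaddress.ip_network("2001:db8:1::/48"), "http://mirror-b.example.edu/pub"),
]
FALLBACK_MIRROR = "http://mirror-c.example.org"


def choose_mirror(client_ip: str) -> str:
    """Pick a mirror base URL for the requesting user's IP."""
    addr = ipaddress.ip_address(client_ip)
    for network, base_url in MIRROR_BY_PREFIX:
        if addr in network:
            return base_url
    return FALLBACK_MIRROR


def redirect_response(client_ip: str, path: str,
                      status: int = 302) -> Tuple[int, Dict[str, str]]:
    """Return the status code and headers the main site would send back."""
    base_url = choose_mirror(client_ip)
    # The node's own prefix (e.g. /pub) is prepended, so layouts may differ.
    return status, {"Location": base_url.rstrip("/") + path}
```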

Comparison	DNS	301
Scheduling accuracy	Only the ISP resolver's IP is visible, so matching is by region	The user's own IP is visible, so matching can be precise
Changing the scheduling strategy	Takes effect only after cached DNS records expire (TTL)	Can be changed at any time and takes effect immediately
Mirror node directory structure	Must be identical on every node	Need not be identical
Users need off-campus network access	No	Yes (*)
Access latency	Almost no added latency	The user must contact the main site first, adding a TCP handshake plus an HTTP request/response round trip
Main site load	Low, since ISP resolvers cache the answers	High: every download starts with a request to the main site, and handling an HTTP request costs more than answering a DNS query
Main site stability requirements	High	Relatively high
Load balancing across multiple main sites	Possible	Possible

(*) Even if a mirror node exists on campus, users must still be able to reach the off-campus main site, because every download begins with a request to it. (Do many schools restrict IPv6 access? If most schools place no restrictions on IPv6, this item can be ignored.)

II. Communication between mirrors

The main site needs to monitor the status of every mirror node in real time, for example with Ganglia or Nagios. When a fault is detected, it should adjust the scheduling strategy so that new requests no longer go to that node, and send out an alert by email.
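
A sketch of the monitoring logic, assuming the probe and the alert channel are supplied from outside (for example a Nagios check result, or a mail function built on Python's `smtplib`). The class name, the consecutive-failure threshold, and the recovery handling are illustrative choices, not part of the original proposal; the threshold simply keeps a single dropped probe from pulling a node out of rotation.

```python
from typing import Callable, Dict, List, Set


class HealthMonitor:
    """Hypothetical helper: keep only healthy mirror nodes in the scheduling pool."""

    def __init__(self, nodes: List[str], probe: Callable[[str], bool],
                 alert: Callable[[str], None], failures_before_down: int = 3):
        self.probe = probe          # returns True if the node looks healthy
        self.alert = alert          # e.g. a wrapper that sends an email
        self.threshold = failures_before_down
        self.failures: Dict[str, int] = {n: 0 for n in nodes}
        self.down: Set[str] = set()

    def check_all(self) -> None:
        """Probe every node once and update the pool, alerting on changes."""
        for node in self.failures:
            if self.probe(node):
                if node in self.down:
                    self.down.discard(node)
                    self.alert(f"{node} recovered; back in the pool")
                self.failures[node] = 0
            else:
                self.failures[node] += 1
                if self.failures[node] >= self.threshold and node not in self.down:
                    self.down.add(node)
                    self.alert(f"{node} failed {self.failures[node]} checks; removed from the pool")

    def pool(self) -> List[str]:
        """Nodes that new requests may still be scheduled to."""
        return [n for n in self.failures if n not in self.down]
```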

For each mirror, one primary node in mainland China periodically synchronizes from the upstream source. When synchronization succeeds, it notifies the other nodes to synchronize from it (the API for this is still to be discussed); if synchronization fails, it notifies the other nodes to synchronize directly from the upstream source instead.
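
Since the notification API is still open, the following only sketches the decision the primary node makes after each run. `UPSTREAM`, the URLs, `SyncNotice`, and `run_primary_sync` are hypothetical names; how a notice is actually delivered (an HTTP call, a message queue, an SSH trigger) is deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, List

UPSTREAM = "rsync://upstream.example.org/debian/"  # hypothetical upstream source


@dataclass
class SyncNotice:
    node: str    # secondary node to notify
    source: str  # where that node should synchronize from


def run_primary_sync(primary_url: str, peers: List[str],
                     sync: Callable[[str], bool]) -> List[SyncNotice]:
    """Sync the primary from upstream, then tell every peer where to pull from."""
    succeeded = sync(UPSTREAM)
    # On success peers pull from the primary; on failure they fall back to upstream.
    source = primary_url if succeeded else UPSTREAM
    return [SyncNotice(node=peer, source=source) for peer in peers]
```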

Mirror access logs have analytical value and are an important basis for tuning the scheduling strategy. There therefore needs to be a mechanism by which every node sends its web access logs to the main site daily, where they are then sorted by mirror.
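
A sketch of the sorting step on the main site, assuming nodes ship logs in the Common or Combined Log Format and that each mirror lives under a top-level directory such as `/debian/`. Under the 301 scheme node layouts may differ, in which case the path-to-mirror mapping would need a per-node table instead. The bucket names `_root` and `_unparsed` are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches the request line inside a Common/Combined Log Format entry, e.g.
# 192.0.2.1 - - [10/Mar/2013:12:00:00 +0800] "GET /debian/dists/ HTTP/1.1" 200 512
REQUEST_RE = re.compile(r'"[A-Z]+ (/[^ "]*) HTTP/[0-9.]+"')


def split_by_mirror(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group raw log lines by the first path component of the request."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            buckets["_unparsed"].append(line)
            continue
        first = match.group(1).lstrip("/").split("/", 1)[0]
        buckets[first or "_root"].append(line)
    return dict(buckets)
```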

Mirror sites should try to agree on a common Linux distribution, which makes the shared “maintenance tool chain” easier to write.

III. Information the main site's web interface needs to provide

Public mirror list: the size of each mirror, daily update statistics, which mirror nodes exist, and which mirrors each node maintains, so that newly joined sites can contribute according to their capacity and spend limited resources on the mirrors that need them most.
Status monitoring for each mirror node
Help information for users of each mirror
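
One possible shape for the public mirror list, serialized as JSON so that both the web page and scripts can consume it. The field names, including `updates_today` as the daily update statistic, are hypothetical; sorting by the number of nodes puts the mirrors that most need new volunteers at the top, which is the use case described in the first item.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class MirrorInfo:
    name: str
    size_gb: float
    updates_today: int  # hypothetical daily update statistic
    nodes: List[str] = field(default_factory=list)  # nodes maintaining this mirror


def mirror_list_json(mirrors: List[MirrorInfo]) -> str:
    """Serialize the public mirror list, least-covered mirrors first."""
    ordered = sorted(mirrors, key=lambda m: (len(m.nodes), m.name))
    return json.dumps([asdict(m) for m in ordered], ensure_ascii=False, indent=2)
```
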
Reference for this article: http://blog.ustc.edu.cn/pipermail/ustc_lug/2013-March/009974.html
