Please read this first: About the Open Source Software Mirror Alliance (Non-Technical Part). Thank you.

I. DNS or 301?

Update: I previously misunderstood how DNS CNAME records work; this has now been corrected. Reference: https://tools.ietf.org/html/rfc3568

DNS solution (if I’m wrong, feel free to correct me):

  1. Give each mirror its own subdomain so it can be scheduled separately, and point the subdomain's NS record at the main site

  2. The user asks the ISP's DNS server for the IP address

  3. The main site returns the IP addresses of one or more mirror nodes, chosen by the source IP of the query (the ISP DNS server's IP)

  4. The ISP's DNS server returns these addresses to the user and caches them for a period of time

  5. The user connects to a mirror node and downloads
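
Step 3 of the DNS solution is essentially a lookup from the resolver's source address to a list of mirror IPs. The following is a minimal sketch of that lookup only, not a DNS server. The prefix table, the fallback list, and the name `pick_mirrors` are all hypothetical, and the addresses are taken from the reserved documentation ranges.

```python
import ipaddress

# Hypothetical table: ISP resolver prefix -> mirror node IPs.
# All addresses are from the reserved documentation ranges.
REGION_TABLE = [
    (ipaddress.ip_network("198.51.100.0/24"), ["192.0.2.10"]),
    (ipaddress.ip_network("203.0.113.0/24"), ["192.0.2.20", "192.0.2.21"]),
]
DEFAULT_MIRRORS = ["192.0.2.1"]  # assumed fallback when no prefix matches


def pick_mirrors(resolver_ip):
    """Return the mirror IPs to answer with for the resolver that sent the query."""
    addr = ipaddress.ip_address(resolver_ip)
    for network, mirrors in REGION_TABLE:
        if addr in network:
            return mirrors
    return DEFAULT_MIRRORS
```

Note that the function only ever sees the ISP DNS server's address, never the user's, which is why this approach can only match by region.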

301 solution:

  1. The domain's A record resolves to the main site

  2. The user establishes a connection with the main site and sends an HTTP request

  3. The main site returns an HTTP 301 status based on the user's source IP, redirecting to a mirror node

  4. The user establishes a connection with that mirror node and downloads
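
The 301 solution can be sketched with Python's standard `http.server` module. This is only an illustration under stated assumptions: the prefix-to-mirror mapping, the `example.org` host names, the port, and the names `redirect_target` and `RedirectHandler` are hypothetical, and a real main site would need a proper IP database and a production-grade server.

```python
import ipaddress
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from client prefix to mirror base URL.
MIRROR_BY_PREFIX = [
    (ipaddress.ip_network("198.51.100.0/24"), "http://mirror-a.example.org"),
    (ipaddress.ip_network("203.0.113.0/24"), "http://mirror-b.example.org"),
]
DEFAULT_MIRROR = "http://mirror-default.example.org"  # assumed fallback


def redirect_target(client_ip, path):
    """Choose the mirror URL for this client, keeping the requested path."""
    addr = ipaddress.ip_address(client_ip)
    for network, base in MIRROR_BY_PREFIX:
        if addr in network:
            return base + path
    return DEFAULT_MIRROR + path


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # client_address[0] is the user's own source IP, not the ISP resolver's.
        location = redirect_target(self.client_address[0], self.path)
        self.send_response(301)
        self.send_header("Location", location)
        self.end_headers()


if __name__ == "__main__":
    # Port 8080 is an arbitrary choice for local testing.
    HTTPServer(("", 8080), RedirectHandler).serve_forever()
```

Because the path is appended to a per-mirror base URL, each mirror could map it to a different directory layout. One caveat: clients are allowed to cache a 301 response, so if the scheduling strategy changes often, a temporary redirect such as 302 may be worth considering.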

| **Comparison** | **DNS** | **301** |
|---|---|---|
| Scheduling accuracy | Only the ISP DNS server's IP is known, so matching is by region | The user's IP is known, so matching can be exact |
| Changing the scheduling strategy | Takes effect only after DNS caches expire | Can be changed at any time and takes effect immediately |
| Mirror node directory structure | Must be the same | Does not have to be the same |
| User needs off-campus access | Not necessary | Necessary (*) |
| Access delay | Almost no added delay | A connection to the main site must be established first, adding at least the delay of one TCP handshake (plus the HTTP round trip for the redirect) |
| Main site load | ISP DNS caching keeps the load low | Every request goes through the main site, and parsing an HTTP request costs more than answering a DNS query, so the load is high |
| Main site stability requirements | High | Relatively high |
| Load balancing across multiple main sites | Can be implemented | Can be implemented |

(*) Even if there is a mirror node on campus, users must still have off-campus access enabled in order to reach the main site. (Do many schools restrict IPv6? If most schools place no access restrictions on IPv6, ignore this item.)

II. Communication between mirrors

The main node needs to monitor the status of each mirror node in real time, for example with Ganglia or Nagios. When a fault is detected, two things must happen: the scheduling strategy is adjusted so that new requests no longer go to that mirror, and an alert is sent out by email.
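
The fault-handling rule can be sketched as a small function that takes health-check results and produces the updated scheduling set plus the alert texts. This is a sketch under assumptions: the results are assumed to come from whatever monitoring tool is in use (no Ganglia or Nagios API is called here), the function name and message format are hypothetical, and actually sending the email is left out.

```python
def apply_health_results(active, results):
    """Drop failed nodes from scheduling and collect alert messages.

    active:  set of node names currently receiving new requests
    results: dict mapping node name -> True (healthy) or False (failed)
    Returns (new_active, alerts).
    """
    new_active = set(active)
    alerts = []
    for node, healthy in sorted(results.items()):
        if not healthy and node in new_active:
            # New requests must no longer be scheduled to this node.
            new_active.discard(node)
            alerts.append(f"ALERT: {node} failed its health check; removed from scheduling")
    return new_active, alerts
```

A node that has already been removed does not trigger a second alert, so a repeated check of a broken node stays quiet.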

For each mirror, one primary node in mainland China regularly synchronizes from the upstream source. When synchronization completes, it notifies the other nodes to synchronize from it (this API still needs to be discussed). If synchronization fails, it still notifies the other nodes, telling them to synchronize from the upstream source instead.
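
Since the notification API is still open for discussion, the following is only one possible shape for the message, not a proposal the source has agreed on. The JSON field names, the function name `build_sync_notice`, and the rsync URLs are all hypothetical.

```python
import json


def build_sync_notice(mirror, sync_ok, primary_url, upstream_url):
    """Build the notice the primary node sends to the other nodes.

    On success the others are told to sync from the primary node;
    on failure they are told to sync from the upstream source instead.
    """
    source = primary_url if sync_ok else upstream_url
    # Hypothetical message format; the real API is yet to be defined.
    return json.dumps({"mirror": mirror, "sync_from": source}, sort_keys=True)
```

For example, `build_sync_notice("debian", False, "rsync://primary.example.org/debian/", "rsync://upstream.example.org/debian/")` would point the other nodes at the upstream URL.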

The access logs of the mirrors have analytical value and are an important basis for adjusting the scheduling strategy. There therefore needs to be a mechanism by which each node sends its web access logs to the main site every day, and the main site then sorts them by mirror.
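
The sorting step on the main site could look roughly like this. It is a sketch that assumes the logs are in Common Log Format and that each mirror lives under its own top-level directory (for example `/debian/`); neither assumption comes from the source, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches the request part of a Common Log Format line,
# e.g. "GET /debian/dists/ HTTP/1.1", and captures the path.
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (/[^ ]*) HTTP/[0-9.]+"')


def classify_by_mirror(lines):
    """Group access log lines by the first path component of the request."""
    groups = defaultdict(list)
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # skip lines without a recognizable GET/HEAD request
        first = match.group(1).lstrip("/").split("/", 1)[0]
        groups[first or "(root)"].append(line)
    return dict(groups)
```

Requests for the site root are collected under a separate `(root)` key so that they do not pollute any mirror's statistics.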

The mirror sites should try to agree on a single Linux distribution, which makes it easier to write the “maintenance tool chain”.

III. Information that the main site web interface needs to provide

  • A public mirror list showing the size of each mirror, its daily update statistics, which mirror nodes already exist, and which mirrors each node maintains, so that newly joined mirror sites can take on what they are able to host and spend their limited resources on the mirrors that need them most.
  • Status monitoring of each mirror node
  • Help information for users using each mirror

Reference for this article: http://blog.ustc.edu.cn/pipermail/ustc_lug/2013-March/009974.html


2013-03-19