This article summarizes the technical architecture of USTC iCourse.club. Founded in 2015, iCourse is a Flask-based website aiming at rating courses in USTC (University of Science and Technology of China). Although it is only a small website from a technical point of view, it offers a glimpse of the architecture of a typical web service.

Server Deployment Architecture

The First Days

Initially, iCourse is a standalone service that runs on a server of USTC library, which co-locates with moocwiki.org (now closed). The architecture at that time is as simple as the following figure.

  • HTTP requests come from users’ browsers and go through Nginx reverse proxy. Static files (e.g., user uploaded files, avatars, favicon, CSS, and JavaScript) are served directly by Nginx. Dynamic content is served by the Flask web service, and the Nginx reverse proxy forwards the HTTP requests to the Flask web service.
  • The Flask web service listens on a local port, which uses gunicorn to accept HTTP requests and distribute to 8 worker threads.
  • The Flask web service connects to a local MySQL database.
  • To send registration confirmation and password reset emails, the Flask web service connects to a local Postfix server, which then forwards emails to USTC mail server. The Postfix server is also configured to receive service (at) icourse.club emails, and forward them to the personal emails of administrators.

Migration

A few years later, regulations require that all sites hosted on USTC servers must have an ICP license, and .club domains cannot obtain it. So, iCourse.club was migrated to a personal cloud server in DigitalOcean Singapore datacenter. The migration includes dumping the MySQL database using mysqldump, and copying the user uploaded files using rsync. The source code is maintained on GitHub.

We want to minimize site maintenance time when the database and user uploaded files are migrated. For large web services, we can use shared storage for user uploaded files and use replicated databases to make the migration with zero down-time. However, we are only a small website where down-time within a minute is acceptible. We wrote a bash script to do the following steps:

  1. rsync user uploaded files to the new server.
  2. Temporaily shut down the Flask web service, and the Nginx displays a “server under maintenance” static page.
  3. Dump the database using mysqldump.
  4. rsync mysqldump file to the new server.
  5. rsync user uploaded files to the new site (this should be very fast, just in case some user uploads a file during step 1).
  6. Import the mysqldump file to the database in the new server.
  7. Start the Flask application (gunicorn) on the new server.
  8. The Nginx on the old server forwards all HTTP requests to the new server. From this point, the web service is recovered.

Because updates of the DNS records take time, if we do not take any measure, there will be some time where some users are accessing the old site while the others are accessing the new one. To mitigate this issue, we first set the Nginx on the old server to forward all HTTP requests to the new server (as the last step of the above), then configure the DNS record on DNSPod (now we use CloudFlare instead). After some time we are sure that all DNS caches are expired (e.g., 1 day), we can remove the forwarding service on the old server.

Adding Backup Servers

The most precious asset of iCourse.club is not the source code, but the main database that contains 17,000+ reviews, 14,000+ courses, and 6,800+ users. When dumped into SQL text format, it is approximately 200 MB. Apart from the database, the users have uploaded 7,400+ avatars and images to the website (8.9 GB), as well as 1,800+ file sharings (1.8 GB). Although this 10 GB of data is quite small, it is even more precious than my personal data.

Initially, we utilized Git to track updates to both the database and uploaded files. Everyday in the midnight:

  1. automysqlbackup generates a database backup file (SQL text).
  2. The uploaded files are copied into a backup folder.
  3. Use Git to add the database backup file and changed (added/updated/deleted) files in the backup folder, and generate a commit.
  4. Push the Git repository to a GitLab server.

In effect, this Git repository takes a snapshot of iCourse.club every day, so we can go back to any day in the history of iCourse.club (since the day backup has started).

After a few years, the Git repository becomes too large (50+ GB) and the GitLab server often takes a significant amount of time and memory compressing and unpacking the commits. So, we stopped using Git push to synchronize backups, and use rsync in crontab to synchronize the Git repository to another two servers, one in the same datacenter and the other from another datacenter of another cloud provider.

Recently, we think the Git repository is too large, and synchronizing it often consumes significant network bandwidth because Git often garbage collects the objects. The rsync bandwidth should be small because the changes every day is incremental. However, when Git performs garbage collection, most objects are moved, and the rsync bandwidth is close to the total size of the repository (the entire history). Because cloud providers have metered outgoing network traffic, it becomes an increasingly significant problem.

To resolve this problem, the latest backup architecture shown above utilizes Restic to generate incremental backups, and send to XXX (the figure needs to be updated). We converted the Git commits to Restic history, so the entire history is now in the cloud, and the servers no longer store the entire history.

In the meantime, we deploy a hot standby server within the same datacenter, which uses the traditional rsync to synchronize the MySQL database dumps and user uploaded files. The rsync goes through VPC (Virtual Private Cloud) network, so the traffic is not metered. The Flask web service is deployed on the hot standby server. If the main server meets some irrecoverable failure, we can immediately switch to the hot standby server with data loss since midnight (the synchronization time).

Adding Web Gateway

After migration from USTC servers to cloud services, over the years, we have used DigitalOcean and Conoha for the main servers. However, the network connectivity from China mainland to their datacenters in Singapore and Japan are not stable. Some ISPs in China have poor bandwidth when connecting to the servers, and some ISPs even cannot connect to the servers. So, we used Cloudflare as a reverse proxy to accelerate iCourse.club, as shown below.

We used Cloudflare DNS servers and the DNS is resolved to Cloudflare proxy servers, so, HTTP requests from browsers go to the Cloudflare reverse proxy, which then are forwarded to our main Web server.

It is worth noting that many web services use Cloudflare to protect the main Web server, including mitigating DDoS attacks, hiding the IP address of main Web server to reduce the attack surface, etc. Cloudflare can also cache static resources (CSS, JavaScript, etc.) to reduce network traffic of backend Web servers. Our statistics show that Cloudflare reduced 30% to 40% of network traffic of the backend Web server by caching static resources.

At the end of last year, we switched from Cloudflare to our own Web gateways. The main reason is that the reachability of Cloudflare servers in mainland China became worse at the time, and the mean page load time is as high as 3 seconds. So, we used two cloud servers from Hong Kong datacenter of Huawei Cloud, which is close to mainland China and still do not need an ISP license.

Adding Test Site

During the initial phase of development, each of the developers runs an instance on the laptop. After the icourse.club website started running, we directly update code on the icourse server, but still use git to track code updates. We utilize the debugging mode of Flask to test and debug the code, and then restart the gunicorn service when the new code is ready to be deployed.

However, this development mode is insecure since the debug code directly updates the main database. In 2022, we added a test site that connects to a backup copy of the main database, so incorrect updates to the database will not affect the main one. This also enables new features to be reviewed before deployment on the main site. In addition, developers no longer update the code directly. Each of us has a clone of the code repo in the home directory, so we can work concurrently on different repos.

Currently, there is still a problem in restarting the gunicorn service. Because there is a time interval between killing the original gunicorn service and the new gunicorn service is initialized to accept incoming requests, some requests may experience 502 Bad Gateway. The best way to do this is to start a new gunicorn service that listens on a new port, wait for the new service to accept requests normally, then update the Nginx config to route requests to the new service, and finally stop the old service. In this way, the service can be upgraded smoothly and no requests will be dropped.

Designing Database Schema

The Entity-Relationship (E-R) graph of iCourse.

As suggested by Flask models, we use relational database to store all data in iCourse (except user uploaded files and images). Although PostgreSQL is technically more superior than MySQL (now MariaDB), we are still using MySQL (MariaDB) because I’m more familiar with it.

The three key pillars of iCourse database are courses, reviews, and users. Centered around the three tables, there are auxiliary tables to carry additional information.

Course

Currently, we define a course as a unique combination of course name and a list of teachers. So, it may seem natural to only have two tables: one for courses and the other as a many-to-many relation between courses and teachers (course\_teachers). However, we need to consider different semesters of a same course. So, we have one additional table (CourseTerm) to store the semesters of each course. There is an one-to-many relationship between Course and CourseTerm.

In fact, we have several redundant tables in iCourse, because initially we try to model the exact course model in USTC official course management system. A course has one or more semeters (CourseTerm), one semester has one or more classes (CourseClass). Each CourseTerm has its unique description, including course major, type, level, credit, hours per week, course number, etc. Different semesters of a same course may have different descriptions. Currently, iCourse.club only shows the description of the latest semester. In addition to the basic course information, we also store the time and location (classroom) of each course class (CourseTimeLocation), but it is never used.

Users can upvote and downvote courses (upvote and downvote are mutally exclusive). Users can also follow courses to subscribe to new reviews in a course.

Review

Reviews are the most important for iCourse.club. A review includes a text content, a numeric (1-10) rate, and several selections (difficulty, homework, grading, and gain). A user can submit at most one review for a course (i.e., a user cannot submit multiple reviews for different semesters). To find all reviews of a course easier, we store the course ID rather than semester ID in a review. A review also stores its publication time and last update time. A review also has privacy settings, including anonymous, only visible to student users, hidden, and blocked.

We can easily find that the ranking of courses should not be simply according to the average rate of courses. For example, a course with one review of rating 10 is probably not as good as a course with 10 reviews of average rating 9.9. If every user who has taken a course has written a review, then the average rating is accurate. However, in fact we have far less reviews, so, we need to estimate the average rating. We currently “normalize” the average rating by adding a number of reviews of a certain rating. The number of added reviews is the average number of reviews, and the rating is the average rating of all reviews in iCourse.club.

Because we frequently need to sort courses by the average rate and the number of reviews, we recompute the average rate and the number of reviews when a review is created or updated, and store them into the database.

Users can comment reviews, and each review can have multiple comments, so it is an one-to-many relationship.

Users can upvote reviews, so there is a many-to-many relation table between users and reviews. It is obvious that each user can upvote reviews at most once.

Recently, we store the update and deletion history of reviews and comments in a separate table, in case users and administrators delete reviews accidentally.

User

A user is a registered identity in iCourse.club. Currently, we only allow users with a valid USTC email to post reviews and comments. A user may be a teacher or student according to the suffix of the registered email.

A user may follow users and courses. Following a user means to receive notifications on new reviews, updated reviews, and new comments posted by the user. Following a course means to receive notifications on new and updated reviews in the course. The following user and following courses are many-to-many relationships between the user schema and itself, and between the user schema and the course schema, correspondingly.

Users may also receive notifications on other events. When a user mentions another user in the review or comment using “@username”, the mentioned user will receive a notification. A user will also receive notifications when other users follow him/her. When a review is blocked or unblocked by the administrator, the author will be notified.

Before 2019, the USTC course selection system publishes the list of students of each course. It is a many-to-many relationship between student numbers and CourseClass. iCourse.club enables users to bind the student number and get a list of courses. This binding is a one-to-one relationship between users and student numbers. Because iCourse.club does not synchronize the course selection information in real time, the users may want to select and deselect courses on their own. So, we added a button on the course page to enable users to choose their learned course. To simplify the user interface, when a user tries to “join” a course, we assume that the user is joining the latest semester of the course.

Since 2019, we can no longer obtain the student list of each course, so the list of learned courses is no longer maintained. We expect to allow users to input the credentials of the USTC course selection system, and automatically extract the learned courses.

Third-Party Signin

Since 2022, we introduced third-party sign in feature to sign in other websites (for example, pi-review.com) using iCourse.club credentials. We have several design principles:

E-mail System

We build a simple E-mail system for three purposes:

  1. Send notification and verfication emails to users’ email addresses.
  2. Receive inbound emails to service@icourse.club, and forward to the administrators.
  3. Enable administrators to send emails as service@icourse.club.

We use postfix mail server. It allows outbounnd emails from localhost (the web server) and authenticated administrators accounts from external network. It also allows inbound emails from the external network to service@icourse.club, and forwards such emails to the administrators.

We need to configure DNS MX records, SPF records and PTR records to make sure USTC email server will not consider our emails as junk emails.

How Many Visitors?

We can see live statistics of iCourse on https://icourse.club/stats/, which shows the number of new reviews, the number of new users, the average rating of courses and reviews, etc. However, we did not track the number of visitors, user logins, and page views.

Before 2022, we even did not keep Nginx access log for more than 30 days, so the historical data are permanently lost. Since 2022, we keep Nginx access logs permanently (hopefully!), and here are some statistics.

From live statistics, we can clearly see that most of the reviews are published during the winter and summer vacations. The total website traffic also see a surge during this period, where the total outgoing traffic is ~500 GB per month. We have ~120K page views per day and ~10K unique visitors (by IP) per day. During non-vacation periods, the total outgoing traffic is much smaller, ~300 GB per month, but the number of page views are not reduced so much: ~80K page views per day and ~8K unique visitors (by IP) per day.

When we first obtained this data, we thought it was really large for a website in USTC. Because USTC only have less than 8K undergraduate students and roughly 16K graduate students, and only first-year graduate students take courses, so the possible user base of iCourse.club is smaller than 12K. If we assume each user access icourse.club using one device, 10K unique visitors means nearly every student accesses the website once per day! It sounds ridiculous, so there must be some errors in the estimation. For example, maybe many users of iCourse.club are not students?

We can also take a look at the review rate of iCourse.club. In the last year, 10K reviews are published. If we consider 12K students in USTC, and each student take 20 courses per year, then there may be 240K reviews. So, roughly 4% of students write a review after taking a course. We have confirmed that more than 90% of reviews are for the last semester, so, although more than half of the reviews in this website are published in the last year, most students do not write reviews for an ancient semester. Surely, this distribution is not uniform. Undergraduate students are more likely to write a review, and graduate students seldom write reviews. Common-knowledge courses are more likely to attract reviews than courses specific to a major. Extremely good and bad courses typically have more reviews than mediocre courses.

In the future, we may record more information for statistics, e.g., access count of each course and user login history. Currently we even do not know how many registered users have logged in for the last day or month, because we did not record it! If we consider relational database too heavy to store such records, we may use a document-based database such as MongoDB or a key-value store such as Redis.

Architecture for the Future

In the future, iCourse.club may extend to multiple universities.

A multi-university course rating website would be much more useful than a simple aggregation of isolated course rating websites for each university. For example, we can compare the courses and teachers in different universites. A user may also comment on courses of other universities.

To be continued…

Comments