In my first semester of postgraduate study, my friends and I completed a data visualisation project. The project leader at the National Library of Scotland provided the raw data: Scottish school examination papers from 1888 to 1963. The dataset contains photographs of the papers together with uncorrected OCR text totalling 2,849,689 words. The data holder (the National Library of Scotland) wanted us to tell the story behind the data in a more vivid and engaging way.
In the group, I was responsible for analysing how the exam subjects changed over time. In the data-processing phase, I used the exam time as an anchor for locating and extracting the exam subject. To improve extraction accuracy, I tried two methods: (1) assigning weights to keywords and setting a threshold to pick out the lines that state the time, and (2) extracting with regular expressions. The first method had two main difficulties. First, because the OCR text is uncorrected, misspelled keywords often fail to match, so some lines cannot be scored properly. Second, accuracy depends heavily on where the threshold is set. With more time, the data could be annotated and a machine learning method used to find the optimal threshold, which would improve the accuracy of subject extraction.
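As a rough illustration of the two approaches described above, here is a minimal Python sketch. It is not the project's actual code: the keyword weights, the threshold, the regular expression, the sample lines, and the rule that the subject is the nearest non-empty line above the time line are all assumptions made for the example. The threshold search is a plain sweep over candidate values on hand-labelled lines, which stands in for the machine learning approach mentioned above.

```python
import re

# Hypothetical keyword weights for words that tend to appear on the line
# stating the exam date and time. Illustrative values only.
KEYWORD_WEIGHTS = {
    "monday": 2, "tuesday": 2, "wednesday": 2, "thursday": 2, "friday": 2,
    "saturday": 2,
    "march": 1, "april": 1, "may": 1, "june": 1,
    "a.m.": 2, "p.m.": 2,
}
THRESHOLD = 4  # assumed cut-off; the right value depends on the data

# Method 2 (assumed pattern): a weekday followed later by a clock time.
TIME_LINE = re.compile(
    r"\b(?:mon|tues|wednes|thurs|fri|satur)day\b.*?"
    r"\b\d{1,2}(?:[.:]\d{2})?\s*[ap]\.?\s?m\b",
    re.IGNORECASE,
)


def score_line(line):
    """Method 1: sum the weights of every keyword found in the line.

    Plain substring matching is used, so this is deliberately crude.
    """
    lowered = line.lower()
    return sum(w for kw, w in KEYWORD_WEIGHTS.items() if kw in lowered)


def find_subjects(lines, is_time_line):
    """Return the nearest non-empty line above each detected time line."""
    subjects = []
    for i, line in enumerate(lines):
        if not is_time_line(line):
            continue
        for prev in reversed(lines[:i]):
            if prev.strip():
                subjects.append(prev.strip())
                break
    return subjects


def best_threshold(labelled, candidates):
    """Pick the candidate threshold with the highest accuracy.

    `labelled` is a list of (line, is_time_line) pairs annotated by hand.
    Ties go to the first (lowest) candidate.
    """
    def accuracy(t):
        hits = sum((score_line(text) >= t) == truth for text, truth in labelled)
        return hits / len(labelled)
    return max(candidates, key=accuracy)


# Invented sample page, not taken from the dataset.
page = [
    "LEAVING CERTIFICATE EXAMINATION",
    "ARITHMETIC",
    "",
    "Tuesday, 18th June, 10 A.M. to 12.30 P.M.",
    "Candidates may answer any four questions.",
]
by_score = find_subjects(page, lambda l: score_line(l) >= THRESHOLD)
by_regex = find_subjects(page, lambda l: TIME_LINE.search(l) is not None)

# An OCR-style misspelling ("Tuesdav") lowers the keyword score, which is
# the first difficulty mentioned above.
labelled = [
    ("Tuesday, 18th June, 10 A.M. to 12.30 P.M.", True),
    ("Candidates may answer any four questions.", False),
    ("Tuesdav, 18th June, 10 A.M.", True),
]
tuned = best_threshold(labelled, range(1, 8))
```

On this toy page both methods return the same subject line. The misspelled line scores below the assumed threshold of 4, and the sweep settles on a lower threshold that still rejects the instruction line.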
The graph below shows how the number of subjects examined changed from year to year. More and more subjects were included in the examinations as time went on, and the number rose especially sharply around 1950.
(How the number of exam subjects changes over the years)
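For readers curious how such a per-year count could be produced, here is a minimal sketch. It assumes the extraction step yields (year, subject) pairs, which is a hypothetical format; the records below are toy placeholders, not the project's data.

```python
from collections import defaultdict


def subjects_per_year(records):
    """Map each year to the number of distinct subjects seen in that year."""
    seen = defaultdict(set)
    for year, subject in records:
        # Normalise case and whitespace so OCR variants of the same
        # name are not counted twice.
        seen[year].add(subject.strip().lower())
    return {year: len(names) for year, names in sorted(seen.items())}


# Toy placeholder records (hypothetical).
records = [
    (1888, "Arithmetic"), (1888, "Latin"), (1888, "ARITHMETIC "),
    (1921, "Gaelic"), (1921, "Latin"), (1921, "Arithmetic"),
]
counts = subjects_per_year(records)
```

The resulting dictionary can then be passed to any plotting library to draw the line chart.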
For the presentation, we decided on a serious comic style, which appeals to a wider audience without losing the seriousness of the data. The whole comic is shown on a web page, for which we also designed a number of animated elements to make the page more interactive.
To explore the changes in subjects further, I selected four representative years for closer analysis. In 1888, a student's school bag held textbooks for only 12 subjects, covering just three areas: languages, mathematics and sciences.
After 1888, the number of subjects increased gradually. By 1921, it had reached 16 courses, with additions in languages, mathematics and sciences. Gaelic, for example, was a new course that had not been offered in 1888.
Around 1950, the number of courses increased dramatically, doubling in less than 30 years. In addition to the original three areas, new ones were added, such as music and the liberal arts. The number of science subjects also grew, with additions such as zoology and chemistry. The shape of the modern curriculum was already emerging.
Over time, courses expanded into further new areas, with subjects such as Dress and Design and Home Management joining the mix. By 1963, the exams covered 38 subjects.
In addition to the research on subjects, my teammates explored topics such as gender bias. You can visit our page through this link: Scottish School Exam Papers