Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/88435/dsp01rb68xf59g
Title: Automatic Punctuation of Lecture Transcripts & Student Usage Analysis of Video Lectures in Online Learning Platforms
Authors: Morkos, Ragy
Advisors: Gunawardena, Ananda
Department: Computer Science
Class Year: 2018
Abstract: Despite the prevalence of speech-to-text or automatic speech recognition (ASR) services, the quality of their output transcripts can vary widely. Most notable in the vast majority of provided transcripts is the lack of punctuation or text segmentation. This is commonly seen in automatically produced transcripts hosted on most online learning platforms, including YouTube. Although Google (of which YouTube is a subsidiary) has shown in some of its other services that it possesses an engine for producing properly punctuated text, YouTube video transcripts are still largely uncurated and are merely a stream of unpunctuated text. This can be especially problematic for students who use YouTube video lectures. Students can depend on lecture transcripts for a variety of reasons, including hearing disabilities, a preference for easier note-taking, and the use of transcripts as checkpoints in videos or for search features. While some of the online course providers who host their lecture videos on YouTube supply manually punctuated and curated transcripts, doing so is an expensive and tedious process, and a lot of lecture content still lacks punctuated transcripts. This paper explores an automatic solution to the problem of uncurated ASR transcripts. By training a bidirectional recurrent neural network on manually punctuated lecture transcripts, we show that it is possible to add punctuation to transcripts with an overall F-score of 67%. Specifically, we deploy these new transcripts on Princeton Salon in COS 126, Princeton University’s introductory CS class. After rolling out the newly punctuated transcripts to our student subjects, we did not notice an increase in student satisfaction despite the large improvement in transcript quality.
We therefore conducted quantitative surveys as well as qualitative user studies and interviews to better understand how students use transcripts, and Salon in general, in their studying. Based on the student responses, we conclude that the context and organization of transcripts, as well as quizzes and other features of an online learning platform, matter much more than the quality of the transcripts. We furthermore propose a way to use student usage data on Salon to add organization and segmentation to transcripts, and we offer a preliminary analysis of crowd-sourced student usage data.
URI: http://arks.princeton.edu/ark:/88435/dsp01rb68xf59g
Type of Material: Princeton University Senior Theses
Language: en
Appears in Collections: Computer Science, 1988-2020

Files in This Item:
File: MORKOS-RAGY-THESIS.pdf
Size: 689.71 kB
Format: Adobe PDF
Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.