Plagiarism and Mindset in Writing Style

Team: 19

School: La Cueva High

Area of Science: Behavioral Science

Interim: What is the project about?:
Our project seeks to use natural language processing to analyze groups of texts for correlations in tone. When there is a large amount of writing to review and the topic is very general, it is easy to overlook tonal similarities between writing samples (tone in the sense of connotation, not musical tonality). We want to see whether we can quantify essays with respect to how they answer a particular broad prompt, not only to detect similarities but also to understand people's thinking and opinions on the topic under discussion.

How are you/do you plan to solve this problem computationally?:
We plan to solve this problem in Python using natural language processing libraries: mainly word2vec and related libraries, but also document-level models such as doc2vec. Because we will be analyzing large amounts of sample text, basic AI algorithms may also be useful. We will collect writing samples in response to controversial or emotionally charged political prompts, and we hope to capture the honest opinions of students in our high school's classes. Then, using a combination of algorithms, we will produce a graph of the class's responses and judge how similar the texts they wrote are to one another.
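A minimal sketch of the "graph of the class's responses" idea, assuming each response is turned into a vector and every pair of responses is compared with cosine similarity. To keep it dependency-free, it uses plain word counts as a stand-in for the doc2vec vectors the project plans to use; the function names and sample responses are hypothetical. In a doc2vec version, the `bag_of_words` step would be replaced by a document vector from the trained model, while the pairwise comparison would stay the same.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text):
    """Represent a response as word counts (stand-in for a doc2vec vector)."""
    return Counter(tokenize(text))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (0.0 to 1.0 here)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(responses):
    """Similarity of every response to every other response."""
    vectors = [bag_of_words(r) for r in responses]
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


if __name__ == "__main__":
    responses = [
        "School uniforms limit personal expression.",
        "Uniforms limit expression and personal style.",
        "Longer lunch breaks would help students focus.",
    ]
    for row in similarity_matrix(responses):
        print(" ".join(f"{value:.2f}" for value in row))
```

Each row of the matrix can be read as the edge weights from one response to all the others, which is one simple way to draw the responses as a graph.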

What progress have you made up to this point?:
Most of our work so far has been research, since we are both very new to neural networks. Although we did not need to learn every technical detail of how neural networks work, we still had to learn the specifics of natural language processing and the theory behind it. We learned how to use Gensim's word2vec implementation and the basic ideas behind it. Specifically, we ran it on the King James Bible and on a collection of Reddit news titles, and we analyzed how often particular words occurred and how they related to other words. This served as a basic exercise to become familiar with the library and the algorithm. We are also improving our Python skills, since Python is a change for us (we are used to C-like languages).
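The "relationships with other words" that word2vec learns come from context windows: the words that appear within a few positions of a target word. The sketch below only counts those (target, context) pairs rather than training embeddings, so it illustrates the input word2vec learns from, not word2vec itself. The function names are hypothetical, and the window size of 2 is an arbitrary choice for illustration (Gensim's `Word2Vec` exposes a comparable `window` parameter). The sample lines are Genesis 1:1 and 1:3 from the King James Bible.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def context_counts(sentences, target, window=2):
    """Count the words appearing within `window` positions of `target`.

    Word2vec's skip-gram model trains on these same (center, context)
    pairs; here we only count them instead of learning vectors.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:  # skip the target word itself
                    counts[tokens[j]] += 1
    return counts


if __name__ == "__main__":
    verses = [
        "In the beginning God created the heaven and the earth.",
        "And God said, Let there be light: and there was light.",
    ]
    print(context_counts(verses, "god").most_common(3))
```

Words that keep appearing in the same contexts across a large corpus end up with similar vectors after training, which is what makes word2vec's similarity queries meaningful.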

What results are you expecting?:
The finished program should accept writing samples related to a prompt, analyze them in code, and output the analysis as text and graphs. It will classify the responses into categories based on the tone and diction of each text, and it will flag extremely similar texts for possible plagiarism or copying of ideas. These results should help teachers, and anyone else tasked with analyzing responses, to determine public opinion of their ideas and to find students or survey respondents who may have copied others or submitted the same response twice. However, the more we look into the problem, the more we are inclined to simply build a system that evaluates essays numerically if plagiarism proves too hard to detect.
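One simple way to flag "extremely similar texts" is to compare overlapping three-word sequences ("shingles") with Jaccard similarity. This is a common near-duplicate detection technique swapped in here for illustration; it is not the word2vec approach described above. The 0.5 threshold, the shingle length of 3, and the student IDs are hypothetical and would need tuning on real responses.

```python
import re
from itertools import combinations


def shingles(text, n=3):
    """Set of overlapping n-word sequences in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_similar(responses, threshold=0.5):
    """Return (id_a, id_b, score) for each pair at or above the threshold."""
    sets = {sid: shingles(text) for sid, text in responses.items()}
    flagged = []
    for a, b in combinations(sorted(sets), 2):
        score = jaccard(sets[a], sets[b])
        if score >= threshold:
            flagged.append((a, b, round(score, 2)))
    return flagged


if __name__ == "__main__":
    responses = {  # hypothetical student IDs and answers
        "s1": "Social media makes teenagers feel more isolated from their friends.",
        "s2": "Social media makes teenagers feel more isolated from their real friends.",
        "s3": "Phones should be allowed in class for research.",
    }
    print(flag_similar(responses))
```

Identical submissions score 1.0, so the same function also catches a response submitted twice; lightly reworded copies score somewhere below that, which is why the threshold matters.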

Team Members: Edward Potapov, Joel Gates
Mentor: Dennis Trujillo

Sponsoring Teacher: Yolanda Lozano