CoroBot: When the Virus Begins to Talk

CoroBot is a virtual-virus chatbot trained on COVID-19-related tweets and social media posts.

This project aims to provide a way to communicate with an illusory, unfamiliar creature and, through it, to reflect on ourselves. Can we build a persuasive, believable persona from people’s comments and thoughts about it, even when this "person" does not exist? And could we regard machine learning as modern witchcraft?


produced by: Uchan Sun

Introduction

The 2019 coronavirus disease (COVID-19) epidemic is a public health emergency of international concern and poses a challenge to psychological resilience. Research is needed to analyze and reduce adverse psychological impacts during the epidemic. CoroBot is a chatbot built by training GPT-2 on social network data about COVID-19, and it allows users to communicate with it directly. The aim of this study is to offer a new perspective on social awareness of the anthropomorphization of non-human life, and to seek potential approaches to psychological guidance during the pandemic. The topic is also relevant to areas such as artificial life and affective computing.

Background and concept

Traditionally, the strong version of Artificial Intelligence is based on the assumption that cognitive functions are computational and thus independent of any specific material substrate. In the same way, artificial life research seems to presume that one can separate the logical form of an organism from its material base.[1] Whether a material base is necessary remains controversial in previous research, so this project, from the perspective of affective computing, turns the biologically harmful virus into a data-based computational life form. CoroBot focuses on communication rather than on the material substrate of the actual virus; through text-based conversations, we can observe users’ emotional concerns about the current pandemic and look for possible countermeasures.

Interaction and technical design 

The base dataset consists of a Twitter dataset of 40+ million tweets and Reddit posts related to COVID-19.[4] The advantage of using social media data is that it includes news reports, scientific research breakthroughs, and people’s everyday struggles around COVID-19, so the trained model is a syncretic one that reflects both the scientific characteristics of the virus and the related social issues.

In the current demo, we used a text file of around 13 MB to train the base GPT-2 model[6] for 2000 steps through Runway.ML. The training process took around 1 hour. The demo model continuously generates new sentences from the input sentence, which enables an ongoing conversation between the user and CoroBot. Since the original data are mostly written in the third person (reports) or in the first person (ordinary tweets), we use RiTa.js[7] to fix the perspective of each sentence, as in the following example:

 

The web front end was built with p5.js.[8] The trained model is hosted on Runway.ML[9] and is accessed from the website hosted on GitHub, which sends the message records to Firebase. For further development, we aim to convert the original training data into first-person text written from the coronavirus’s point of view, so that the chatbot talks more fluently.
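The perspective fix described above can be illustrated with a small rule-based sketch. This is not the project's RiTa.js code: it is a simplified Python stand-in that only handles a few patterns (the virus as the subject of an auxiliary verb, and the possessive form). The names `VIRUS_NAMES`, `AUX_MAP`, and `to_first_person`, as well as the list of virus names, are assumptions made for illustration.

```python
import re

# Hypothetical list of ways the source text may refer to the virus.
VIRUS_NAMES = r"\b(?:the\s+)?(?:novel\s+)?(?:coronavirus|virus|covid-19)"

# Third-person auxiliary verb -> first-person form.
AUX_MAP = {"is": "am", "was": "was", "has": "have", "does": "do"}


def to_first_person(sentence: str) -> str:
    """Rewrite a few simple third-person patterns about the virus into first person.

    Sentences that match none of the patterns are returned unchanged
    (apart from capitalizing the first character).
    """
    # Possessive first: "the virus's impact" -> "my impact".
    sentence = re.sub(VIRUS_NAMES + r"['’]s\b", "my", sentence, flags=re.IGNORECASE)

    # Subject followed by an auxiliary: "the virus is" -> "I am".
    def swap_subject(match: re.Match) -> str:
        return "I " + AUX_MAP[match.group(1).lower()]

    sentence = re.sub(
        VIRUS_NAMES + r"\s+(is|was|has|does)\b",
        swap_subject,
        sentence,
        flags=re.IGNORECASE,
    )

    # Restore sentence-initial capitalization (e.g. "my" -> "My").
    return sentence[:1].upper() + sentence[1:]


if __name__ == "__main__":
    print(to_first_person("The virus is spreading quickly."))
    print(to_first_person("Scientists say the virus has mutated."))
    print(to_first_person("COVID-19's impact was huge."))
```

A real conversion would also need verb agreement for ordinary verbs ("the virus spreads" to "I spread") and object forms ("fight the virus" to "fight me"), which this sketch deliberately leaves untouched.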

Development and result

Here is a conversation record from an early demo:

Human said: [Do you have boyfriend? ]

Coronavirus said: [I’m waiting for you." "#CoronavirusPandemic #Pakistan #FridayTh]

Human said: [Do you hate human? ]

Coronavirus said: [I hate the world?" "No, hate the present." "RT @kimhunseason:]

Human said: [Where did you born? ]

Coronavirus said: [" Cruz asked. "My parents immigrated here. They brought me here]

Human said: [Who are your parents? ]

Coronavirus said: [What type of care are they getting? #CoronavirusPandemic #Coron]

Human said: [Do you hate us? Why did you do this to us? ]

Coronavirus said: [Fuck you, you pathetic moron. I have been president for over ]

Emotionally, CoroBot acts as a mirror that reflects our perception of COVID-19. Everything we feel in conversation with this virtual virus is based on what we ourselves previously posted on the internet. All of human society has struggled with the origin of, and the solution to, the COVID-19 virus, which is a natural, biological issue. The existence of the original virus is material, but its impact is not. As a Chinese student, I saw too many outbreaks of xenophobia during the pandemic.[13] To me, those pathetic and offensive attacks recall an old saying: a pandemic will never cause human extinction, but humans themselves will. I hope that an unbiased personification of the coronavirus can prompt us to reflect on the internal strife in our society.

Redundant tokens are still visible in these responses, so I used Twitter Processor to clean the dataset. After further development, the conversations improved considerably, and I added a user interface:

 

CoroBot was trained on a dataset of messages from human society, so these words of rage are really a reflection of ourselves. The origin of this specific rage cannot be pinned down, but it is open to multiple interpretations. Does the coronavirus really hate us? Or does it feel indifferent towards an alien race? This is an opportunity to actually regard ourselves as a community with a shared future for mankind.

Further development

For now, CoroBot is still under development. On the training side, I need a larger dataset and more precise data cleansing. As future work, I intend to arrange user tests and analyse the feedback. Enabled by machine learning, this research will reflect people’s attitudes in multiple ways: through the character of the anthropomorphized coronavirus, the conversations between CoroBot and its users, and the change in users’ emotional state before and after the chat. This will allow us to analyse the pandemic from both a data-analytical and a psychological angle. Another idea is to train and improve CoroBot on the new conversations between it and its users. Rather than improving its ability, the aim is to see what it can learn from human conversation, since the new data are knowingly produced through communication rather than being spontaneous daily tweets.
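As one possible starting point for the more precise data cleansing mentioned above, the sketch below strips the kinds of tokens that leak into the early demo responses: retweet markers, @mentions, hashtags, URLs, and stray double quotes. It is a minimal illustration using only Python's standard library, not the tool used in the project. The function names `clean_tweet` and `clean_corpus` are hypothetical, and the choice of which tokens count as redundant is an assumption based on the conversation record.

```python
import re


def clean_tweet(text: str) -> str:
    """Remove retweet markers, URLs, mentions, hashtags, and stray double quotes."""
    text = re.sub(r"\bRT\s+@\w+:?", " ", text)  # retweet markers such as "RT @user:"
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = re.sub(r"@\w+", " ", text)           # remaining @mentions
    text = re.sub(r"#\w+", " ", text)           # hashtags
    text = text.replace('"', " ")               # stray double quotes
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace


def clean_corpus(lines):
    """Clean every line, dropping empty results and exact duplicates (order kept)."""
    seen = set()
    cleaned = []
    for line in lines:
        result = clean_tweet(line)
        if result and result not in seen:
            seen.add(result)
            cleaned.append(result)
    return cleaned


if __name__ == "__main__":
    sample = 'I hate the world?" "No, hate the present." "RT @kimhunseason:'
    print(clean_tweet(sample))
```

Dropping hashtags entirely is a design choice: it removes noise but also discards topical words, so keeping the hashtag text without the `#` sign would be a reasonable alternative.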

On the other hand, I’m eager to build an actual Chinese version of CoroBot. Because of the ‘Great Wall of the internet’ in China, the online platforms and discussions in my own country are very different from those on Twitter. Most discussions happened on Weibo, which is like a Chinese version of Twitter but with more restrictions. I’m fairly sure its personality will differ from that of the Twitter version. A Chinese CoroBot will have experienced the more direct and unbearable tragedies that happened in the part of our planet where the pandemic began. To sum up, I hope that this project can bring us together for a better future.

Research and code

Source code: https://github.com/sunyuqian1997/CoroBot

See the poster "Corobot: When the Coronavirus Begin to Talk – AI Chatbot based on tweets about COVID-19", presented at the ACM 2020 womENcourage conference.

 

References

[1]    F. J. Varela and P. Bourgine, Toward a Practice of Autonomous Systems: Proceedings of the First European Conference on Artificial Life. MIT Press, 1992.

[2]    Thomas S. Ray, ‘Tierra’. http://web.stanford.edu/class/sts129/Alife/html/Tierra.htm (accessed Apr. 27, 2020).

[3]    ‘Artificial Life: The Coming Evolution’, INET Oxford. https://www.inet.ox.ac.uk/publications/artificial-life-the-coming-evolution/ (accessed Apr. 27, 2020).

[4]    J. M. Banda and R. Tekumalla, ‘A Twitter Dataset of 40+ million tweets related to COVID-19’. Zenodo, Mar. 23, 2020, doi: 10.5281/zenodo.3723940.

[5]    DocNow/twarc. Documenting the Now, 2020.

[6]    openai/gpt-2. OpenAI, 2020.

[7]    D. Howe, dhowe/RiTaJS. 2020.

[8]    ‘home | p5.js’. https://p5js.org/ (accessed May 04, 2020).

[9]    ‘RunwayML | Machine learning for creators.’ https://runwayml.com/ (accessed May 04, 2020).

[10]    ‘Node.js’. https://nodejs.org/zh-cn/ (accessed May 04, 2020).

[11]    ‘Express - Node.js Web ’. https://expressjs.com/zh-cn/ (accessed May 04, 2020).

[12]    ‘Socket.IO’. https://socket.io/ (accessed May 04, 2020).

[13]    ‘Covid-19 has inflamed racism against Asian-Americans. Here’s how to fight back (opinion) - CNN’. https://edition.cnn.com/2020/04/10/opinions/how-to-fight-bias-against-asian-americans-covid-19-liu/index.html (accessed May 04, 2020).

Special Thanks: Yi Zhu, Raybo Lee.