In this course, the three blogs and the group project are closely related.
① My first blog covered many aspects of SMA, such as its definition, components, procedure, and application examples. While writing it, I both reviewed the course content and searched for more information, which helped me form a lasting memory.
② The second blog is more about humanity. It presented SDG goal 3, ensuring healthy lives, from which I learned more about public health.
③ My third blog, titled ‘How SMA improves human well-being’, showed how SMA could be applied to disease surveillance. I think this blog acts as a bridge connecting the previous two.
④ The group project required us to start from a piece of scientific research, understand it, and propose our own method of applying SMA to address the issue. We worked on social media security. Instead of relying on word/context analysis, we proposed a graph-based method that uses graph structure information to detect dangerous events. The whole process taught us how to tackle existing problems, which helps us students lay a foundation for scientific research.
From this course, we learned many SMA techniques and, most importantly, put them into practice.
2.1 Sentiment analysis
Sentiment analysis analyzes people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities and their attributes. In assignment 1, we used the dictionary-based approach to analyze the comments from other classmates. The code of assignment 1 is posted at: https://betterzhou126.home.blog/2020/04/16/the-code-of-assignment-1/
- Pre-processing: remove stop words and emoticons, and normalize characters.
- Tokenization: break the raw text into words.
- Stemming: convert different forms of a word into one form.
- Opinion-word check: look up each word in the opinion dictionary.
The final score of the dictionary-based approach is normalized to the range from -1 to 1. My sentiment score is about 0.0178. It is slightly positive, which suggests that other classmates are generally satisfied with my blog. However, after doing the case study, I found that the score should have been higher.
One classmate left the following comment:
‘I feel so sad when I see so many people are suffering in Wuhan…terrible tragedy. Anyway, I always believe everything will get better soon.’
The limitation of the dictionary-based approach is that it only considers whether positive or negative words appear in the text; it cannot capture the overall sentiment. Although the comment above contains negative words like ‘sad’, ‘suffering’, and ‘terrible tragedy’, the positive words ‘get better’ matter more. Therefore, I suggest that people first learn the advantages and disadvantages of an SMA method, which helps them better understand the results it produces.
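The steps and the limitation described above can be illustrated with a minimal sketch. It is not the code from assignment 1: the tiny lexicon, the stop-word list, the crude `stem` helper, and the normalization `(pos - neg) / (pos + neg)` are all assumptions made for illustration.

```python
import re

# Hypothetical mini-lexicon and stop-word list, for illustration only;
# the dictionary actually used in the assignment is not shown in this post.
POSITIVE = {"good", "better", "great", "happy"}
NEGATIVE = {"sad", "suffer", "terrible", "tragedy"}
STOP_WORDS = {"i", "so", "when", "see", "many", "are", "in", "the", "will", "a"}


def stem(word):
    """Very crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def sentiment_score(text):
    # Pre-processing and tokenization: lowercase, keep alphabetic tokens only.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Stop-word removal, then stemming.
    stems = [stem(t) for t in tokens if t not in STOP_WORDS]
    # Check opinion words against the lexicon.
    pos = sum(s in POSITIVE for s in stems)
    neg = sum(s in NEGATIVE for s in stems)
    total = pos + neg
    # Assumed normalization to [-1, 1]; 0.0 when no opinion word is found.
    return (pos - neg) / total if total else 0.0


comment = ("I feel so sad when I see so many people are suffering in Wuhan... "
           "terrible tragedy. Anyway, I always believe everything will get better soon.")
print(sentiment_score(comment))
```

With this toy lexicon the comment scores negative, because four negative words outweigh the single positive one, even though a human reader would find the overall message hopeful.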
2.2 Social network analysis
A social network is a special kind of graph, where users are the nodes and their relations are the edges between nodes. People can use the graph structure information to analyze the social network. In this course, it is very interesting that all the classmates form a directed graph on wordpress.com. The code of assignment 2 is also posted on my blog.
- Degree Centrality
The degree of a node reflects its importance in the social network. In a directed graph, each edge has a direction, so degree can be divided into in-degree and out-degree. My in-degree is 9 and my out-degree is 13. This suggests that I am an outgoing person who tends to appreciate others’ work.
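As a small illustration of in-degree and out-degree, here is a sketch on a made-up edge list. The names and edges are hypothetical and are not the class data; an edge is assumed to point from the commenter to the blog owner.

```python
from collections import Counter

# Hypothetical directed edges (commenter -> blog owner); not the class data.
edges = [
    ("me", "amy"),
    ("me", "bob"),
    ("amy", "me"),
    ("bob", "amy"),
    ("cat", "me"),
]

# Out-degree: how many edges leave a node (comments I wrote).
out_degree = Counter(src for src, _ in edges)
# In-degree: how many edges arrive at a node (comments I received).
in_degree = Counter(dst for _, dst in edges)

print("in-degree:", in_degree["me"], "out-degree:", out_degree["me"])
```

A node whose out-degree exceeds its in-degree, as in my case, gives more comments than it receives.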
- Closeness Centrality
The closeness centrality of a node is the inverse of the sum of the shortest distances from this node to all other nodes. It measures how close a node is to the other nodes. My closeness centrality is 0.4154. It shows that I am fairly close to other classmates, but not a central node in the social network. During programming, I also wrote code to calculate the closeness centrality from scratch, and I found that there are only a few nodes that I cannot reach, which means there is no path between us. According to the six degrees of separation theory, I have perhaps connected to some important nodes in this network (because a shortest path exists between me and most of the nodes). So we can imagine that I am somewhat in a ‘corner’ of the network rather than in its central part: although I can reach most nodes, the paths are relatively long. Compared with the truly central nodes, I might need to pass through more intermediaries to reach specific nodes.
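A from-scratch calculation along these lines can be sketched with breadth-first search. This is not my assignment code: the example graph is hypothetical, and dividing the number of reachable nodes by the sum of their distances is only one possible normalization, assumed here so that unreachable nodes do not break the formula.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest hop counts from source, following outgoing edges."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def closeness(graph, source):
    dist = bfs_distances(graph, source)
    total = sum(dist.values())   # sum of shortest distances
    reachable = len(dist) - 1    # exclude the source itself
    # Assumed normalization: reachable nodes / total distance.
    return reachable / total if total else 0.0


# Hypothetical directed graph as adjacency lists; not the class data.
graph = {"me": ["amy"], "amy": ["bob"], "bob": ["cat"], "cat": [], "dan": ["me"]}

unreachable = set(graph) - set(bfs_distances(graph, "me"))
print(closeness(graph, "me"), unreachable)
```

In this toy graph, "me" reaches three nodes over a long chain, so the closeness is moderate, and "dan" is reported as unreachable because no path leads to it.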
- Betweenness Centrality
The betweenness of a node is the proportion of shortest paths between all other pairs of actors that pass through the node in question. In other words, it measures the degree to which the node connects different communities. My shortest-path betweenness is 0.0474. To some extent, I might act as a connecting node between CS and IE (students in IE have more friends in IE, and vice versa). I have several friends in the IE department and we have appreciated each other’s work, while the majority of my comments were left by students in the CS department. However, the betweenness value is not very high, which suggests that I am probably not an important connecting node.
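To make the idea of a connecting node concrete, here is a sketch of shortest-path betweenness for a directed, unweighted graph, using Brandes' accumulation scheme. It is an illustration rather than the assignment code: the node names are hypothetical, and dividing by (n - 1)(n - 2) is an assumed normalization.

```python
from collections import deque


def betweenness(graph):
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(nodes)
    scale = (n - 1) * (n - 2)  # number of ordered pairs excluding the node
    return {v: (b / scale if scale else 0.0) for v, b in score.items()}


# Hypothetical graph: two CS students reach two IE students only through "me".
graph = {"cs1": ["me"], "cs2": ["me"], "me": ["ie1", "ie2"], "ie1": [], "ie2": []}
result = betweenness(graph)
print(result["me"], result["cs1"])
```

Here every path from the CS side to the IE side passes through "me", so that node gets a clearly higher betweenness than the others, whose value is zero. A low value such as mine means that most shortest paths in the class network do not depend on me.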
 Lecture Five: Sentiment Analysis I.
 Lecture Seven: Social Network Analysis I.