As social media becomes an ever more prominent mode of everyday communication and information exchange, these systems give authors a platform to share their thoughts, feelings, and experiences on a wide range of topics. Harnessing this publicly expressed information can be powerful: public perceptions, signals, and data about specific topics can be extracted from these posts and studied. However, collecting information about a topic from social media involves a common trade-off: the more specific the topic, the harder it generally is to extract meaningful information. This is because social media posts are, at first glance, simply too noisy: authors must convey their meaning within a very short length (140 characters on Twitter). This work presents a nontrivial methodology to overcome this problem. It combines state-of-the-art programming and data storage technologies with stop-word dictionaries, author filters, and Twitter bot detectors. Although evaluating the authenticity of the collected tweets is left to future work using Amazon Mechanical Turk evaluators, we demonstrate how our methodology extracts specific, meaningful tweets about topics related to chronic diseases and medication.
Available at: http://works.bepress.com/derek_doran/45/
Presented at the Seventh Annual Celebration of Research, Scholarship, and Creative Activities, Dayton, OH, April 15, 2016.