Published May 5, 2020 | Submitted
Report Open

Reliable and Efficient Long-Term Social Media Monitoring

Abstract

Social media data is now widely used by academic researchers. However, long-term social media data collection projects, which typically collect data from public-use APIs, often run into problems when they rely on servers on a local-area network (LAN) to collect high-volume streaming data over long periods of time. In this technical report, we present a cloud-based infrastructure for data collection, pre-processing, and archiving, and argue that it mitigates or resolves the problems most commonly encountered when running social media data collection projects on LANs, at minimal cloud-computing cost. We show how this approach works in different cloud computing architectures, and how to adapt the method to collect streaming data from other social media platforms.

Additional Information

We thank the John Randolph Haynes and Dora Haynes Foundation for supporting some of this research. We received support for our use of Google Cloud Platform through Google's COVID-19 research program. We also thank Anima Anandkumar and Anqi Liu for their work with us on related projects.

Attached Files

Submitted - 2005.02442.pdf

Additional details

Created: August 19, 2023
Modified: October 23, 2023