Reliable and Efficient Long-Term Social Media Monitoring
Abstract
Social media data is now widely used by academic researchers. However, long-term social media data collection projects, which typically involve collecting data from public-use APIs, often encounter problems when they rely on servers on local-area networks (LANs) to collect high-volume streaming social media data over long periods of time. In this technical report, we present a cloud-based data collection, pre-processing, and archiving infrastructure, and argue that this system mitigates or resolves the problems most commonly encountered when running social media data collection projects on LANs, at minimal cloud-computing cost. We show how this approach works in different cloud computing architectures, and how to adapt the method to collect streaming data from other social media platforms.
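The abstract does not say how collected records are batched before archiving. Purely as an illustration of one common pattern, here is a minimal sketch that writes a stream of records into size-bounded, gzip-compressed JSON Lines files, each of which could then be uploaded to cloud object storage once closed. It assumes records arrive as an iterable of JSON-serializable dicts from whatever streaming client is in use; the function name `archive_stream`, the file naming scheme, and the batch size are hypothetical and are not taken from the authors' system.

```python
import gzip
import json
import os
import tempfile
from typing import Iterable, List


def archive_stream(records: Iterable[dict], out_dir: str, batch_size: int = 1000) -> List[str]:
    """Hypothetical helper (not the authors' code).

    Writes streamed records to gzip-compressed JSON Lines files, starting a
    new file every `batch_size` records. Returns the paths of all files written.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    os.makedirs(out_dir, exist_ok=True)
    paths: List[str] = []
    handle = None
    count = 0
    try:
        for record in records:
            # Open the first file, or rotate once the current file is full.
            if handle is None or count == batch_size:
                if handle is not None:
                    handle.close()  # a closed file is ready for upload
                path = os.path.join(out_dir, f"batch_{len(paths):06d}.jsonl.gz")
                handle = gzip.open(path, "wt", encoding="utf-8")
                paths.append(path)
                count = 0
            # One JSON object per line (JSON Lines).
            handle.write(json.dumps(record) + "\n")
            count += 1
    finally:
        if handle is not None:
            handle.close()
    return paths


if __name__ == "__main__":
    # Synthetic records stand in for a live API stream in this demo.
    demo_dir = tempfile.mkdtemp()
    files = archive_stream(({"id": i} for i in range(5)), demo_dir, batch_size=2)
    print(len(files), "files written to", demo_dir)
```

Bounding each file by record count keeps individual archive objects small and means a crash loses at most the file that was still open; a time-based rotation would be an equally reasonable choice.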
Additional Information
We thank the John Randolph Haynes and Dora Haynes Foundation for supporting some of this research. We received support for our use of Google Cloud Platform through Google's COVID-19 research program. We also thank Anima Anandkumar and Anqi Liu for their work with us on related projects.
Attached Files
Submitted - 2005.02442.pdf
Name | Size | MD5
---|---|---
2005.02442.pdf | 178.3 kB | fdb560cce77104287b0452693418423d
Additional details
- Alternative title: Reliable and Efficient Long-Term Twitter Monitoring
- Eprint ID: 109048
- Resolver ID: CaltechAUTHORS:20210510-141337458
- Funding: John Randolph Haynes and Dora Haynes Foundation; Google Cloud Platform
- Created: 2021-05-10 (from EPrint's datestamp field)
- Updated: 2023-06-02 (from EPrint's last_modified field)