A Semi-automatic Indexing Pipeline for Medical Document Retrieval in Resource-constrained Settings
- Creators
- Davison, Stephen
- Avgil, Dana
- Li, Yan
- Yang, Sonia
Abstract
Medical document indexing can benefit from both automation and human feedback. This research develops a semi-automatic indexing pipeline (SIP) for medical document retrieval in resource-constrained settings. The SIP includes an affordable and efficient automated process for preparing and indexing continuing medical education documents and a human feedback loop to validate recommended terms. It leverages pre-trained Named-entity Recognition models to identify appropriate terms from the MeSH vocabulary and higher-level subject terms from UMLS. The SIP achieved a precision of 59%, a recall of 64%, and an F1 score of 61% based on the expert evaluation of 124 distinct medical documents. The combination of automation with a human expert feedback loop demonstrates a model strategy for an affordable and practical approach to document indexing in resource-limited yet critical services. The SIP may be extended to other environments and information sources to improve the efficiency and accuracy of information retrieval.
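The three reported figures are mutually consistent if F1 is taken as the harmonic mean of precision and recall. The sketch below checks that arithmetic under that standard definition. The abstract does not say how scores were aggregated across the 124 documents, so this is an assumption, and the function name `f1_from_precision_recall` is purely illustrative.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1).

    Assumes the conventional F1 definition; the source does not state
    which averaging scheme the authors used.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract: precision 59%, recall 64%.
reported_precision = 0.59
reported_recall = 0.64
f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 = {f1:.2%}")
```

Rounded to whole percentages, the result agrees with the reported F1 score of 61%.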
Additional Information
© 2022, the Author(s). This material is brought to you by the Americas Conference on Information Systems (AMCIS) at AIS Electronic Library (AISeL). It has been accepted for inclusion in AMCIS 2022 Proceedings by an authorized administrator of AIS Electronic Library (AISeL).
Additional details
- Eprint ID
- 116358
- Resolver ID
- CaltechAUTHORS:20220818-172653986
- Created
- 2022-08-19 (from EPrint's datestamp field)
- Updated
- 2022-08-19 (from EPrint's last_modified field)