Description
Natural Language Processing (NLP) plays a crucial role in the automated parsing of clinical data. This data comes from a wide range of sources, including clinical notes, trial documents, imaging and sensor data, and patient-reported outcomes. In the Avatar project, we focus on clinical trial documentation. These documents describe the circumstances of a trial: which patient groups are to be included or excluded, and which treatment is being investigated. From them, we extract details relevant to the patient accrual process, with the goal of providing researchers with data that fits their study.
Because these descriptions are often unstructured, inferring users' intent is a challenging task. To address this, we use Named Entity Recognition (NER) to extract and categorize information within clinical trial descriptions. Building on recent advances in pre-trained language models, we fine-tune a selection of such models on the Facebook NER (FBNER) dataset, which consists of clinical trial descriptions. Fine-tuning combines supervised token-level classification with domain adaptation, starting from both general clinical and non-clinical models. We classify the extracted information into categories such as "disease", "therapy", "drugs", and "gender", and evaluate performance on German Clinical Trials.
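In token-level classification, the fine-tuned model assigns one label to each token, and the entities are then recovered by grouping consecutive tokens. The sketch below shows that grouping step under two assumptions: the model emits labels in the common BIO scheme ("B-" begins an entity, "I-" continues it, "O" is outside any entity), and the category names "gender", "disease", and "drugs" are illustrative. The actual FBNER label set and the project's post-processing may differ. The function name `bio_to_entities` is hypothetical.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group per-token BIO tags into (entity_text, category) pairs.

    Assumes the BIO tagging scheme; the category names are illustrative only.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        # Emit the entity collected so far (if any).
        if current_tokens:
            entities.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_label):
            # A new entity starts. An "I-" tag that does not continue the
            # current category is treated leniently as the start of a new entity.
            flush()
            current_tokens = [token]
            current_label = tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the current entity.
            current_tokens.append(token)
        else:
            # "O": the token lies outside any entity, so close the open one.
            flush()
            current_tokens = []
            current_label = None

    flush()
    return entities
```

For example, for an eligibility criterion tagged by a hypothetical model:

```python
tokens = ["Female", "patients", "with", "type", "2", "diabetes", "receiving", "metformin"]
tags = ["B-gender", "O", "O", "B-disease", "I-disease", "I-disease", "O", "B-drugs"]
print(bio_to_entities(tokens, tags))
# [('Female', 'gender'), ('type 2 diabetes', 'disease'), ('metformin', 'drugs')]
```

Treating a stray "I-" tag as the start of a new entity is a design choice. It keeps imperfect model predictions from being silently dropped.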