"Research for a life without cancer" is our mission at the German Cancer Research Center. We investigate how cancer develops, identify cancer risk factors and look for new cancer prevention strategies. We develop new methods with which tumors can be diagnosed more precisely and cancer patients can be treated more successfully. Every contribution counts, whether in research, administration or infrastructure. This is what makes our daily work so meaningful and exciting.

The Division of Digital Prevention, Diagnostics and Therapy Guidance is seeking, at the earliest possible date, a

Data Scientist for Cancer Research (PhD Position) in the field of Large Language Models

Reference number: 2024-0375

The German Cancer Research Center (DKFZ) is a leading international biomedical research institution. We are committed to harnessing the power of AI and data science to transform oncology. Join us in pushing the boundaries of cancer research and innovation.

The main goal of the division is to develop robust and interpretable digital tools that improve prevention, non-invasive early detection, and diagnostic and therapeutic approaches. Our 20-member team, funded almost entirely by external grants, brings together medicine, molecular biology and informatics/data science. It focuses on identifying relevant patterns in patient data and on increasing the explainability and robustness of deep learning-based classifications. We see software systems as part of clinical teams that enable more efficient patient care, and at the same time as a tool for effective prevention and early detection. Since 2020, our work in these areas has attracted wide scientific attention: our more than 80 internationally peer-reviewed research papers have been cited more than 5,000 times, and numerous project results have been picked up by international media. Software products and apps from our working group have been downloaded more than a million times.
Seven recently approved grants include the MiRisk consortium (1), which is developing a free app to determine and minimize an individual's breast cancer risk. Within the BAP-1 consortium (2), we share and extend our expertise in building histology image pipelines to stratify patients for drug development. The Hector grant (3) enables us to integrate spatial transcriptomics into deep learning-based heterogeneity scores that predict melanoma metastasis. The sKIn project (4) takes the remaining technical and formal steps, together with a company, to build our dermatologist-like skin cancer AI into dermatoscopes and put it into the hands of caregivers. MELCAYA (5) identifies new risk factors for melanoma in children, adolescents and young adults (CAYAs). A deep learning strategy for high-throughput proteomics (6) enables higher-resolution and faster processing of liquid biopsies. A signed industry collaboration will lead to more individualized sunscreen recommendations based on epigenetic tests that are read from smartphone photographs via AI. Future plans of the group also include improved digital analysis of sarcomas (7), research on the interaction of language models and care, explainable AI algorithms for cancer screening, and the optimization of the Sunface & Smokerface App.

We are seeking a passionate and talented data scientist to join our interdisciplinary team at the forefront of cancer research and artificial intelligence. This fully funded PhD position focuses on the development and application of Large Language Models (LLMs) for clinical research, with a specific emphasis on addressing complex challenges in oncology.
In this position, you will have the opportunity to:
- Develop advanced Large Language Models (LLMs): tailor and fine-tune LLMs to address clinical questions in oncology, enabling innovative solutions in diagnosis, treatment planning and patient care
- Investigate emergent properties: conduct cutting-edge research into the emergent behaviors of LLMs to understand their capabilities and limitations in clinical applications
- Collaborate on transformative projects: contribute to projects such as UroBot, a groundbreaking tool that demonstrates the potential of LLMs to surpass human performance in specialized medical domains
- Drive clinical innovation: work closely with clinicians and domain experts to translate AI innovations into tangible advancements in oncology

Key Responsibilities:
- Conduct research on LLM architectures and adaptation techniques for clinical data
- Develop novel methodologies to improve the interpretability, reliability and accuracy of LLMs in clinical settings
- Collaborate with a multidisciplinary team of data scientists, oncologists and software engineers
- Publish findings in leading academic journals and present them at international conferences

Your Profile:
- Master's degree in computer science, data science, bioinformatics or a related field
- Strong programming skills in Python and familiarity with machine learning frameworks
- Interest or experience in clinical and biomedical applications is highly desirable
- Excellent problem-solving and communication skills
- Good knowledge of English; knowledge of German is not required

Please submit your application, including a CV, cover letter and academic transcripts, via the apply button.
What We Offer:
- Excellent framework conditions: state-of-the-art equipment and opportunities for international networking at the highest level
- Access to international research networks
- Doctoral salary with the usual social benefits
- 30 days of vacation per year
- Flexible working hours
- Possibility of mobile work and part-time work
- Family-friendly working environment
- Sustainable travel to work: subsidized Germany job ticket
- Unleash your full potential: targeted training and mentoring through the DKFZ International PhD Program and the DKFZ Career Service
- Our Corporate Health Management Program offers a holistic approach to your well-being