Research data and software publications have become a regular output of scientific work. Yet unlike more traditional text publications, widely established processes to assess and evaluate their quality are still missing. This prevents researchers from getting the credit they deserve, as common performance indicators often simply omit this part of scientific contributions.
As part of...
There is a gap between current responsible research and innovation (RRI) and open science (OS) practices on the one hand and assessment practices on the other. While research practices and their ways of publication and dissemination have diversified, assessment practices have remained narrow, focusing on criteria of publication quantity and reputation. In my talk, I will discuss two projects. The first project...
The use of commonly agreed terminologies is an elementary component of database systems. Such terminologies affect data consistency, querying and retrieval, and interoperability. Creating, searching for and agreeing on a terminology to be used is a non-trivial problem, as it requires specialised knowledge and coordination processes. This presentation introduces the terminology service that deals...
Within the MeDaX project we study bioMedical Data eXploration using graph technologies. We design and implement efficient concepts and tools for integration, enrichment, scoring, retrieval, and analysis of biomedical data. Interested in data similarity and quality measures, we initiated an international community project for biomedical provenance standardisation and cooperate within the...
Changes occur frequently, especially in data-driven long-term studies. Changing databases lead to the accumulation of many schemas and instances over time. However, any scientific application must be able to reconstruct the historical data to ensure the reproducibility or at least the explainability of the research results. A method is needed that allows each database version to be easily...
Galaxy is an open-source platform that allows researchers to analyze and share scientific data using interoperable APIs and various user-friendly web-based interfaces. The Galaxy project was launched in 2005 and has since become a powerful tool for researchers across a wide range of research fields, including *omics, biodiversity, machine learning, cheminformatics, NLP, material science,...
Does research data management as we know it in the context
of database research or data science need platforms like Hugging Face?
Or are platforms and services such as Kaggle or GESIS sufficient? In
this talk, after giving a brief overview of the core features of
Hugging Face, we claim that the data research community would benefit
a lot from a platform similar to Hugging Face, in...
The talk briefly introduces Snowflake and outlines challenges in the database field that we are currently working on. The Snowflake Academia program is also briefly presented.
The current biodiversity crisis has triggered an extreme need for a better understanding of the network of life on Earth. Efficient data management is crucial in biodiversity and is the backbone for a digital twin of past, present, and future life. The Research Data Commons (RDC) is the central cloud-based information system architecture of NFDI4Biodiversity, the consortium of the NFDI...
The problem of generating synthetic data is almost as old as modern research itself. However, with the advent of generative AI, new possibilities for synthesizing tabular data have emerged that go far beyond the capabilities of traditional statistical or rule-based approaches. Most of this new research comes from the ML community, where ML models need to be fed with useful training data. Since...
Reproducible research emphasizes the importance of documenting and publishing scientific results in a manner that enables others to verify and extend them. In this talk, we explore computational reproducibility within the context of Jupyter notebooks, presenting insights and challenges from our study. We will present the key steps of the pipeline we used for assessing the reproducibility of...
TIRA is a platform to organize shared tasks with software
submissions, mostly in information retrieval and natural language
processing. Due to the software submissions, TIRA allows blinded
experimentation on (confidential) datasets to which participants have no
access. After a shared task, the artifacts of the shared tasks, i.e.,
research data in the form of submitted software, inputs,...