Frühjahrstreffen der FG Datenbanken

Name: Frühjahrstreffen der FG Datenbanken
Start: 2024-03-11T13:00:00+01:00
End: 2024-03-12T13:00:00+01:00
Location: Universitätshauptgebäude (University Main Building)

Mar 11 – 12, 2024

Europe/Berlin timezone

Contact

birgitta.koenig-ries@uni-jena.de

Contribution List

28. Welcome

Birgitta König-Ries (Heinz Nixdorf Chair for Distributed Information Systems)

3/11/24, 1:00 PM

Talks
9. On the Path to a Quality Indicator for Software and Data Publications for the Helmholtz

Marcel Meistring (Helmholtz Open Science Office)

3/11/24, 1:15 PM

Talk

Talks

Research data and software publications have become a regular output of scientific work. Yet unlike more traditional text publications, widely established processes to assess and evaluate their quality are still missing. This prevents researchers from getting the credit they deserve, as common performance indicators often simply omit this part of scientific contributions.
As part of...
13. From theory to practice - Advancing Research Assessment for Incentives at Charité and BIH through infrastructure

Miriam Kip (BIH Charité)

3/11/24, 2:00 PM

Talk

Talks

There is a gap between current responsible research and innovation (RRI) and open science (OS) practices on the one hand and assessment practices on the other. While research practices and their ways of publication and dissemination have diversified, assessment practices have remained narrow, focusing on criteria of publication quantity and reputation. In my talk, I will discuss two projects. The first project...
8. Terminologies in database systems

Felix Engel (TIB)

3/11/24, 3:00 PM

Talk

Talks

The use of commonly agreed terminologies is an elementary component of database systems. They have an impact on data consistency, querying and retrieval or interoperability. Creating, searching for and agreeing on a terminology to be used is a non-trivial problem, as it requires specialised knowledge and coordination processes. This presentation introduces the terminology service that deals...
7. MeDaX - a knowledge graph for biomedicine

Judith Wodke (U Greifswald)

3/11/24, 3:30 PM

Talk

Talks

Within the MeDaX project we study bioMedical Data eXploration using graph technologies. We design and implement efficient concepts and tools for integration, enrichment, scoring, retrieval, and analysis of biomedical data. Interested in data similarity and quality measures, we initiated an international community project for biomedical provenance standardisation and cooperate within the...
6. Schema Evolution in Research Data

Tanja Auge (U Regensburg)

3/11/24, 4:00 PM

Talk

Talks

Changes occur frequently, especially in data-driven long-term studies. Changing databases lead to the accumulation of many schemas and instances over time. However, any scientific application must be able to reconstruct the historical data to ensure the reproducibility or at least the explainability of the research results. A method is needed that allows each database version to be easily...
14. Democratising data analysis with Galaxy

Björn Grüning

3/11/24, 5:00 PM

Talk

Talks

Galaxy is an open-source platform that allows researchers to analyze and share scientific data using interoperable APIs and various user-friendly web-based interfaces. The Galaxy project was launched in 2005 and has since become a powerful tool for researchers across a wide range of research fields, including *omics, biodiversity, machine learning, cheminformatics, NLP, material science,...
5. From Research Data Management to Data Platforms: A Hugging Face Approach

Michael Gertz (U Heidelberg)

3/11/24, 5:45 PM

Talk

Talks

Does research data management as we know it in the context of database research or data science need platforms like Hugging Face? Or are platforms and services such as Kaggle or GESIS sufficient? In this talk, after giving a brief overview of the core features of Hugging Face, we claim that the data research community would benefit a lot from a platform similar to Hugging Face, in...
23. Snowflake Berlin

Dirk Junghanns (Snowflake)

3/11/24, 6:15 PM

Talk

Talks

The talk briefly introduces Snowflake and outlines the database challenges we are currently working on. It also briefly presents the Snowflake Academia program.
16. Problems and Issues in Biodiversity Data Infrastructures

Bernhard Seeger (U Marburg)

3/12/24, 9:00 AM

Talk

Talks

The current biodiversity crisis has triggered an extreme need for a better understanding of the network of life on Earth. Efficient data management is crucial in biodiversity and is the backbone for a digital twin of past, present, and future life. The Research Data Commons (RDC) is the central cloud-based information system architecture of NFDI4Biodiversity, the consortia of the NFDI...
29. Flashtalks

3/12/24, 9:30 AM

Talks

One-minute teasers presenting the posters
10. Tabular Data Synthesis for Data Management

Fabian Panse (HPI)

3/12/24, 11:15 AM

Talk

Talks

The problem of generating synthetic data is almost as old as modern research itself. However, with the advent of generative AI, new possibilities for synthesizing tabular data have emerged that go far beyond the capabilities of traditional statistical or rule-based approaches. Most of this new research comes from the ML community, where ML models need to be fed with useful training data. Since...
15. Exploring Computational Reproducibility in Jupyter Notebooks: Insights and Challenges

Sheeba Samuel (Friedrich Schiller University)

3/12/24, 12:00 PM

Talk

Talks

Reproducible research emphasizes the importance of documenting and publishing scientific results in a manner that enables others to verify and extend them. In this talk, we explore computational reproducibility within the context of Jupyter notebooks, presenting insights and challenges from our study. We will present the key steps of the pipeline we used for assessing the reproducibility of...
11. Research Data Management in TIRA for Reproducible Shared Tasks

Maik Fröbe (U Jena)

3/12/24, 12:30 PM

Talk

Talks

TIRA is a platform to organize shared tasks with software submissions, mostly in information retrieval and natural language processing. Due to the software submissions, TIRA allows blinded experimentation on (confidential) datasets to which participants have no access. After a shared task, the artifacts of the shared tasks, i.e., research data in the form of submitted software, inputs,...
1. Bridging the gap between data lakes and RDBMSs - Efficient query processing with Parquet

Alice Rey

Poster

Poster

In the age of massive data, time-intensive loading phases make databases less viable for data exploration tasks. Still, the highly optimized query engines of database systems are greatly beneficial for the performance of data analysis tasks. With our research, we want to bridge this gap and provide paramount analytical performance without the need for static data loading. Our approach...
4. Datenbankherstellerrecht und Datenbankforschung (Database Maker's Right and Database Research)

Prof. Michael Beurskens (Universität Passau), Stefanie Scherzinger (Universität Passau)

Poster

Poster

With this poster we introduce the database maker's right (Datenbankherstellerrecht). Contrary to what a legal layperson and member of the database research community might assume, this does not concern the rights involved in developing database management software, but the rights of the maker of a database instance. Researchers or research institutions, too, ...
20. Enabling Semantic Tools for Interdisciplinary Research

Poster

Poster

Research has become increasingly reliant on extensive data. The integration, sharing and reuse of research data poses a significant challenge, particularly in the context of interdisciplinary collaborative projects. An essential objective for a research infrastructure dedicated to data management is to facilitate efficient data discovery and integration of diverse data sources. This pressing...
21. FAIR Assessment Tools: An evaluation of assessment tools of data sets according to the FAIR principles

Poster

Poster

Since the publication of the FAIR principles in 2016, they have become increasingly important and various tools have been developed to help assess published data with regard to compliance with the FAIR principles. There is a wide range of FAIR assessment tools currently available, from simple printable PDF checklists to fully automated tools that only require a DOI or URL to perform the...
22. Large-Scale Analysis of Heterogeneous Earth Observation Data

Gereon Dusella (DIMA@TU Berlin)

Poster

Poster

In recent years, the amount of data made available in the earth-observation domain has increased exponentially. In 2022, for example, data released from the observations of eight Sentinel satellites amounted to 6.64 petabytes [2]. Now, researchers all over the world are using these vast amounts of resources to further improve our understanding of the world. In their journey, the researchers...
26. Ocient Hyperscale Data Warehousing

Poster

Poster

tbd
18. Reproducibility of Deep Learning pipeline method information using a Multi-modality approach

Poster

Poster

Scientific publications have enormous amounts of information and serve as the main pillar for advancing knowledge across various disciplines. Recently, many sectors and disciplines have been employing Deep Learning (DL) models due to their popularity. However, manually extracting DL method information from publications is becoming tedious with the ever-growing published literature. On the...
17. Revisiting the process of Knowledge Graph generation with the integration of LLMs

Poster

Poster

In recent years, the advent of Large Language Models (LLMs) has transformed both natural language processing (NLP) and knowledge representation. With vast pre-trained parameters and advanced neural architectures, these models show remarkable results in generating human-like text. In knowledge representation, ontologies serve as fundamental frameworks for organizing and representing knowledge...
24. Meeting of the GI Fachgruppe Datenbanken (GI Special Interest Group on Databases)

Meike Klettke

Talk

Special interest group meeting
12. STRENDA DB – a Web-based Assessment and Storage Tool for Enzymology Data

Poster

Poster

The STRENDA Commission (STandards for Reporting ENzymology Data, www.beilstein-strenda.org), made up of experts from the enzyme chemistry community and supported by the Beilstein-Institut, has developed the STRENDA Guidelines in close consultation with the community. The aim is to improve the quality of enzyme function data in the literature. Today, more than 60 biochemical journals already...
19. The FAIR data principles from a repository perspective - BEXIS2 status and outlook

Poster

Poster

With the acceptance of the FAIR Data principles in the research community, the requirements and standards of data publications have changed significantly. While the FAIR principles are explicitly targeted at metadata and digital resources such as APIs, workflows, ontologies, and models, these digital objects cannot be made FAIR without supporting infrastructure services that are themselves...
27. Towards FAIR Data in Legal Domain

Poster

Poster

In this work, we explore the pivotal role of legal interoperability in facilitating the sharing and reusability of data across diverse domains. In particular, we focus on the challenges within the legal context, delving into issues related to diverse data types, potentially sensitive information, copyright concerns, and licensing intricacies. This work navigates through the complexities of...
3. VirJenDB: the comprehensive virus database based in Jena

Noriko Cassman (Friedrich Schiller University)

Poster

Poster

Recent advances in sequencing technologies have resulted in a deluge of virus sequences, lately consisting mostly of SARS-CoV-2, into global repositories such as the INSDC[1]:ENA[2] and GISAID[3]. However, many other viruses are undersequenced, and metadata is often lacking, which hampers the re-use of the data[4]. As a service and upcoming Use Case of the NFDI4Microbiota consortium[5], we...
25. Wikidata as a FAIR and multilingual interface to the research ecosystem

Poster

Poster

The practice of data sharing is slowly but surely reaching further and deeper into scholarly realms, and an increasing share of such data meets at least some of the FAIR Principles. On that basis, it is increasingly possible for research data to be found and used by people and processes with no close relationship to the original research context, which opens up both opportunities and...
