Data Science Day Jena 2023

Name: Data Science Day Jena 2023
Start: 2023-05-10T13:00:00+02:00
End: 2023-05-10T20:00:00+02:00
Location: Rosensäle der Friedrich-Schiller-Universität Jena

dsdj@uni-jena.de

Unsupervised Anomaly Detection for Space Gardening

Not scheduled

20m

Seminarraum (Rosensäle)

Poster session

Abstract

The EDEN Roadmap at DLR aims at building a Bio-regenerative Life Support System (BLSS) for future space missions within the current decade.
To ensure the safe and stable operation of the BLSS, the need for automated system monitoring in general and, in particular, robust anomaly detection is apparent.
The abundance of available methods makes it difficult to choose the most appropriate one for a specific application, particularly because each method has its strengths in detecting anomalies of different types.
The decision becomes even more difficult when no annotated data is available for model selection.
To address this challenge, we compared six unsupervised anomaly detection methods of varying complexity on the UCR anomaly archive benchmark. The goal was to determine whether more complex methods perform better and whether certain methods are better suited to specific anomaly types.
To validate our findings in the BLSS domain, we applied the best-performing methods to telemetry data collected from the EDEN ISS research greenhouse, which operated in Antarctica from 2018 to 2021.

Introduction

Bio-regenerative Life Support Systems (BLSSs) are utilized in habitats to produce plant-based food and close material cycles for respiratory air, water, biomass, and waste.
The EDEN NEXT GEN Project, part of the EDEN roadmap at DLR, aims to design a fully integrated ground demonstrator of a BLSS that includes all subsystems.
This project builds upon the findings of its predecessor project, EDEN ISS, in which controlled environment agriculture (CEA) technologies were investigated.
EDEN ISS was a near closed-loop system built into two 20-foot ISO containers and deployed to the German Antarctic Station Neumayer III in 2017.
From 2018 to 2021, the cultivation of crops such as lettuces, bell peppers, leafy greens, and various herbs was studied [6].
To ensure the safe and stable operation of the BLSS, we are investigating methods to mitigate risks regarding system health.
Since there is no clear definition of unhealthy system states or sufficient annotated data available for this kind of application, we investigate unsupervised methods for anomaly detection.
Choosing the appropriate method from the plethora of available options for a given application is challenging because different methods have different strengths in detecting certain types of anomalies, and the existence of a universal anomaly detection method is a myth [2].
To address this challenge, we compared six unsupervised anomaly detection methods with varying complexities in [5].
Three of these methods are classical machine learning techniques, while the remaining three are based on deep learning.
Our central questions in this comparison were: (1) "Is it worthwhile to sacrifice the interpretability of classical methods for potentially superior performance of deep learning methods?" and (2) "What different types of anomalies are the methods capable of detecting?"
We found that the two classical methods, \textit{Maximally Divergent Intervals (MDI)} [1] and \textit{MERLIN} [4], not only performed best, but they also seemed to complement each other in terms of the detected anomaly types [5].
However, as MERLIN suffered from high runtimes, we switched to an improved method for discord discovery called \textit{Discord Aware Matrix Profile (DAMP)} [3].
To validate the results from [5] in the BLSS domain, we are applying MDI and DAMP to a telemetry dataset collected at the EDEN ISS research greenhouse.

Methods

MDI [1] is a density-based method for offline anomaly detection in uni- or multivariate, spatiotemporal data.
Given a multivariate time series $\mathcal{T}$, MDI detects anomalous subsequences by comparing the probability density $p_S$ of a subsequence $S \subseteq \mathcal{T}$ to the density $p_{\Omega(S)}$ of the remaining part of the time series $\Omega(S) := \mathcal{T} \setminus S$ for all subsequences.
For more details on MDI, please refer to [1] and [5].
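
The comparison of $p_S$ and $p_{\Omega(S)}$ can be illustrated with a minimal univariate sketch. It rests on assumptions that are not stated above: both densities are modeled as Gaussians, the comparison uses the Kullback-Leibler divergence, and all candidate intervals are searched by brute force. The names `gaussian_kl` and `most_divergent_interval` are hypothetical; this is not the implementation from [1].

```python
import numpy as np

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two univariate Gaussians p and q."""
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def most_divergent_interval(series, min_len, max_len, eps=1e-6):
    """Brute-force search for the interval S = [start, end) whose Gaussian
    estimate diverges most from that of the remaining series Omega(S)."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    best = (-np.inf, 0, 0)  # (score, start, end)
    for start in range(n - min_len + 1):
        for length in range(min_len, min(max_len, n - start) + 1):
            end = start + length
            inside = series[start:end]
            outside = np.concatenate([series[:start], series[end:]])
            if len(outside) < 2:
                continue  # not enough data left to estimate p_Omega(S)
            # eps keeps the variances strictly positive (assumption of this sketch)
            score = gaussian_kl(inside.mean(), inside.var() + eps,
                                outside.mean(), outside.var() + eps)
            if score > best[0]:
                best = (score, start, end)
    return best
```

Calling `most_divergent_interval(series, 10, 30)` returns the score and the bounds of the single most divergent interval; a practical detector would additionally rank and de-duplicate overlapping intervals.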

DAMP [3] is a method for offline and online anomaly detection based on discord discovery:
Given a subsequence $S$ with length $L$ starting at timestamp $p$, a matching subsequence $M$ starting at timestamp $q$ is called a non-self match to $S$ if $|p - q| \geq L$ [4].
The discord $\tilde{S}$ of a time series $\mathcal{T}$ is defined as the subsequence with the largest distance $d(\tilde{S}, M_{\tilde{S}})$ from its nearest non-self match $M_{\tilde{S}}$, where $d(\cdot,\cdot)$ is the z-normalized (zero mean and unit variance) Euclidean distance.
Compared to MERLIN, DAMP has the advantages that it can be applied effectively online and that it supports multivariate data as well.
For details on DAMP, please refer to [3].
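
The discord definition above can be made concrete with a brute-force sketch. It follows the definition directly (z-normalized Euclidean distance, non-self matches with $|p - q| \geq L$) and has quadratic cost; it does not reproduce the pruning strategies that make DAMP [3] or MERLIN [4] fast. The function names are hypothetical.

```python
import numpy as np

def znorm(x, eps=1e-12):
    """Scale a subsequence to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + eps)

def brute_force_discord(series, L):
    """Return (start, distance) of the subsequence of length L whose nearest
    non-self match is farthest away."""
    series = np.asarray(series, dtype=float)
    n_sub = len(series) - L + 1
    subs = np.array([znorm(series[i:i + L]) for i in range(n_sub)])
    positions = np.arange(n_sub)
    best_pos, best_dist = -1, -np.inf
    for p in range(n_sub):
        dists = np.linalg.norm(subs - subs[p], axis=1)
        dists[np.abs(positions - p) < L] = np.inf  # keep only non-self matches
        nearest = dists.min()
        if np.isfinite(nearest) and nearest > best_dist:
            best_pos, best_dist = p, nearest
    return best_pos, best_dist
```

On a periodic signal with one distorted segment, the returned start position falls on a subsequence that overlaps the distortion, because every undistorted subsequence has a near-identical match one period away.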

Data

To validate the findings from [5] in the BLSS domain, we use a subset of the data collected in the EDEN ISS research greenhouse.
The dataset consists of eight time series of sensor readings for carbon dioxide ($CO_2$), relative humidity (RH), photosynthetically active radiation (PAR) and temperature (T) for the year 2020.
These variables have been measured at two different places within the greenhouse and belong to the Atmosphere Management Subsystem of EDEN ISS.
Each time series has a sampling rate of one data point every 5 minutes ($0.00\bar{3}$ Hz) and a total length of 105408 data points.
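
The stated length and sampling rate can be checked with a few lines of arithmetic, assuming a gap-free, regular 5-minute grid over the full leap year 2020 (variable names are illustrative):

```python
import calendar

SAMPLING_INTERVAL_S = 5 * 60                      # one data point every 5 minutes
year = 2020
days = 366 if calendar.isleap(year) else 365      # 2020 is a leap year
samples_per_day = 24 * 60 * 60 // SAMPLING_INTERVAL_S
total_samples = days * samples_per_day
sampling_rate_hz = 1 / SAMPLING_INTERVAL_S

print(samples_per_day, total_samples, sampling_rate_hz)
```

This gives 288 samples per day and $366 \cdot 288 = 105408$ samples in total, consistent with the figures above.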

Preliminary Results

As the discord of a time series is a single subsequence, we employ a sliding window approach: both methods are applied iteratively to a 30-day window that is shifted by one day on each iteration.
The scores for the newly analyzed day are appended to the scores already obtained. The normalized anomaly scores are then classified using a threshold of $0.2$.
Figure 1 displays the results for the temperature readings T1 and T2, with the time series in blue and orange, the anomalies detected by MDI and DAMP highlighted in red and green, respectively, and the corresponding anomaly scores shown with the same color coding in the plots below.

Figure 1: Sensor readings for T1 (blue) and T2 (orange), MDI anomaly score for T1 and T2 (red) and discords of T1 and T2 as found by DAMP (green).
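
The sliding window procedure described above can be sketched as follows. Several details are assumptions, since the text does not specify them: the first window contributes all of its scores, scores are min-max normalized over the whole series, and a score strictly above the threshold counts as anomalous. `placeholder_detector` is a hypothetical stand-in for MDI or DAMP, not either method.

```python
import numpy as np

SAMPLES_PER_DAY = 288   # 5-minute sampling
WINDOW_DAYS = 30
THRESHOLD = 0.2

def placeholder_detector(window):
    """Hypothetical stand-in for MDI or DAMP: deviation from the window median."""
    return np.abs(window - np.median(window))

def sliding_window_scores(series, detector,
                          window_days=WINDOW_DAYS, samples_per_day=SAMPLES_PER_DAY):
    """Score a 30-day window, shift it by one day, and append only the
    scores of the newly analyzed day on each iteration."""
    series = np.asarray(series, dtype=float)
    win = window_days * samples_per_day
    scores = list(detector(series[:win]))          # assumption: keep the full first window
    end = win + samples_per_day
    while end <= len(series):
        window_scores = detector(series[end - win:end])
        scores.extend(window_scores[-samples_per_day:])
        end += samples_per_day
    return np.asarray(scores)

def classify(scores, threshold=THRESHOLD):
    """Min-max normalize the scores (assumption) and apply the threshold."""
    lo, hi = scores.min(), scores.max()
    normalized = (scores - lo) / (hi - lo) if hi > lo else np.zeros_like(scores)
    return normalized > threshold
```

With this layout, each data point after the first window receives exactly one score, computed from the 30 days ending on its own day.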

As there is no ground truth available for anomalies in the EDEN ISS telemetry data, we assess the performance of MDI and DAMP qualitatively.
The results confirm that MDI and DAMP identify different types of anomalies.
While both methods identify \textit{outlier} anomalies, they do not detect the same instances.
Moreover, DAMP identifies \textit{missing drop} anomalies, which manifest as gaps in Figure 1, similar to the findings in [5].
DAMP successfully identifies a \textit{change point} anomaly between time points $20600$ and $21753$, which is not detected by the MDI method.
On the other hand, MDI detects a subtle \textit{local drop} anomaly at time point $21120$ that is not identified by DAMP.
These results emphasize the usefulness of utilizing both methods in conjunction with each other for effective anomaly detection.

The results for the other variables are similar to those of the temperature readings.
MDI and DAMP flag different subsequences as anomalous, which appear reasonable upon visual inspection.
However, the dominant pattern of the time series is not as evident as that of the temperature readings.
Therefore, the correctness of the detected anomalies should be verified by domain experts.

Conclusion & Outlook

Our recent benchmark in [5] indicated that combining MDI with a discord discovery-based anomaly detection method can detect a wide range of different anomalies.
The analysis of telemetry data from the EDEN ISS research greenhouse confirms this finding.
To ensure a more rigorous evaluation of the results, we will collaborate with BLSS domain experts and obtain their feedback on our initial findings.

References

[1]

Björn Barz, Erik Rodner, Yanira Guanche Garcia, and Joachim Denzler. Detecting Regions of Maximal Divergence for Spatio-Temporal Anomaly Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2018.

[2]

Nikolay Laptev, Saeed Amizadeh, and Ian Flint. Generic and Scalable Framework for Automated Time-series Anomaly Detection. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '15). 2015.

[3]

Yue Lu, Renjie Wu, Abdullah Mueen, Maria A. Zuluaga, and Eamonn Keogh. DAMP: accurate time series anomaly detection on trillions of datapoints and ultra-fast arriving data streams. Data Min. Knowl. Discov. 2023.

[4]

Takaaki Nakamura, Makoto Imamura, Ryan Mercer, and Eamonn Keogh. MERLIN: Parameter-Free Discovery of Arbitrary Length Anomalies in Massive Time Series Archives. IEEE International Conference on Data Mining. 2020.

[5]

Ferdinand Rewicki, Joachim Denzler, and Julia Niebling. Is It Worth It? Comparing Six Deep and Classical Methods for Unsupervised Anomaly Detection in Time Series. Applied Sciences 13. 2023.

[6]

Paul Zabel et al. Biomass production of the EDEN ISS space greenhouse in Antarctica during the 2018 experiment phase. Frontiers in Plant Science 11. 2020.

Ferdinand Rewicki (DLR - Deutsches Zentrum für Luft- und Raumfahrt, Friedrich Schiller Universität Jena), Joachim Denzler, Julia Niebling (DLR - Deutsches Zentrum für Luft- und Raumfahrt)
