Analysis of geographic data for environmental monitoring
Geographic datasets are widely used in research and development for environmental monitoring, hydrogeologic risk mapping, land use analysis, and urban planning, among other applications. However, most available datasets are created manually. Open datasets, for example, are usually built through the spontaneous collaboration of users (Volunteer Geographic Information System - VGIS), which results in incomplete coverage and, in some cases, poor quality. In other cases, the datasets are created by experts, for example through field expeditions or the manual screening of remote sensing data. In both cases, the process is very time-consuming and prone to errors.
In this work, a hybrid approach is proposed in which geographical and anthropic entities are automatically discovered by analyzing available sources, and confidence in the automatically extracted entities is improved by submitting objects whose uncertainty exceeds a threshold to a crowdsourcing step. The resulting objects can be provided to the crowd (in VGIS) or to experts as a baseline for mapping new entities through validation. This research project aims to explore the use of Deep Learning methods to train a model capable of identifying geographical and anthropic entities accurately, at scale, and in a region-independent way, and to develop application(s) that publish uncertain results for validation and later incorporation of the new objects into GIS datasets.
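The routing of uncertain objects to a crowdsourcing step can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Detection` class, the `route_detections` function, the 0.3 threshold, and the definition of uncertainty as one minus the model confidence are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """A hypothetical automatically extracted entity."""
    entity_id: str
    label: str
    confidence: float  # model probability for the predicted label, in [0, 1]


def route_detections(detections, uncertainty_threshold=0.3):
    """Split detections into auto-accepted and to-be-validated lists.

    Assumption: uncertainty is defined as 1 - confidence. Objects whose
    uncertainty exceeds the threshold are sent to the crowd or to experts.
    """
    accepted, to_validate = [], []
    for det in detections:
        uncertainty = 1.0 - det.confidence
        if uncertainty > uncertainty_threshold:
            to_validate.append(det)
        else:
            accepted.append(det)
    return accepted, to_validate


# Illustrative values only; these are not results from the study.
detections = [
    Detection("a1", "landfill", 0.95),
    Detection("a2", "landfill", 0.55),
    Detection("a3", "building", 0.80),
]
accepted, to_validate = route_detections(detections)
```

In this sketch only `a2` would be published for validation, while `a1` and `a3` would be accepted directly. In practice the threshold would be tuned against the cost of manual validation.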
The most relevant use case in the study is the detection of illegal landfills from aerial images. Early detection is crucial to prevent and alleviate the impact and the cost of treatment. Mass-scale territory analysis is hindered by the essentially manual nature of the photo interpretation task, which skilled experts must perform. According to the literature review, the illegal landfill detection problem has been approached in the past with data-driven methods, but mainly from GIS data or at street level (e.g., webcams). In this work, the problem is approached as a remote sensing scene classification task using aerial images. A collaboration was established with the Regional Environmental Protection Agency of Lombardy (ARPA) to obtain a robust dataset created by experts. Using the provided ground truth (GT), Deep Learning (DL) was applied to the task, and the results were validated with standard metrics and expert feedback.
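The text does not say which standard metrics were used. As one plausible reading, the sketch below computes precision, recall, and F1 for a binary scene classification task (1 = scene contains a landfill, 0 = it does not). The function name `binary_metrics` and the sample labels are hypothetical.

```python
def binary_metrics(actual, predicted):
    """Compute precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative labels only; these are not results from the study.
actual = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]
metrics = binary_metrics(actual, predicted)
```

For a detection task such as this one, recall is often the metric of most interest, since a missed site is typically costlier than a false alarm that an expert can dismiss.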
To further evaluate and understand the performance of the models, a tool was implemented for evaluating errors in machine learning classification, object detection, and instance segmentation tasks. It uses custom meta-annotations to give a better understanding of a model's performance and weaknesses.
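The idea of meta-annotation-based error analysis can be illustrated with a small sketch that groups classification errors by the value of a meta-annotation. It does not reproduce the tool's actual interface: the `error_rate_by_meta` function, the sample layout, and the `area_type` meta-annotation are assumptions chosen for the example.

```python
from collections import defaultdict


def error_rate_by_meta(samples, meta_key):
    """Return the classification error rate for each value of a meta-annotation.

    Each sample is a dict with "actual", "predicted", and a "meta" dict
    (a hypothetical layout used only in this sketch).
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for sample in samples:
        value = sample["meta"].get(meta_key, "unknown")
        totals[value] += 1
        if sample["predicted"] != sample["actual"]:
            errors[value] += 1
    return {value: errors[value] / totals[value] for value in totals}


# Illustrative samples only; these are not results from the study.
samples = [
    {"actual": 1, "predicted": 1, "meta": {"area_type": "rural"}},
    {"actual": 1, "predicted": 0, "meta": {"area_type": "urban"}},
    {"actual": 0, "predicted": 0, "meta": {"area_type": "urban"}},
    {"actual": 0, "predicted": 1, "meta": {"area_type": "urban"}},
]
rates = error_rate_by_meta(samples, "area_type")
```

A breakdown like this shows whether errors concentrate in a particular subset of the data, which an aggregate metric alone would hide.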