AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery

Hangyu Zhou*, Chia-Hsiang Kao*, Cheng Perng Phoo, Utkarsh Mall, Bharath Hariharan, Kavita Bala

Cornell University, Columbia University

* Equal Contribution

In submission to NeurIPS 2024 Datasets and Benchmarks Track

Abstract

Clouds in satellite imagery pose a significant challenge for downstream applications. Current cloud removal research is held back by the lack of a comprehensive benchmark and of a sufficiently large and diverse training dataset. To address this problem, we introduce AllClear, the largest public dataset for cloud removal, featuring 23,742 globally distributed regions of interest (ROIs) with diverse land-use patterns and comprising 4 million images in total. Each ROI includes complete temporal captures from the year 2022, with (1) multi-spectral optical imagery from Sentinel-2 and Landsat 8/9, (2) synthetic aperture radar (SAR) imagery from Sentinel-1, and (3) auxiliary remote sensing products such as cloud masks and land cover maps. We validate the effectiveness of our dataset by benchmarking performance, by demonstrating a scaling law (PSNR rises from 28.47 to 33.87 with 30× more data), and by conducting ablation studies on the temporal length and the importance of individual modalities. This dataset aims to provide comprehensive coverage of the Earth's surface and promote better cloud removal results.
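To put the reported PSNR numbers in context, the following is a minimal sketch of the standard PSNR definition, together with a helper that converts a PSNR gain into the implied reduction in mean squared error. It is not the benchmark's evaluation code: the function names, the NumPy array inputs, and the peak value `max_val=1.0` (reflectance scaled to [0, 1]) are assumptions made for illustration, and details such as cloud masking or per-band averaging are not covered.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two arrays of equal shape.

    max_val is the assumed peak pixel value (1.0 for data scaled to [0, 1]).
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


def mse_ratio_from_psnr_gain(psnr_low, psnr_high):
    """Factor by which MSE shrinks when PSNR rises from psnr_low to psnr_high.

    Follows from PSNR = 10 * log10(max_val^2 / MSE) with a fixed max_val.
    """
    return 10.0 ** ((psnr_high - psnr_low) / 10.0)


if __name__ == "__main__":
    # Hypothetical 13-band, 64x64 target and a prediction with a constant 0.1 error.
    target = np.zeros((13, 64, 64))
    pred = target + 0.1
    print(f"PSNR: {psnr(pred, target):.2f} dB")
    print(f"MSE reduction: {mse_ratio_from_psnr_gain(28.47, 33.87):.2f}x")
```

Under this definition, and assuming the same peak value in both settings, the rise from 28.47 to 33.87 (a gain of 5.40 dB) corresponds to roughly a 3.5× reduction in mean squared error.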

Paper

[pdf]   [supplementary pdf]

Hangyu Zhou*, Chia-Hsiang Kao*, Cheng Perng Phoo, Utkarsh Mall, Bharath Hariharan, Kavita Bala. "AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery". Submitted to NeurIPS Datasets and Benchmarks Track, 2024.

Data

[zip] allclear_test_images.zip [23 GB]: AllClear test set (one sample per ROI).

[json] allclear_test_metadata.json: Metadata for the AllClear test set (one sample per ROI).
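As a starting point for exploring the metadata file, the sketch below loads a JSON file and summarizes its top-level structure. The schema of allclear_test_metadata.json is not described on this page, so the code deliberately assumes nothing about its keys; the helper name `summarize_metadata` and the local file path are hypothetical.

```python
import json
from pathlib import Path


def summarize_metadata(path):
    """Load a JSON file and describe its top-level structure.

    No schema is assumed: the summary only reports the container type,
    the number of top-level entries, and a few sample keys if available.
    """
    with Path(path).open("r", encoding="utf-8") as f:
        data = json.load(f)

    is_container = isinstance(data, (dict, list))
    summary = {
        "type": type(data).__name__,
        "num_entries": len(data) if is_container else 1,
    }
    if isinstance(data, dict):
        summary["sample_keys"] = list(data)[:5]
    elif isinstance(data, list) and data and isinstance(data[0], dict):
        summary["sample_keys"] = list(data[0])[:5]
    return data, summary


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the file was downloaded.
    metadata_path = Path("allclear_test_metadata.json")
    if metadata_path.exists():
        _, info = summarize_metadata(metadata_path)
        print(info)
    else:
        print(f"{metadata_path} not found; download it first.")
```

Inspecting the summary first makes it easier to decide how to pair metadata entries with the images in allclear_test_images.zip, since that mapping is not specified here.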

Code

Our code is available in the GitHub repo.

Updates:

[06-12-2024] The initial website has been set up.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.