Deep learning-based urban land-cover classification using freely available high-resolution satellite imagery from Google Earth.
Contrastive Language-Image Pre-Training (CLIP) model with a Vision Transformer (ViT-B/32) backbone, fine-tuned on RSICD (Remote Sensing Image Captioning Dataset) and used as a feature extractor.
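The following is a minimal sketch of how frozen embeddings from such a feature extractor might feed a downstream classifier. It assumes the image embeddings have already been computed. CLIP ViT-B/32 projects each image to a 512-dimensional vector, and toy 3-dimensional vectors stand in for those here. The source does not say which classifier sits on top, so a nearest-centroid classifier on L2-normalized features is used as a stand-in. The helper names (`fit_centroids`, `predict`) and the class labels are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fit_centroids(embeddings, labels):
    """Hypothetical helper: mean of the normalized embeddings per class, re-normalized."""
    sums, counts = {}, {}
    for emb, label in zip(embeddings, labels):
        emb = l2_normalize(emb)
        acc = sums.setdefault(label, [0.0] * len(emb))
        sums[label] = [a + e for a, e in zip(acc, emb)]
        counts[label] = counts.get(label, 0) + 1
    return {lab: l2_normalize([s / counts[lab] for s in vec]) for lab, vec in sums.items()}


def predict(embedding, centroids):
    """Return the class whose centroid has the highest cosine similarity to the query."""
    query = l2_normalize(embedding)
    return max(centroids, key=lambda lab: sum(q * c for q, c in zip(query, centroids[lab])))


# Toy vectors standing in for frozen CLIP image embeddings (assumption: precomputed).
train_x = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.2], [0.0, 0.8, 0.3]]
train_y = ["built-up", "built-up", "vegetation", "vegetation"]
centroids = fit_centroids(train_x, train_y)
print(predict([0.85, 0.15, 0.05], centroids))  # prints "built-up"
```

Because the backbone stays frozen in this sketch, only the lightweight head is fit. Any classifier that accepts fixed-length vectors, such as logistic regression, could take the place of the nearest-centroid rule.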
Self-supervised Vision Transformer (DINOv2) with Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning.
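The following is a minimal sketch of the LoRA idea applied to a single linear layer. It is written in plain Python so that it runs without a deep learning framework. The pretrained weight W stays frozen, and a low-rank update (alpha / r) · B A is added to its output. Because B is initialized to zero, the adapted layer initially reproduces the pretrained one exactly. The class name `LoRALinear`, the rank, alpha, and the initialization scale are illustrative assumptions. The source does not say which DINOv2 layers receive adapters or which hyperparameters are used.

```python
import random


class LoRALinear:
    """y = W x + (alpha / rank) * B (A x); W is frozen, only A and B would be trained."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        self.weight = weight  # frozen pretrained W, shape (d_out, d_in)
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A: small Gaussian init, shape (rank, d_in).
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        # B: zeros, shape (d_out, rank), so the update starts at exactly zero.
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    @staticmethod
    def _matvec(m, v):
        return [sum(a * b for a, b in zip(row, v)) for row in m]

    def forward(self, x):
        base = self._matvec(self.weight, x)                  # frozen path
        update = self._matvec(self.B, self._matvec(self.A, x))  # low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters of a rank-r adapter on a d_out x d_in matrix."""
    return rank * (d_in + d_out)


layer = LoRALinear([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], rank=4)
print(layer.forward([2.0, 3.0]))  # prints [2.0, 3.0, 5.0]; matches W x because B is zero
print(lora_param_count(768, 768, 8), 768 * 768)  # prints 12288 589824
```

The savings come from the rank. For a 768 × 768 projection (the width of a ViT-B model, used here only as an example), a rank-8 adapter trains 8 × (768 + 768) = 12,288 parameters instead of 589,824, roughly 2% of the full matrix.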