V-SLAM on Lie groups (MSc thesis)
This work leverages Lie theory in state estimation to derive a nonlinear approach to the Simultaneous Localization And Mapping (SLAM) problem. The groups SE(3) and SO(3) have proven very convenient for representing rigid-body motion in 3D space, which makes it possible to design nonlinear SLAM observers using Lyapunov stability analysis. Our contribution consists of endowing the observer with two practical features: a system re-dimensioning feature, which gives a vehicle the ability to dynamically change the dimension of the state matrix, and a Fault Detection and Isolation (FDI) block that detects and corrects faulty measurements from the camera and the IMU used to implement the proposed observer.
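The convenience of SO(3) for rotations can be illustrated with its exponential map, which turns a rotation vector in the Lie algebra so(3) into a valid rotation matrix. The following is a minimal pure-Python sketch of Rodrigues' formula. The helper names (`hat`, `matmul3`, `exp_so3`, `transpose3`) are hypothetical, and this is not the observer developed in the thesis.

```python
import math


def hat(w):
    """Map a 3-vector to its 3x3 skew-symmetric matrix in so(3)."""
    x, y, z = w
    return [[0.0, -z, y],
            [z, 0.0, -x],
            [-y, x, 0.0]]


def matmul3(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose3(m):
    """Transpose a 3x3 matrix."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def exp_so3(w):
    """Exponential map so(3) -> SO(3) via Rodrigues' formula.

    R = I + (sin t / t) K + ((1 - cos t) / t^2) K^2, with K = hat(w), t = |w|.
    """
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    k = hat(w)
    theta = math.sqrt(sum(c * c for c in w))
    if theta < 1e-9:
        # Small-angle case: first-order expansion R ~ I + K avoids dividing by ~0.
        return [[identity[i][j] + k[i][j] for j in range(3)] for i in range(3)]
    k2 = matmul3(k, k)
    a = math.sin(theta) / theta
    b = (1.0 - math.cos(theta)) / theta ** 2
    return [[identity[i][j] + a * k[i][j] + b * k2[i][j] for j in range(3)]
            for i in range(3)]


if __name__ == "__main__":
    # A 90-degree rotation about the z-axis; R^T R should be the identity.
    r = exp_so3([0.0, 0.0, math.pi / 2])
    for row in matmul3(transpose3(r), r):
        print([round(v, 6) for v in row])
```

Working with the rotation vector and mapping it through `exp_so3` guarantees the result stays on SO(3) (orthogonal, determinant 1), which is one reason Lie-group formulations are attractive for attitude and pose estimation.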
YOLOngv8: From Imbalanced to Accurate Object Detection in the Long-Tailed iSAID Dataset
Object detection in aerial images poses challenges that do not arise in conventional detection settings. The iSAID dataset, in particular, has a high density of object instances, scale variations, large aspect ratios, and a long-tailed class distribution. Recent works have shown both the need for and the effectiveness of handling the long-tailed distribution of data within the detection pipeline. In this work, we show that prototype-based supervised contrastive learning with dynamically adjusted weights enables better feature representation learning for the tail classes, thereby improving overall object detection performance on the iSAID dataset. We also incorporate a weighted binary cross-entropy loss based on class probability priors to facilitate learning across all classes, and we adapt the GIoU metric for Non-Maximum Suppression (NMS) to improve post-processing. Our results show that the proposed model, YOLOngv8, adapts YOLOv8 more effectively to accurate object detection on the long-tailed iSAID dataset.
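Using GIoU in NMS can be sketched as follows, assuming boxes in (x1, y1, x2, y2) format with positive area and a hypothetical suppression threshold of 0.5. This is a minimal illustration of swapping IoU for GIoU in greedy NMS, not the exact post-processing used in YOLOngv8; the function names are hypothetical.

```python
def area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2); 0 if the box is empty."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def giou(a, b):
    """Generalized IoU: IoU - (|C| - |A U B|) / |C|, C = smallest enclosing box."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    hull = area((min(a[0], b[0]), min(a[1], b[1]),
                 max(a[2], b[2]), max(a[3], b[3])))
    return inter / union - (hull - union) / hull


def giou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS: keep a box only if its GIoU with every already kept,
    higher-scoring box is at most `threshold`. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(giou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(giou_nms(boxes, scores))
```

GIoU lies in (-1, 1] and never exceeds IoU, so a GIoU threshold sits on a different scale than a plain IoU threshold and would need to be tuned separately.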
Robust Model Poisoning Attack Against Fake-Client Defenses
We explore the attack surface of federated learning (FL), focusing on model poisoning attacks that use fake clients. We propose a novel attack strategy in which fake clients evade detection by FLDetector, a detection method based on gradient inconsistency. We evaluate the effectiveness of the proposed attack by measuring the detection accuracy and the attack's impact on the training process and on the resulting model accuracy. The experiments show that the proposed attack consistently outperforms the baseline prior to detection in terms of test accuracy, and that the detector misclassifies clients, flagging benign clients as malicious and missing malicious ones, with high false positive and false negative rates. Further, we demonstrate the scalability of the attack and highlight the need for effective countermeasures in real-world FL scenarios. Overall, our study contributes to understanding the vulnerabilities and potential consequences of fake-client model poisoning attacks in FL systems.
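A simplified version of the consistency check behind FLDetector-style detection is sketched below. FLDetector predicts each client's model update from its history (using an approximated Hessian) and flags clients whose received updates repeatedly deviate from that prediction. In this sketch the predicted updates are simply passed in, each round's distances are normalized by their sum over clients (an assumption), and a 1-D two-means split stands in for the full detector's decision rule. All function names are hypothetical.

```python
import math


def l2(u, v):
    """Euclidean distance between two update vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def suspicious_scores(predicted, observed):
    """Average per-round normalized distance between predicted and observed updates.

    predicted[t][i] and observed[t][i] are the update vectors of client i in
    round t; each round's distances are divided by their sum over clients.
    """
    n_rounds, n_clients = len(observed), len(observed[0])
    totals = [0.0] * n_clients
    for t in range(n_rounds):
        d = [l2(predicted[t][i], observed[t][i]) for i in range(n_clients)]
        s = sum(d) or 1.0  # guard against all-zero distances
        for i in range(n_clients):
            totals[i] += d[i] / s
    return [x / n_rounds for x in totals]


def flag_clients(scores, iters=50):
    """1-D two-means split; returns indices of the high-score cluster."""
    lo, hi = min(scores), max(scores)
    if hi - lo < 1e-12:
        return []  # all scores equal: no cluster to separate
    for _ in range(iters):
        mid = (lo + hi) / 2
        low = [s for s in scores if s <= mid]
        high = [s for s in scores if s > mid]
        lo, hi = sum(low) / len(low), sum(high) / len(high)
    mid = (lo + hi) / 2
    return [i for i, s in enumerate(scores) if s > mid]


if __name__ == "__main__":
    zero = [0.0, 0.0]
    predicted = [[zero] * 4] * 2
    observed = [[[0.1, 0.0], [0.0, 0.1], [0.1, 0.0], [3.0, 4.0]]] * 2
    print(flag_clients(suspicious_scores(predicted, observed)))
```

An attack of the kind described in the abstract aims to keep the fake clients' updates close enough to the predictions that their scores fall into the low cluster.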
Drawing Attention to Detail: Pose Alignment through Self-Attention for Fine-Grained Object Classification
One recent study [4] shows the importance of localizing and aligning local parts for enhancing robustness against pose variation, and for improving the generalizability of a model trained with an optimal ordering of fine-grained local parts. In that architecture, the parts extracted from a given image are arranged by evaluating all possible permutations of the input parts and selecting the one whose correlation matrix is most similar to that of a reference set of parts. Our approach offers an end-to-end trainable, attention-based parts-alignment module in which the graph-matching component of [4] is replaced with a self-attention mechanism. The attention module learns the optimal arrangement of the parts while they attend to each other, before contributing to the global loss.
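The permutation search described for [4] can be sketched by brute force. This minimal sketch assumes that the correlation matrix holds pairwise cosine similarities between part features and that matrices are compared with a Frobenius distance (smaller means more similar); both choices and all function names are hypothetical, not taken from [4]. The loop visits all K! orderings of K parts.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def correlation_matrix(parts):
    """Pairwise cosine similarities between part features (K x K)."""
    return [[cosine(a, b) for b in parts] for a in parts]


def frobenius_distance(m1, m2):
    """Frobenius norm of the difference of two equally sized matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for r1, r2 in zip(m1, m2) for x, y in zip(r1, r2)))


def align_parts(parts, reference_corr):
    """Search all orderings of `parts` for the one whose correlation matrix
    is closest to `reference_corr`; returns (permutation, distance)."""
    best_perm, best_dist = None, math.inf
    for perm in permutations(range(len(parts))):
        ordered = [parts[i] for i in perm]
        dist = frobenius_distance(correlation_matrix(ordered), reference_corr)
        if dist < best_dist:
            best_perm, best_dist = perm, dist
    return best_perm, best_dist


if __name__ == "__main__":
    reference = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.6, 0.8], [0.0, 0.0, 1.0]]
    shuffled = [reference[2], reference[0], reference[3], reference[1]]
    print(align_parts(shuffled, correlation_matrix(reference)))
```

In this toy example, the returned permutation reorders the shuffled parts back into the reference layout. A self-attention module, by contrast, relates all parts to each other in a single differentiable pass instead of enumerating orderings.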
Human Pose Estimation
Human Pose Estimation (HPE) seeks to localize human body parts and construct representations of the human body (e.g., skeletons) from input data such as photographs and videos. It has gained popularity over the last decade and has been used in applications such as human-computer interaction, entertainment, and virtual reality. Although recent deep learning-based solutions have achieved high performance in human pose estimation, challenges remain due to a lack of training data, depth ambiguities, and occlusion. In this paper, we present our modification of Stacked Hourglass Networks [5], which improves model speed by 13% while increasing accuracy by around 0.7%.
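Stacked Hourglass Networks predict one heatmap per body joint. A simple way to turn these heatmaps into a skeleton is to take each heatmap's argmax and connect joints along a fixed list of limbs. The sketch below shows only this post-processing step, not the modified network; the 3-joint `EDGES` list, the `min_score` cut-off, and the function names are hypothetical.

```python
def decode_heatmaps(heatmaps):
    """Return one (x, y, score) tuple per joint, taken at each heatmap's argmax.

    heatmaps[j][y][x] is the predicted confidence of joint j at pixel (x, y).
    """
    keypoints = []
    for hm in heatmaps:
        best = (0, 0, float("-inf"))
        for y, row in enumerate(hm):
            for x, value in enumerate(row):
                if value > best[2]:
                    best = (x, y, value)
        keypoints.append(best)
    return keypoints


# Hypothetical 3-joint skeleton: head -> neck -> pelvis.
EDGES = [(0, 1), (1, 2)]


def build_skeleton(keypoints, edges, min_score=0.1):
    """Connect joints into limb segments, skipping low-confidence joints."""
    return [(keypoints[a][:2], keypoints[b][:2])
            for a, b in edges
            if keypoints[a][2] >= min_score and keypoints[b][2] >= min_score]


if __name__ == "__main__":
    def peak(x, y, v, size=4):
        """Toy heatmap that is zero everywhere except one pixel."""
        hm = [[0.0] * size for _ in range(size)]
        hm[y][x] = v
        return hm

    kps = decode_heatmaps([peak(1, 0, 0.9), peak(1, 2, 0.8), peak(2, 3, 0.05)])
    print(kps, build_skeleton(kps, EDGES))
```

Dropping low-confidence joints is one simple way to handle occluded body parts, one of the challenges noted above. Practical decoders often also refine the argmax to sub-pixel precision.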
DeepLabV3+ and SegFormer robustness analysis
In this project, we conducted a comparative analysis of SegFormer, a ViT-based semantic segmentation model, and DeepLabV3+, a CNN-based semantic segmentation model, under various image perturbations (e.g., image patch shuffling and removal) and noise addition (e.g., salt-and-pepper noise). SegFormer has proven to be very efficient thanks to its hierarchical encoder, while the atrous spatial pyramid pooling in DeepLabV3+ allows it to compete with state-of-the-art (SOTA) semantic segmentation models. Previous studies have compared these models' performance on the segmentation of natural, noise-free images; however, their performance on perturbed and noisy images had not been examined. Both models were trained on the ADE20k dataset and tested on segmentation of the person class, which mitigates the effect of SegFormer's pre-training on ImageNet. Results suggest that SegFormer tolerates more perturbation than DeepLabV3+ and performs well on noisy images; however, the performance of both models drops significantly as the noise level increases.
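The perturbations named above can be sketched on an 8-bit grayscale image stored as nested lists, assuming the image height and width are divisible by the patch size. This is a minimal pure-Python illustration. The noise density, patch size, removal fraction, the use of 0 as the fill value for removed patches, and the function names are assumptions, not the project's actual settings.

```python
import random


def salt_and_pepper(img, density, rng):
    """Replace a fraction `density` of pixels with 0 (pepper) or 255 (salt)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for idx in rng.sample(range(h * w), round(density * h * w)):
        y, x = divmod(idx, w)
        out[y][x] = rng.choice((0, 255))
    return out


def split_patches(img, p):
    """Cut an image into p x p patches in row-major order."""
    h, w = len(img), len(img[0])
    return [[row[x:x + p] for row in img[y:y + p]]
            for y in range(0, h, p) for x in range(0, w, p)]


def merge_patches(patches, h, w, p):
    """Inverse of split_patches: place patches back in row-major order."""
    out = [[0] * w for _ in range(h)]
    cols = w // p
    for k, patch in enumerate(patches):
        y0, x0 = (k // cols) * p, (k % cols) * p
        for dy, row in enumerate(patch):
            out[y0 + dy][x0:x0 + p] = row
    return out


def shuffle_patches(img, p, rng):
    """Randomly permute the p x p patches of an image."""
    patches = split_patches(img, p)
    rng.shuffle(patches)
    return merge_patches(patches, len(img), len(img[0]), p)


def remove_patches(img, p, fraction, rng):
    """Zero out a random `fraction` of the p x p patches."""
    patches = split_patches(img, p)
    for k in rng.sample(range(len(patches)), round(fraction * len(patches))):
        patches[k] = [[0] * p for _ in range(p)]
    return merge_patches(patches, len(img), len(img[0]), p)


if __name__ == "__main__":
    img = [[y * 8 + x for x in range(8)] for y in range(8)]
    for row in shuffle_patches(img, 4, random.Random(0)):
        print(row)
```

Passing an explicit `random.Random` instance keeps each perturbation reproducible. This makes it possible to sweep the noise density or removal fraction and evaluate both models on identical corrupted inputs.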