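Official website: https://cvpr2022.thecvf.com/
Conference dates: June 19-24, 2022
❣❣❣ The CVPR 2022 accepted papers were recently announced: 2,067 in total! Preprints are being released gradually, and this document will keep being updated, so stay tuned!
❣❣❣ To download all papers as one bundle, reply "paper" to the 我爱计算机视觉 WeChat official account. As of June 13, 733+2 papers have been collected.
For a categorized collection of survey papers from past years, see ↘️ CV-Surveys (under construction)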
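- Object counting
- 3D mesh reconstruction
- Video prediction
- SR
- Image vectorization
- Model compression
- Action detection
- Datasets
- Multi-label classification
- VOS
Image animation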
- Thin-Plate Spline Motion Model for Image Animation
- Human animation
- 3D character animation
- 3D dance generation
- Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans from a Single Camera
- IRON: Inverse Rendering by Optimizing Neural SDFs and Materials from Photometric Images
😮oral🏠project - SqueezeNeRF: Further factorized FastNeRF for memory-efficient inference
- Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction
⭐code - Modeling Indirect Illumination for Inverse Rendering
⭐code🏠project - GenDR: A Generalized Differentiable Renderer
⭐code
A generalized differentiable renderer - CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields
⭐code🏠project - NeRF-Editing: Geometry Editing of Neural Radiance Fields
- Sound source localization
- NPBG++: Accelerating Neural Point-Based Graphics
🏠project - AutoRF: Learning 3D Object Radiance Fields from Single View Observations
🏠project - NeurMiPs: Neural Mixture of Planar Experts for View Synthesis
⭐code🏠project📺video📰explainer
- ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
⭐code🏠project📰summary - Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
- 3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social Media Short Videos
- Hephaestus: A large scale multitask dataset towards InSAR understanding
- SmartPortraits: Depth Powered Handheld Smartphone Dataset of Human Portraits for State Estimation, Reconstruction and Synthesis
🌻dataset - AKB-48: A Real-World Articulated Object Knowledge Base
⭐code
📰summary - Primitive3D: 3D Object Dataset Synthesis from Randomly Assembled Primitives
- Satellite datasets
- Animal behavior understanding datasets
- Motron: Multimodal Probabilistic Human Motion Forecasting
- Progressively Generating Better Initial Guesses Towards Next Stages for High-Quality Human Motion Prediction
- Light field
- Depth reconstruction
- Shutter correction
- Thermal infrared imaging
- Catching Both Gray and Black Swans: Open-set Supervised Anomaly Detection
⭐code - Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection
⭐code
- Multi-View Transformer for 3D Visual Grounding
⭐code - Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning
⭐code
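Visual grounding: localizing a target from a natural-language description (a very interesting line of work)
- Few-shot
- Zero-shot
- Domain generalization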
- Compound Domain Generalization via Meta-Knowledge Encoding
- Causality Inspired Representation Learning for Domain Generalization
- Towards Unsupervised Domain Generalization
This work targets domain generalization (DG). It is the first to extend DG to unsupervised learning, and it proposes a new research area, unsupervised domain generalization (UDG). - Out-of-domain generalization
- Domain adaptation
- HL-Net: Heterophily Learning Network for Scene Graph Generation
⭐code
Scene graph generation: a heterophily learning network
📰explainer - RU-Net: Regularized Unrolling Network for Scene Graph Generation
⭐code
Scene graph generation: a regularized unrolling network
📰explainer - The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation
⭐code
- CD2-pFed: Cyclic Distillation-guided Channel Decoupling for Model Personalization in Federated Learning
- Auditing Privacy Defenses in Federated Learning via Generative Gradient Leakage
⭐code - FedCorr: Multi-Stage Federated Learning for Label Noise Correction
⭐code - Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning
- Controllable Dynamic Multi-Task Architectures
- Task Adaptive Parameter Sharing for Multi-Task Learning
- Incremental learning
- Class-incremental learning
- Give Me Your Attention: Dot-Product Attention Considered Harmful for Adversarial Patch Robustness
- Masking Adversarial Damage: Finding Adversarial Saliency for Robust and Sparse Network
- Towards Practical Certifiable Patch Defense with Vision Transformer
📰explainer - Adversarial examples
- Adversarial attacks
- Black-box
- Adversarial training
- On Generalizing Beyond Domains in Cross-Domain Continual Learning
- Probing Representation Forgetting in Supervised and Unsupervised Continual Learning
- Online Continual Learning on a Contaminated Data Stream with Blurry Task Boundaries
⭐code
- What Matters For Meta-Learning Vision Regression Tasks?
- Multidimensional Belief Quantification for Label-Efficient Meta-Learning
- Dynamic Kernel Selection for Improved Generalization and Memory Efficiency in Meta-learning
- Selective-Supervised Contrastive Learning with Noisy Labels
⭐code📰summary - Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning
⭐code - UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning
- Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework
⭐code - Crafting Better Contrastive Views for Siamese Representation Learning
😮oral⭐code
- CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow
⭐code - DIP: Deep Inverse Patchmatch for High-Resolution Optical Flow
- Imposing Consistency for Optical Flow Estimation
- Deep Equilibrium Optical Flow Estimation
⭐code📰explainer - GMFlow: Learning Optical Flow via Global Matching
😮oral⭐code📰explainer
- Scene text detection
- Text Spotting
- Logo design
- Font generation
- Text recognition
- Table structure recognition
- Knowledge distillation
- Knowledge Distillation with the Reused Teacher Classifier
- DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers
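📰explainer - Decoupled Knowledge Distillation
⭐code
📰Decoupled knowledge distillation brings the method Hinton proposed seven years ago back into the SOTA ranks - Knowledge Distillation via the Target-aware Transformer
😮oral
📰RMIT, Alibaba, UTS, and Sun Yat-sen University propose the Target-aware Transformer for one-to-all knowledge distillation, with SOTA performance - Evaluation-oriented Knowledge Distillation for Deep Face Recognition
😮oral⭐code📰explainer
- Model compression
- Pruning
- Quantization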
- HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction
- MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection
- GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection
⭐code - OakInk: A Large-scale Knowledge Repository for Understanding Hand-Object Interaction
⭐code
📰summary - D-Grasp: Physically Plausible Dynamic Grasp Synthesis for Hand-Object Interactions
🏠code - What to look at and where: Semantic and Spatial Refined Transformer for detecting human-object interactions
😮oral - Human-Object Interaction Detection via Disentangled Transformer
- Consistency Learning via Decoding Path Augmentation for Transformers in Human Object Interaction Detection
⭐code📰explainer - Interactiveness Field in Human-Object Interactions
⭐code - Stability-driven Contact Reconstruction From Monocular Color Images
⭐code
Hand-object interaction reconstruction from monocular color images; human-computer interaction - Interactiveness Field in Human-Object Interactions
⭐code
📰summary - HOI tracking
- 🐦️AlignMix: Improving representation by interpolating aligned features
- 3D Common Corruptions and Data Augmentation
⭐code🏠project📺video📰summary - Kubric: A scalable dataset generator
- Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships
- VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers
- Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
🌻dataset - Robust Cross-Modal Representation Learning with Progressive Self-Distillation
- Prompt Distribution Learning
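On downstream recognition tasks, the proposed method shows consistent performance gains on all 12 datasets. - Vision-Language Pre-Training with Triple Contrastive Learning
⭐code - Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning
⭐code
📰UCAS and CUHK propose a visual grounding framework with visual-linguistic verification and iterative reasoning; SOTA performance, code open-sourced! - VLN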
- VQA
- AVQA
- Video-QA
- Object navigation
- try-on
- AR
- Episodic Memory Question Answering
😮oral⭐code
AI assistant: episodic memory question answering (a new augmented-reality task; data and code will both be open-sourced)
- Robotics
- Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer
⭐code - Industrial Style Transfer with Large-scale Geometric Warping and Content Preservation
⭐code - Motion style transfer
- Motion transfer
- Scene stylization
- OSOP: A Multi-Stage One Shot Object Pose Estimation Framework
- OnePose: One-Shot Object Pose Estimation without CAD Models
⭐code🏠project📰explainer - 4D
- 9D
- Monocular object pose estimation
- 6D
- RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization
- FS6D: Few-Shot 6D Pose Estimation of Novel Objects
⭐code🏠project📰explainer - Uni6D: A Unified CNN Framework without Projection Breakdown for 6D Pose Estimation
- ES6D: A Computation Efficient and Symmetry-Aware 6D Pose Regression Framework
⭐code - Focal Length and Object Pose Estimation via Render and Compare
⭐code🏠project📰explainer - DGECN: A Depth-Guided Edge Convolutional Network for End-to-End 6D Pose Estimation
⭐code🏠project📰explainer - Coupled Iterative Refinement for 6D Multi-Object Pose Estimation
⭐code📰explainer
- 3D Object Articulation
- 3Dope
- GNN
- Fine-grained classification
- Image classification
- Few-shot classification
- CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification
- Matching Feature Sets for Few-Shot Image Classification
⭐code🏠project📺video - Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification
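😮oral⭐code🏠project📰explainer - Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification
📰explainer - Generating Representative Samples for Few-Shot Classification
⭐code
📰summary
For few-shot classification, generating more representative samples and discarding non-representative ones improves classification and achieves SOTA results. - Few-shot classification and segmentation (FS-CS)
- Long-tailed recognition
- Fine-grained recognition
- Multi-label classification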
- Learning Graph Regularisation for Guided Super-Resolution
- Self-Supervised Super-Resolution for Multi-Exposure Push-Frame Satellites
⭐code🏠project📰explainer - Deep Constrained Least Squares for Blind Image Super-Resolution
⭐code📰explainer
- Sketching without Worrying: Noise-Tolerant Sketch-Based Image Retrieval
⭐code - Correlation Verification for Image Retrieval
😮oral⭐code - Sketch3T: Test-Time Training for Zero-Shot SBIR
- Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image
- Text-video retrieval
- Cross-modal retrieval
- Interactive Image Synthesis with Panoptic Layout Generation
- Autoregressive Image Generation using Residual Quantization
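⭐code📰summary - GIRAFFE HD: A High-Resolution 3D-aware Generative Model
- Arbitrary-Scale Image Synthesis
⭐code📰summary - Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis
⭐code📰explainer - Learning to Memorize Feature Hallucination for One-Shot Image Generation
📰explainer - Text-guided image manipulation
- Pose-guided image synthesis
- Text-to-image synthesis
- Image translation
- Image generation
- Remote sensing image fusion
- Aerial image segmentation
- Autonomous driving
- Lane detection
- Lane description
- Behavior prediction
- Relighting of autonomous driving scenes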
- 🐦️ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
- Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search
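⭐code📰explainer - GPUNet: Searching the Deployable Convolution Neural Networks for GPUs
Neural architecture search: a search for lightweight network architectures deployable on GPUs (faster than Google's EfficientNet-X series and Meta's FBNetV3, and even better in performance; the authors are from NVIDIA)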
- Reid
- Part-based Pseudo Label Refinement for Unsupervised Person Re-identification
⭐code - Camera-Conditioned Stable Feature Generation for Isolated Camera Supervised Person Re-IDentification
- Large-Scale Pre-training for Person Re-identification with Noisy Labels
⭐code - Cloning Outfits from Real-World Images to 3D Characters for Generalizable Person Re-Identification
⭐code - Implicit Sample Extension for Unsupervised Person Re-Identification
⭐code📰explainer - Graph Sampling Based Deep Metric Learning for Generalizable Person Re-Identification
⭐code - NFormer: Robust Person Re-identification with Neighbor Transformer
⭐code📰explainer - Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification
- Cloth-changing person re-identification
- Occluded person re-identification
- Crowd counting
- Pedestrian detection
- Gait recognition
- Person Search
- Temporal Context Matters: Enhancing Single Image Prediction with Disease Progression Representations
- BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation
- DeepLIIF: An Online Platform for Quantification of Clinical Pathology Slides
- DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis
⭐code📰explainer - Surpassing the Human Accuracy: Detecting Gallbladder Cancer from USG Images with Curriculum Learning
⭐code🏠project - 3D bioprinting
- Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction to Treat Diabetic Foot Ulcers
Uses wound segmentation and reconstruction to generate 3D bio-printed patches for treating diabetic foot ulcers
- SR(MRI)
- Medical image registration
- Medical image analysis
- Automatic report generation
- Self-supervised
- Semi-supervised
- Weakly supervised
- Fast Point Transformer
- ChiTransformer: Towards Reliable Stereo from Cues
- Beyond Fixation: Dynamic Window Visual Transformer
- Training-free Transformer Architecture Search
📰explainer - Automated Progressive Learning for Efficient Training of Vision Transformers
⭐code - Collaborative Transformers for Grounded Situation Recognition
⭐code - TubeDETR: Spatio-Temporal Video Grounding with Transformers
😮oral⭐code🏠project - Deformable Video Transformer
- MixFormer: Mixing Features across Windows and Dimensions
😮oral⭐code📰summary - Are Multimodal Transformers Robust to Missing Modality?
- MiniViT: Compressing Vision Transformers with Weight Multiplexing
- Multimodal Token Fusion for Vision Transformers
- Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
😮oral⭐code📰explainer - UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog
A unified Transformer architecture with contrastive learning for visual dialog - Patch Slimming for Efficient Vision Transformers
📰explainer - Swin Transformer V2: Scaling Up Capacity and Resolution
⭐code
📰Records smashed! Swin Transformer v2.0 is here, with 3 billion parameters! - SimMIM: A Simple Framework for Masked Image Modeling
⭐code - NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition
⭐code
📰explainer - Mobile-Former: Bridging MobileNet and Transformer
- MulT: An End-to-End Multitask Learning Transformer
- Scaling Vision Transformers to Gigapixel Images via Hierarchical Self-Supervised Learning
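😮oral⭐code📰explainer - Shape completion
- Stochastic Backpropagation: A Memory Efficient Strategy for Training Video Models
😮oral - Action segmentation
- Action understanding
- Video copy detection
- Video synthesis
- Video anomaly detection
- Video surveillance
- Video moment retrieval and highlight detection
- Video moment retrieval
- Video prediction
- Video individual counting
- Video interpolation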
- Many-to-many Splatting for Efficient Video Frame Interpolation
⭐code - TimeReplayer: Unlocking the Potential of Event Cameras for Video Interpolation
- Long-term Video Frame Interpolation via Feature Propagation
- Time Lens++: Event-based Frame Interpolation with Parametric Non-linear Flow and Multi-scale Fusion
- Visual correspondence (video)
- Video recognition
- Video classification
- Video prediction
- Video segmentation
- Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation
⭐code - VOS
- Video instance segmentation (VIS)
- Video semantic segmentation
- Video panoptic segmentation
- Video processing
- Video super-resolution
- Reference-based Video Super-Resolution Using Multi-Camera Video Triplets
- Learning Trajectory-Aware Transformer for Video Super-Resolution
😮oral⭐code - Investigating Tradeoffs in Real-World Video Super-Resolution
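⭐code📰explainer - BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
⭐code🏠project📺video
🏆Winner of the NTIRE 2021 video restoration and enhancement challenge - Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling
📰ETDM: video super-resolution based on explicit temporal difference modeling - Memory-Augmented Non-Local Attention for Video Super-Resolution
📰explainer
- Video restoration
- Video inpainting
- Video demoireing
- Video deblurring
- Video denoising
- Film restoration
- Video super-resolution
- Video representation learning
- Video decomposition
- Video shadow detection
- Video frame interpolation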
- VSS
- VSR
- Video reconstruction
- Video understanding
- 🐦️HyperInverter: Improving StyleGAN Inversion via Hypernetwork
🏠project - InsetGAN for Full-Body Image Generation
🏠project
📰1024x1024 resolution with stunning results! InsetGAN: full-body image generation - Commonality in Natural Images Rescues GANs: Pretraining GANs with Generic and Privacy-free Synthetic Data
⭐code - Deep Image-based Illumination Harmonization
- GAN-Supervised Dense Visual Alignment
😮oral⭐code🏠project📺video
📰CVPR 2022 Oral: GAN-supervised dense visual alignment, code open-sourced - HairMapper: Removing Hair from Portraits Using GANs
⭐code - Polymorphic-GAN: Generating Aligned Samples across Multiple Domains with Learned Morph Maps
😮oral🏠project - Image manipulation detection
- Hair editing
- Exploring Patch-wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks
- Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation
- InstaFormer: Instance-Aware Image-to-Image Translation with Transformer
- Unsupervised Image-to-Image Translation with Generative Prior
⭐code🏠project📺video
- Protecting Celebrities with Identity Consistency Transformer
- Deepfake
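- Makeup transfer
- Face recognition
- Facial expression recognition
- 3D face
- Face anti-spoofing
- Fake face detection
- Face swapping
- Face attribute classification
- Face relighting
- Face editing
- Face hallucination
- Deepfake detection
- Face reconstruction
- Face capture
- Head swapping
- Portrait distortion correction
- 3D face modeling
- Face restoration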
- Disentangled3D: Learning a 3D Generative Model with Disentangled Geometry and Appearance from Monocular Images
- Depth-Guided Sparse Structure-from-Motion for Movies and TV Shows
⭐code - 3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection
😮oral⭐code📰explainer - Stereo Merging
- PSMNet: Position-aware Stereo Merging Network for Room Layout Estimation
- GraftNet: Towards Domain Generalized Stereo Matching with a Broad-Spectrum and Task-Oriented Feature
⭐code - Degradation-agnostic Correspondence from Resolution-asymmetric Stereo
- Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation
😮oral⭐code📰explainer
- Depth estimation
- OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion
- NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation
- 🐦️Toward Practical Self-Supervised Monocular Indoor Depth Estimation
- P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior
⭐code - HiMODE: A Hybrid Monocular Omnidirectional Depth Estimation Model
- Multi-Frame Self-Supervised Depth with Transformers
- Layered Depth Refinement with Mask Guidance
🏠project
- Room layout
- MVS
- 3D reconstruction
- PlaneMVS: 3D Plane Reconstruction from Multi-View Stereo
- Self-supervised Neural Articulated Shape and Appearance Models
🏠project - BNV-Fusion: Dense 3D Reconstruction using Bi-level Neural Volume Fusion
- Topologically-Aware Deformation Fields for Single-View 3D Reconstruction
⭐code🏠project - Pre-train, Self-train, Distill: A simple recipe for Supersizing 3D Reconstruction
⭐code🏠project📰explainer - What's in your hands? 3D Reconstruction of Generic Objects in Hands
⭐code🏠project📺video📰explainer - Surface Reconstruction from Point Clouds by Learning Predictive Context Priors
⭐code - FvOR: Robust Joint Shape and Pose Optimization for Few-view Object Reconstruction
⭐code
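📰explainer - 3D scene reconstruction
- Hand-object reconstruction
- 3D garment mesh reconstruction
- 3D mesh reconstruction
- 3D shape reconstruction
- 3D garment deformation
- Texture transfer and synthesis
- Shape matching
- Surface reconstruction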
- COAP: Compositional Articulated Occupancy of People
⭐code🏠project📺video📰explainer - Context-Aware Sequence Alignment using 4D Skeletal Augmentation
😮oral⭐code🏠project - Multi-person pose estimation
- Video-based HPE
- 3D pose
- MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
- PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision
😮oral⭐code - Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation
- Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation
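📰Accurate and efficient multi-person 3D pose estimation: a distribution-aware single-stage model from Meitu and Beihang University
- 4D human capture
- Gesture generation
- 3D hand mesh estimation
- 3D shape generation
- Motion capture
- Arm-hand dynamics estimation
- 3D hand reconstruction
- 3D human shape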
- Dense correspondence
- 3D human motion reconstruction
- 3D human pose reconstruction
- Action detection
- Colar: Effective and Efficient Online Action Detection by Consulting Exemplars
- Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos
- End-to-End Semi-Supervised Learning for Video Action Detection
- SPAct: Self-supervised Privacy Preservation for Action Recognition
⭐code - Temporal Alignment Networks for Long-term Video
😮oral⭐code🏠project📰summary - SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition
- Zero-shot action recognition
- Few-shot action recognition
- Temporal action detection
- Temporal action localization
- Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation
⭐code📰summary - Unsupervised Pre-training for Temporal Action Localization Tasks
⭐code - ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization
⭐code - Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization
⭐code - Structured Attention Composition for Temporal Action Localization
⭐code
- Repetitive action counting
- Group activity recognition
- Action quality assessment
- Shape-invariant 3D Adversarial Point Clouds
⭐code - AziNorm: Exploiting the Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception
- REGTR: End-to-end Point Cloud Correspondences with Transformers
⭐code - Equivariant Point Cloud Analysis via Learning Orientations for Message Passing
⭐code - Text2Pos: Text-to-Point-Cloud Cross-Modal Localization
- Deformation and Correspondence Aware Unsupervised Synthetic-to-Real Scene Flow Estimation for Point Clouds
⭐code - Self-Supervised Arbitrary-Scale Point Clouds Upsampling via Implicit Neural Representation
⭐code📰explainer - 3DeformRS: Certifying Spatial Deformations on Point Clouds
⭐code - Reconstructing Surfaces for Sparse Point Clouds with On-Surface Priors
⭐code📰explainer - Density-preserving Deep Point Cloud Compression
⭐code🏠project📰explainer - Surface Representation for Point Clouds
😮oral⭐code
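📰explainer - 3D point clouds
- CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
⭐code📰summary
CrossPoint is a simple self-supervised learning framework for 3D point cloud representation learning. Although it is trained on a synthetic 3D object dataset, experiments on downstream tasks such as 3D object classification and 3D object part segmentation, on both synthetic and real-world datasets, demonstrate its effectiveness at learning transferable representations. - A Unified Query-based Paradigm for Point Cloud Understanding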
- WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation
⭐code - 3D point cloud segmentation
- Point cloud classification
- Point cloud registration
- Point cloud completion
- Point cloud segmentation
- Scene flow estimation
- TCTrack: Temporal Contexts for Aerial Tracking
⭐code📰summary
📰TCTrack: a temporal-context framework for aerial tracking - Correlation-Aware Deep Tracking
- Global Tracking Transformers
⭐code - Unified Transformer Tracker for Object Tracking
⭐code - Global Tracking via Ensemble of Local Trackers
- Unsupervised Learning of Accurate Siamese Tracking
⭐code - Transformer Tracking with Cyclic Shifting Window Attention
⭐code
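Transformer tracking with a cyclic shifting window attention model. The algorithm sets a new SOTA on all five datasets: VOT2020, UAV123, LaSOT, TrackingNet, and GOT-10k. - Cannot See the Forest for the Trees: Aggregating Multiple Viewpoints to Better Classify Objects in Videos
⭐code - 3D object tracking
- Multi-object tracking
- RGB-T tracking
- Visual tracking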
- DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
⭐code📰summary - Overcoming Catastrophic Forgetting in Incremental Object Detection via Elastic Response Distillation
⭐code - Beyond Bounding Box: Multimodal Knowledge Learning for Object Detection
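Object detectors have typically been trained with bounding boxes as annotations; the authors introduce language prompts and distill linguistic knowledge into the detection model, gaining 1.6-2.1% in performance. - Dynamic Sparse R-CNN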
- Unknown-Aware Object Detection: Learning What You Don't Know from Videos in the Wild
⭐code📰summary - Focal and Global Knowledge Distillation for Detectors
⭐code📰explainer
A knowledge distillation method for object detection: about 30 lines of code give stable gains across anchor-based and anchor-free, one-stage and two-stage detectors; the code is now open-sourced. - Group R-CNN for Weakly Semi-supervised Object Detection with Points
⭐code
📰explainer - Real-time Object Detection for Streaming Perception
⭐code📰explainer - Ev-TTA: Test-Time Adaptation for Event-Based Object Recognition
- Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model
⭐code - Optimal Correction Cost for Object Detection Evaluation
- Expanding Low-Density Latent Regions for Open-Set Object Detection
⭐code
📰explainer - SIOD: Single Instance Annotated Per Category Per Image for Object Detection
📰explainer - Task-specific Inconsistency Alignment for Domain Adaptive Object Detection
⭐code - Zero-Query Transfer Attacks on Context-Aware Object Detectors
- AdaMixer: A Fast-Converging Query-Based Object Detector
😮oral⭐code - Learning to Detect Mobile Objects from LiDAR Scans Without Labels
⭐code - Forecasting from LiDAR via Future Object Detection
⭐code - Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection
😮oral - Multi-Granularity Alignment Domain Adaptation for Object Detection
- Proper Reuse of Image Classification Features Improves Object Detection
⭐code - R(Det)^2: Randomized Decision Routing for Object Detection
- Towards Robust Adaptive Object Detection under Noisy Annotations
⭐code - Entropy-based Active Learning for Object Detection with Progressive Diversity Constraint
- Target-Relevant Knowledge Preservation for Multi-Source Domain Adaptive Object Detection
- Interactive Segmentation and Visualization for Tiny Objects in Multi-megapixel Images
⭐code - Cross Domain Object Detection by Target-Perceived Dual Branch Distillation
⭐code
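Cross-domain object detection: target-perceived dual-branch distillation - Progressive End-to-End Object Detection in Crowded Scenes
⭐code
📰explainer - HCSC: Hierarchical Contrastive Selective Coding
⭐code
📰New SOTA for self-supervised CNN pre-training: SJTU, Mila, and ByteDance jointly propose a new framework for self-supervised image representation learning with hierarchical structure - Recurrent Glimpse-based Decoder for Detection with Transformer
😮oral⭐code
📰explainer - Few-shot object detection
- Object localization
- 3D object detection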
- A Versatile Multi-View Framework for LiDAR-based 3D Object Detection with Guidance from Panoptic Segmentation
- Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving
⭐code📰summary - Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task
🏠project - Point2Seq: Detecting 3D Objects as Sequences
⭐code - MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection
⭐code - LiDAR Snowfall Simulation for Robust 3D Object Detection
😮oral⭐code - CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection
- Homography Loss for Monocular 3D Object Detection
- HyperDet3D: Learning a Scene-conditioned 3D Object Detector
- DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection
⭐code - OccAM's Laser: Occlusion-based Attribution Maps for 3D Object Detectors on LiDAR Data
⭐code - Focal Sparse Convolutional Networks for 3D Object Detection
😮oral⭐code📰explainer📓 - Rotationally Equivariant 3D Object Detection
🏠project - Bridged Transformer for Vision and Point Cloud 3D Object Detection
📰explainer - Sparse Fuse Dense: Towards High Quality 3D Detection with Depth Completion
📰explainer - VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention
⭐code
📰South China University of Technology proposes VISTA: a plug-and-play dual cross-view spatial attention mechanism that achieves SOTA 3D object detection - Diversity Matters: Fully Exploiting Depth Clues for Reliable Monocular 3D Object Detection
😮oral
- Camouflaged object detection
- Fully supervised object detection
- Semi-supervised object detection
- Salient object detection
- Pyramid Grafting Network for One-Stage High Resolution Saliency Detection
⭐code📰explainer
📰Ultra-high-resolution salient object detection: PGNet, a novel and efficient staggered grafting architecture (CVPR 2022) - Learning from Pixel-Level Noisy Label: A New Perspective for Light Field Saliency Detection
⭐code📰explainer - Bi-directional Object-context Prioritization Learning for Saliency Ranking
⭐code
- Keypoint detection
- Affordance grounding
- Image alignment
- Object attribute recognition
- X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
- Quantifying Societal Bias Amplification in Image Captioning
- NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models
- It is Okay to Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection
⭐code🏠project - Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning
- Novel Object Captioning
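- Image restoration
- Image inpainting
- Image stitching
- Motion deblurring
- image outpainting
- Image aesthetics assessment
- Image quality assessment
- Image deraining
- Image deblurring
- Image denoising
- De-rendering
- Image enhancement
- Image harmonization
- Image outpainting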
- Scene Graph Expansion for Semantics-Guided Image Outpainting
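This paper tackles a very interesting problem: by expanding the image's scene graph, it generates semantically guided content beyond the image borders, helping designers quickly produce natural, harmonious image extensions.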
- Semantic image matching
- FocalClick: Towards Practical Interactive Image Segmentation
⭐code📰summary - Semantic-Aware Domain Generalized Segmentation
😮oral⭐code - ReSTR: Convolution-free Referring Image Segmentation Using Transformers
- Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation
🏠project
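Panoptic Neural Fields: a semantic, object-aware neural scene representation newly proposed by Google. The representation can be used effectively for tasks such as novel view synthesis, 2D panoptic segmentation, 3D scene editing, and multi-view depth prediction. It looks set to become another trend-setting direction. - Instance segmentation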
- E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation
⭐code📰summary - Sparse Instance Activation for Real-Time Instance Segmentation
⭐code - SharpContour: A Contour-based Boundary Refinement Approach for Efficient and Accurate Instance Segmentation
🏠project - Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity
⭐code🏠project - DArch: Dental Arch Prior-assisted 3D Tooth Instance Segmentation
- Relieving Long-tailed Instance Segmentation via Pairwise Class Balance
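⭐code📰explainer - ContrastMask: Contrastive Learning to Segment Every Thing
📰explainer
A partially supervised instance segmentation algorithm based on pixel-level contrastive learning - Semi-supervised instance segmentation
- 3D instance segmentation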
- 🐦️FreeSOLO: Learning to Segment Objects without Annotations
- Semantic segmentation
- Pin the Memory: Learning to Generalize Semantic Segmentation
⭐code📰explainer - Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation
⭐code📰explainer - GroupViT: Semantic Segmentation Emerges from Text Supervision
⭐code🏠project📺video
📰Semantic segmentation without any pixel labels: UCSD and NVIDIA add a grouping module to ViT - Bending Reality: Distortion-aware Transformers for Adapting to Panoramic Semantic Segmentation
⭐code📰summary - Deep Hierarchical Semantic Segmentation
⭐code - Semantic Segmentation by Early Region Proxy
⭐code📰summary - SimT: Handling Open-set Noise for Domain Adaptive Semantic Segmentation
⭐code - Rethinking Semantic Segmentation: A Prototype View
😮oral⭐code - On the Road to Online Adaptation for Semantic Image Segmentation
- Threshold Matters in WSSS: Manipulating the Activation for the Robust and Accurate Segmentation Model Against Thresholds
- NightLab: A Dual-level Architecture with Hardness Detection for Segmentation at Night
⭐code📰explainer - TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
- Cross-Image Relational Knowledge Distillation for Semantic Segmentation
⭐code📰explainer - Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation
- Unsupervised Hierarchical Semantic Segmentation with Multiview Cosegmentation and Clustering Transformers
⭐code - Self-Supervised Learning of Object Parts for Semantic Segmentation
- Cross-view Transformers for real-time Map-view Semantic Segmentation
😮oral⭐code - Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization
🏠project - Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation
⭐code📰explainer - 3D segmentation
- Weakly supervised semantic segmentation
- Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
⭐code📰summary - Self-supervised Image-specific Prototype Exploration for Weakly Supervised Semantic Segmentation
⭐code - Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation
⭐code - Cross Language Image Matching for Weakly Supervised Semantic Segmentation
⭐code - Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
⭐code - Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
⭐code📰explainer - Weakly Supervised Semantic Segmentation using Out-of-Distribution Data
⭐code📰summary - L2G: A Simple Local-to-Global Knowledge Transfer Framework for Weakly Supervised Semantic Segmentation
⭐code
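- Semi-supervised semantic segmentation
- Domain-adaptive semantic segmentation
- Domain-generalized semantic segmentation
- Few-shot semantic segmentation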
- Action segmentation
- Scene parsing
- Foggy scene segmentation
- Panoptic segmentation
- Image matting
- Learning to Anticipate Future with Dynamic Context Removal
⭐code📰summary - Learning Optimal K-space Acquisition and Reconstruction using Physics-Informed Neural Networks
- Instance-wise Occlusion and Depth Orders in Natural Scenes
- IFOR: Iterative Flow Minimization for Robotic Object Rearrangement
🏠project - PINA: Learning a Personalized Implicit Neural Avatar from a Single RGB-D Video Sequence
⭐code🏠project📺video📰summary - CAFE: Learning to Condense Dataset by Aligning Features
⭐code📰summary - Enhancing Adversarial Robustness for Deep Metric Learning
- BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning
⭐code📰summary📓 - ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching
⭐code📰summary - Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values
⭐code - Do Explanations Explain? Model Knows Best
⭐code - HDNet: High-resolution Dual-domain Learning for Spectral Compressive Imaging
- E-CIR: Event-Enhanced Continuous Intensity Recovery
⭐code - 🐦️Transferability Estimation using Bhattacharyya Class Separability
- Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks
⭐code - GlideNet: Global, Local and Intrinsic based Dense Embedding NETwork for Multi-category Attributes Prediction
⭐code - Differentially Private Federated Learning with Local Regularization and Sparsification
- Towards Efficient and Scalable Sharpness-Aware Minimization
- DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos
- Probabilistic Warp Consistency for Weakly-Supervised Semantic Correspondences
⭐code📰brief explainer - Dynamic Dual-Output Diffusion Models
- Moving Window Regression: A Novel Approach to Ordinal Regression
- Egocentric Prediction of Action Target in 3D
- Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning
⭐code - Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction
⭐code - Neural Reflectance for Shape Recovery with Shadow Handling
⭐code - DyRep: Bootstrapping Training with Dynamic Re-parameterization
⭐code - Enhancing Classifier Conservativeness and Robustness by Polynomiality
- Versatile Multi-Modal Pre-Training for Human-Centric Perception
⭐code - Attributable Visual Similarity Learning
⭐code - Optimizing Elimination Templates by Greedy Parameter Search
- Partially Does It: Towards Scene-Level FG-SBIR with Partial Input
- Bi-level Doubly Variational Learning for Energy-based Latent Variable Models
- Brain-inspired Multilayer Perceptron with Spiking Neurons
- ARCS: Accurate Rotation and Correspondence Search
⭐code - iPLAN: Interactive and Procedural Layout Planning
- HINT: Hierarchical Neuron Concept Explainer
⭐code - Visual Abductive Reasoning
⭐code - A Stitch in Time Saves Nine: A Train-Time Regularizing Loss for Improved Neural Network Calibration
⭐code - Learning Structured Gaussians to Approximate Deep Ensembles
- Self-Supervised Image Representation Learning with Geometric Set Consistency
- Balanced Multimodal Learning via On-the-fly Gradient Modulation
😮oral⭐code - CNN Filter DB: An Empirical Investigation of Trained Convolutional Filters
⭐code - Eigencontours: Novel Contour Descriptors Based on Low-Rank Approximation
😮oral - Pop-Out Motion: 3D-Aware Image Deformation via Learning the Shape Laplacian
- Long-term Visual Map Sparsification with Heterogeneous GNN
- Clean Implicit 3D Structure from Noisy 2D STEM Images
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets
- CaDeX: Learning Canonical Deformation Coordinate Space for Dynamic Surface Representation via Neural Homeomorphism
⭐code🏠project - Fast Light-Weight Near-Field Photometric Stereo
- Fast, Accurate and Memory-Efficient Partial Permutation Synchronization
- Multi-Robot Active Mapping via Neural Bipartite Graph Matching
- Learning Program Representations for Food Images and Cooking Recipes
😮oral - Iterative Deep Homography Estimation
⭐code - Practical Learned Lossless JPEG Recompression with Multi-Level Cross-Channel Entropy Model in the DCT Domain
- Generating High Fidelity Data from Low-density Regions using Diffusion Models
- Continuous Scene Representations for Embodied AI
⭐code🏠project - It's All In the Teacher: Zero-Shot Quantization Brought Closer to the Teacher
- End-to-End Trajectory Distribution Prediction Based on Occupancy Grid Maps
- Reflection and Rotation Symmetry Detection via Equivariant Learning
- Exploiting Explainable Metrics for Augmented SGD
- On the Importance of Asymmetry for Siamese Representation Learning
⭐code - Unimodal-Concentrated Loss: Fully Adaptive Label Distribution Learning for Ordinal Regression
- Perception Prioritized Training of Diffusion Models
⭐code - LASER: LAtent SpacE Rendering for 2D Visual Localization
😮oral - Efficient Maximal Coding Rate Reduction by Variational Forms
- Exemplar-based Pattern Synthesis with Implicit Periodic Field Network
- Progressive Minimal Path Method with Embedded CNN
- Online Convolutional Re-parameterization
⭐code - Consistency driven Sequential Transformers Attention Model for Partially Observable Scenes
- Leveraging Equivariant Features for Absolute Pose Regression
- Neural Convolutional Surfaces
🏠project - GLASS: Geometric Latent Augmentation for Shape Spaces
⭐code🏠project - Total Variation Optimization Layers for Computer Vision
- Identifying Ambiguous Similarity Conditions via Semantic Matching
⭐code📰explainer - TemporalUV: Capturing Loose Clothing with Temporally Coherent UV Coordinates
- Gravitationally Lensed Black Hole Emission Tomography
⭐code🏠project📺video - Robust and Accurate Superquadric Recovery: a Probabilistic Approach
⭐code - Projective Manifold Gradient Layer for Deep Rotation Regression
⭐code - Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale
⭐code - Single-Photon Structured Light
- Explaining Deep Convolutional Neural Networks via Latent Visual-Semantic Filter Attention
😮oral⭐code - Defensive Patches for Robust Recognition in the Physical World
⭐code📰explainer - Event-aided Direct Sparse Odometry
😮oral⭐code🏠project📺video - Deep Unlearning via Randomized Conditionally Independent Hessians
⭐code - Learning to Imagine: Diversify Memory for Incremental Learning using Unlabeled Data
- Towards Data-Free Model Stealing in a Hard Label Setting
⭐code🏠project - Proto2Proto: Can you recognize the car, the way I do?
⭐code - Balanced MSE for Imbalanced Visual Regression
😮oral⭐code
📰CVPR 2022 (Oral) | Imbalanced regression labels? Try Balanced MSE - Leveraging Unlabeled Data for Sketch-based Understanding
- Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction
⭐code🏠project - Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
⭐code📰explainer - RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality
⭐code📰explainer - An Image Patch is a Wave: Quantum Inspired Vision MLP
😮oral⭐code - A ConvNet for the 2020s
⭐code - NeuralHDHair: Automatic High-fidelity Hair Modeling from a Single Image Using Implicit Neural Representations
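Hair modeling: builds a high-fidelity hair model from a single image using implicit neural representations. From the Zhejiang University CAD&CG group, ETH Zurich, and City University of Hong Kong. - A Unified Framework for Implicit Sinkhorn Differentiation
⭐code
📰explainer - Towards Better Understanding Attribution Methods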
⭐code - Universal Photometric Stereo Network using Global Lighting Contexts
⭐code🏠project📺video📰explainer
- Camera relocalization
- ❌[SceneSqueezer: Learning to Compress Scene for Camera Relocalization]
😮oral
- Camera imaging
- ❌[Learning to Zoom Inside Camera Imaging Pipeline]
- Homography Estimation
- 3D human reconstruction
- Image captioning
- ❌[Comprehending and Ordering Semantics for Image Captioning]
📰explainer
- Image dehazing
- ❌[Self-augmented Unpaired Image Dehazing via Density and Depth Decomposition]
📰explainer
- Image-to-image translation
- ❌[Alleviating Semantics Distortion in Unsupervised Low-Level Image-to-Image Translation via Structure Consistency Constraint]
📰explainer
- Optical flow
- Image generation
- ❌[Modeling Image Composition for Complex Scene Generation]
📰explainer
- Continual learning
- ❌[Continual Learning with Lifelong Vision Transformer]
📰explainer
- Meta-learning
- ❌[Learning to Learn and Remember Super Long Multi-Domain Task Sequence]
📰explainer
- Object detection
- HOI
- ❌[Exploring Structure-aware Transformer over Interaction Proposals for Human-Object Interaction Detection]
📰explainer
- Video modeling
- ❌[Stand-Alone Inter-Frame Attention in Video Models]
📰explainer
- Other
- Video scene segmentation
- ❌[Scene Consistency Representation Learning for Video Scene Segmentation]
📰explainer
- Image captioning
- ❌[DIFNet: Boosting Visual Information Flow for Image Captioning]
📰explainer
- Pose
- ❌[Location-Free Human Pose Estimation]
📰explainer
- Few-shot
- Point cloud
- Face
- Object detection
- ❌[Thinking Camouflaged Object Detection in Frequency]
📰explainer
- Adversarial
- ❌[Efficient Data-free Model Stealing for Black-box Adversarial Attacks]
📰explainer
- Segmentation
- 3D scenes
- ❌[Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes]
📰brief explainer
- Pedestrian trajectory prediction
- ❌[Human Trajectory Prediction with Momentary Observation]
📰brief explainer
AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval
Diverse Plausible 360-Degree Image Outpainting for Efficient 3DCG Background Creation
Sources
[Two Systems in Thinking: Dual-System Transformer for Grounded Situation Recognition]
[Autoregressive Image Generation using Residual Quantization]
✔️Instance-wise Occlusion and Depth Orders in Natural Scenes
[Style Neophile: Constantly Seeking Novel Styles for Domain Generalization]
[ReSTR: Convolution-free Referring Image Segmentation Using Transformers]
[FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation]
[TransforMatcher: Match-to-Match Attention for Semantic Correspondence]
[Reflection and Rotation Symmetry Detection via Equivariant Learning]
[Semi-supervised Semantic Segmentation with Error Localization Network]
[Future Transformer for Long-term Action Anticipation]
[Self-Taught Metric Learning without Labels]
✔️Fast Point Transformer
[Integrative Few-Shot Learning for Classification and Segmentation]
[Scene Painting via Semantic Image Synthesis]
[Detector-Free Weakly Supervised Group Activity Recognition]