Unlocking Slot Attention by Changing Optimal Transport Costs

conference paper
Slot attention is a powerful method for object-centric modeling in images and videos. However, its set-equivariance limits its ability to handle videos with a dynamic number of objects, because it cannot break ties between slots. To overcome this limitation, we first establish a connection between slot attention and optimal transport. Based on this new perspective, we propose MESH (Minimize Entropy of Sinkhorn): a cross-attention module that combines the tiebreaking properties of unregularized optimal transport with the speed of regularized optimal transport. We evaluate slot attention using MESH on multiple object-centric learning benchmarks and find significant improvements over slot attention in every setting.
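The contrast the abstract draws between regularized and unregularized optimal transport can be illustrated with a small pure-Python sketch. It assumes uniform marginals, an n x k cost matrix between inputs and slots, and a temperature `eps`; all function names (`slot_attention_weights`, `sinkhorn`, `exact_ot_square`, `entropy`) are hypothetical and not taken from the paper. This is not MESH itself. It only shows one way to read the connection: slot attention normalizes once over slots and once over inputs, Sinkhorn repeats that alternation, and with two identical slots both return identical weights, whereas exact (unregularized) OT returns a permutation that breaks the tie.

```python
import math
from itertools import permutations


def slot_attention_weights(logits):
    """Slot-attention-style normalization: softmax over slots, then per-slot renormalization."""
    n, k = len(logits), len(logits[0])
    attn = []
    for row in logits:
        m = max(row)  # subtract max for numerical stability
        e = [math.exp(x - m) for x in row]
        s = sum(e)
        attn.append([x / s for x in e])  # each input's weights sum to 1 over slots
    col = [sum(attn[i][j] for i in range(n)) for j in range(k)]
    # each slot's weights sum to 1 over inputs (the weighted-mean step)
    return [[attn[i][j] / col[j] for j in range(k)] for i in range(n)]


def sinkhorn(cost, eps=0.1, iters=100):
    """Entropy-regularized OT with uniform marginals via Sinkhorn scaling (assumed setup)."""
    n, k = len(cost), len(cost[0])
    K = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * k
    for _ in range(iters):
        # alternate row and column normalization, repeated many times
        u = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(k)) for i in range(n)]
        v = [(1.0 / k) / sum(K[i][j] * u[i] for i in range(n)) for j in range(k)]
    return [[u[i] * K[i][j] * v[j] for j in range(k)] for i in range(n)]


def exact_ot_square(cost):
    """Unregularized OT for n == k with uniform marginals: the optimum is a scaled permutation."""
    n = len(cost)
    # brute force, only feasible for tiny n; min() keeps the first optimal
    # permutation it sees, so ties are broken deterministically
    best = min(permutations(range(n)), key=lambda p: sum(cost[i][p[i]] for i in range(n)))
    return [[1.0 / n if best[i] == j else 0.0 for j in range(n)] for i in range(n)]


def entropy(plan):
    """Shannon entropy of a transport plan (zero entries contribute nothing)."""
    return -sum(p * math.log(p) for row in plan for p in row if p > 0)


# Two inputs, two identical slots: the cost matrix has equal columns (a tie).
cost = [[0.2, 0.2], [0.7, 0.7]]
soft = sinkhorn(cost)
hard = exact_ot_square(cost)
print(soft)  # both slots receive the same weights: the tie is not broken
print(hard)  # a scaled permutation: each slot takes exactly one input
print(entropy(soft), entropy(hard))  # about log 4 versus log 2
```

The brute-force search stands in for an exact OT solver purely for illustration; its cost grows factorially with n, which is why fast regularized solvers such as Sinkhorn are attractive. MESH, as the abstract describes it, targets the low-entropy, tie-breaking behavior of the exact solution while keeping the regularized solver's speed; how it does so is not reproduced here.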
TNO Identifier
997149
Source title
NeurIPS workshop on Neuro Causal and Symbolic AI (nCSI) and NeurIPS workshop on Attention, 2023