My personal favorite from ICLR 2020. The paper characterizes the conditions under which GNNs can compute any function, and shows that the product depth × width of a GNN should be on the order of ~n (the number of nodes) in order to compute popular graph statistics (e.g. diameter, vertex cover, coloring).
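A hypothetical illustration of that tradeoff: if depth × width must scale like n, then fixing the depth pins down the minimum width. The numbers below are made up and constants/log factors are ignored; this is only a sketch of the scaling, not the paper's exact bound.

```python
# Toy illustration (not the paper's precise bound): assume a GNN needs
# depth * width on the order of n to compute a given graph statistic.
n = 10_000  # number of nodes in the graph (arbitrary example value)

for depth in (4, 16, 64):
    min_width = n // depth  # width required so that depth * width ~ n
    print(f"depth={depth:3d} -> width ~ {min_width}")
```

The point the paper makes is that you can trade depth for width, but their product is what matters, so very shallow *and* narrow GNNs cannot compute these statistics on large graphs.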
This paper, in a sense, shows that graph structure is not necessary for graph classification: if you represent a graph as just a set of nodes, with no information about their adjacency, and train an MLP, you can get SOTA results. An important lesson for when we judge the quality of an idea or paper based on empirical results alone.
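A minimal sketch of such a structure-free baseline, assuming a DeepSets-style setup: pool the node features (here a mean, making the model invariant to node ordering) and feed the pooled vector to an MLP. All names, dimensions, and the random weights are placeholders for illustration; the adjacency matrix is never touched.

```python
# Hypothetical sketch: classify a graph from its node features alone,
# ignoring adjacency entirely (a set-of-nodes MLP baseline).
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def set_mlp_logits(node_features, W1, b1, W2, b2):
    """Mean-pool the node set, then apply a 2-layer MLP.
    No edge/adjacency information is used anywhere."""
    pooled = node_features.mean(axis=0)  # (d,) permutation-invariant pooling
    hidden = relu(pooled @ W1 + b1)      # (h,)
    return hidden @ W2 + b2              # (num_classes,)

# Toy graph: 5 nodes with 8-dim features; weights are random placeholders
# (in practice they would be trained with the usual classification loss).
X = rng.normal(size=(5, 8))
W1 = rng.normal(size=(8, 16)); b1 = np.zeros(16)
W2 = rng.normal(size=(16, 3)); b2 = np.zeros(3)

logits = set_mlp_logits(X, W1, b1, W2, b2)
print(logits.shape)  # (3,)
```

That such a baseline can match GNNs on common benchmarks says more about the benchmarks (node features alone are often discriminative enough) than about graphs being useless, which is exactly why empirical wins alone are weak evidence for an idea.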