Ex. molecular graph of citronella:
Inaccurate but illustrative solution (without message passing):
In general, an MPNN replaces $D$ and $C$ (and, in recent work, often $A$ as well) with learned operations parameterized by neural networks.
The objective is to let information flow between nodes.
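To make this concrete, here is a minimal sketch of one message-passing step in numpy. The graph, the feature dimensions, and the small MLP are all hypothetical examples; the point is only that the fixed aggregation (e.g. a degree-based normalization) is swapped for a learned update:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, b1, W2, b2):
    # two-layer perceptron standing in for the learned operation
    h = np.maximum(x @ W1 + b1, 0.0)
    return h @ W2 + b2

# toy graph: 4 nodes, feature dim 3 (hypothetical example)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)   # adjacency matrix
H = rng.normal(size=(4, 3))                 # node features

W1, b1 = rng.normal(size=(3, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

# one message-passing step: sum neighbor features along A,
# then apply a learned update instead of a fixed normalization
messages = A @ H
H_new = mlp(messages, W1, b1, W2, b2)
print(H_new.shape)   # (4, 3)
```

Stacking several such steps lets information propagate beyond immediate neighbors, one hop per layer.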
Explanation
alpha amino acids
beta amino acids
One way to do sequence-to-sequence prediction is to do it autoregressively:

$p(\textbf{s}|\textbf{x}) = \prod_i\; p(s_i|\textbf{x}, \textbf{s}_{<i})$

In the paper, they chose the decoding order randomly, by permuting the elements of a diagonal matrix:
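A random decoding order can be sketched as follows. The function names (`toy_logits`, `decode_random_order`) and the greedy decoding are hypothetical stand-ins, not the paper's implementation; the permutation over positions is the key idea:

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_logits(x, s_partial, position):
    # stand-in for the network: scores over a 20-letter amino-acid
    # alphabet, conditioned on structure x and already-decoded residues
    return rng.normal(size=20)

def decode_random_order(x, length):
    # randomize the autoregressive order by permuting the positions,
    # then decode one position at a time in that order
    order = rng.permutation(length)
    s = [None] * length
    for pos in order:
        logits = toy_logits(x, s, pos)
        s[pos] = int(np.argmax(logits))   # greedy choice at this position
    return s

seq = decode_random_order(x=None, length=10)
print(len(seq))   # 10
```

Because the factorization $\prod_i p(s_i|\textbf{x}, \textbf{s}_{<i})$ holds for any ordering of the indices, training and decoding under random permutations is still a valid autoregressive model.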
In the paper, the gain in test-set performance is small (47.3% to 47.9% sequence recovery), so it is hard to know how well this worked.
Also, there was no hyperparameter tuning and no quantification of performance variation!