convoluted connor @notwa@witches.town

i believe i've implemented the optimizer described in: arxiv.org/abs/1712.03298
it seems to have comparable performance to Nesterov momentum with gradient clipping, which is my usual go-to when Adam doesn't work.
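
for reference, a minimal sketch of what "Nesterov momentum with gradient clipping" could look like in plain python. this is not the optimizer from the paper and not my actual code: the names (clip_by_norm, nesterov_step), the hyperparameters, the choice of clipping by L2 norm, and the toy quadratic are all assumptions for illustration.

```python
import math


def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm (assumed clipping rule)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grad]
    return list(grad)


def nesterov_step(params, velocity, grad_fn, lr=0.1, momentum=0.9, max_norm=1.0):
    """One Nesterov momentum update with the gradient clipped by norm.

    grad_fn is a hypothetical callable returning the gradient at a point.
    """
    # Evaluate the gradient at the look-ahead point, not at the current params.
    lookahead = [p + momentum * v for p, v in zip(params, velocity)]
    grad = clip_by_norm(grad_fn(lookahead), max_norm)
    # Standard momentum update using the clipped look-ahead gradient.
    velocity = [momentum * v - lr * g for v, g in zip(velocity, grad)]
    params = [p + v for p, v in zip(params, velocity)]
    return params, velocity


def quadratic_grad(x):
    """Gradient of the toy objective f(x) = 0.5 * sum(x_i^2)."""
    return list(x)


# Toy run: start far from the minimum at the origin so clipping is active at first.
params, velocity = [5.0, -3.0], [0.0, 0.0]
for _ in range(200):
    params, velocity = nesterov_step(params, velocity, quadratic_grad)
```

clipping caps the size of each step while the gradient is large, and once the gradient norm drops below max_norm the update is plain Nesterov momentum.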