Models that showcase these capabilities can be said to be "grokking," a phenomenon in which a model reaches perfect test performance after extended training. However, the researchers said there are key ...