INDEX
Negative Logits
rouse      -0.88
arnaev     -0.68
afort      -0.68
igate      -0.67
row        -0.66
iveness    -0.65
ivating    -0.64
aver       -0.63
arching    -0.63
icipated   -0.63
POSITIVE LOGITS
terday          0.78
alas            0.78
enough          0.75
nown            0.74
misunderstand   0.72
sadly           0.72
unfortunately   0.70
,...            0.68
untrue          0.66
mistaken        0.66
Activations Density 0.031%