Negative Logits
yip       -0.68
lid       -0.63
erver     -0.57
faint     -0.56
PLIED     -0.55
rusty     -0.55
blat      -0.54
imeters   -0.54
duration  -0.54
diffuse   -0.54
Positive Logits
neys      0.90
dan       0.79
Whedon    0.77
ernaut    0.76
eki       0.76
Marriott  0.74
iard      0.74
iffe      0.73
ilee      0.72
isco      0.72
Activations Density: 0.063%