INDEX
Negative Logits
constitu        -0.08
erdem           -0.07
Seating         -0.07
                -0.07
winners         -0.07
السيا           -0.07
suspicious      -0.07
illumination    -0.07
제공            -0.07
winning         -0.07
Positive Logits
drank           0.10
_drag           0.09
կանխ            0.09
nonlinear       0.09
propelled       0.09
governed        0.09
.proto          0.08
Fahrt           0.08
Dragged         0.08
հեռ             0.08
Activations Density 0.012%