INDEX
NEGATIVE LOGITS
Token        Logit
disrupt      -0.09
timers       -0.07
cardiovas    -0.07
completion   -0.07
responses    -0.07
выгляд       -0.07
disrupted    -0.07
effects      -0.07
spacing      -0.07
servent      -0.07
POSITIVE LOGITS
Token        Logit
그           0.09
southern     0.08
لىق          0.08
沿           0.08
バッグ       0.08
rant         0.08
IFF          0.08
یی           0.08
northern     0.08
없는         0.08
ACTIVATIONS DENSITY
0.003%