INDEX
NEGATIVE LOGITS
bitir     -0.06
elijke    -0.06
ики       -0.06
ічна      -0.06
ż         -0.06
basics    -0.06
키        -0.06
ayne      -0.06
ابتد      -0.06
مك        -0.06
POSITIVE LOGITS
study        0.07
Officer      0.07
forums       0.07
PLUS         0.06
_init        0.06
Hitler       0.06
Researchers  0.06
_Window      0.06
Founded      0.06
slam         0.06
Activations Density 0.032%