Negative Logits
WR           -0.08
caution      -0.08
paw          -0.07
ौर           -0.07
THEM         -0.07
스타일       -0.07
rö           -0.07
.WR          -0.07
Piet         -0.07
slight       -0.07
Positive Logits
eliminates   0.12
khỏi         0.11
elimina      0.11
избав        0.11
eliminar     0.11
eliminating  0.11
erad         0.11
eliminate    0.11
elimin       0.11
দূ           0.10
Activations Density: 0.165%