Negative Logits
accounts    -0.07
CST         -0.07
Cait        -0.07
капіт       -0.07
numar       -0.06
cela        -0.06
персп       -0.06
ît          -0.06
teplot      -0.06
ač          -0.06
Positive Logits
wrong       0.17
Wrong       0.12
WRONG       0.11
wrong       0.11
Wrong       0.10
Ways        0.08
wrongdoing  0.08
wrongly     0.08
right       0.08
What        0.07
Activations Density: 0.011%