Negative Logits
Token             Logit
Oleg              -0.68
ql                -0.63
숙                -0.63
pebbles           -0.63
Andrei            -0.61
труд              -0.60
Adrien            -0.60
sant              -0.59
(no token shown)  -0.59
člove             -0.59
Positive Logits
Token             Logit
Jus               1.08
Jus               1.07
Justification     0.97
Justification     0.96
]));              0.90
gius              0.89
jus               0.88
tifications       0.85
justifying        0.84
justify           0.82
Activations Density 0.006%