INDEX
Explanations
expressions of acknowledgment or confession
Negative Logits
vite       -0.16
ibar       -0.15
еж         -0.15
Ñĩи        -0.15
omm        -0.14
issing     -0.14
/player    -0.14
347        -0.14
uos        -0.14
mares      -0.14
Positive Logits
defeat            0.23
to                0.18
having            0.18
truth             0.17
receipt           0.16
comp              0.15
responsibility    0.15
parts             0.15
mas               0.15
/conf             0.15
Activations Density: 0.025%