Explanations
actions taken by individuals
Negative Logits
Token       Logit
CHED        -0.14
нят         -0.14
Scalars     -0.14
Permanent   -0.14
itus        -0.14
omer        -0.13
Pom         -0.13
totiž       -0.13
au          -0.13
ulin        -0.13
Positive Logits
Token       Logit
strup       0.18
pant        0.17
isque       0.15
ippers      0.14
sorte       0.14
apg         0.14
urette      0.14
uddy        0.14
yor         0.13
Impl        0.13
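The logit lists can be read as the feature's direct effect on the output vocabulary. Below is a minimal sketch of how such lists are commonly produced, assuming the effect of a feature on token t is its decoder direction dotted with column t of the model's unembedding matrix, with no layer-norm folding or other scaling. The function names `logit_effects` and `top_logits` are hypothetical, and the toy numbers are illustrative, not taken from this feature.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  w_u: Sequence[Sequence[float]]) -> List[float]:
    """Direct logit effect on every vocab token: decoder_dir @ W_U.

    decoder_dir has length d_model; w_u has d_model rows and vocab_size columns.
    (Assumption: no layer norm or scaling is applied before the unembedding.)
    """
    vocab_size = len(w_u[0])
    effects = [0.0] * vocab_size
    for k, coef in enumerate(decoder_dir):
        row = w_u[k]
        for t in range(vocab_size):
            effects[t] += coef * row[t]
    return effects


def top_logits(effects: Sequence[float], vocab: Sequence[str],
               k: int = 10) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return negative, positive


if __name__ == "__main__":
    # Toy model: d_model = 2, vocabulary of 3 tokens (hypothetical values).
    effects = logit_effects([1.0, -0.5], [[0.2, -0.1, 0.0], [0.4, 0.2, -0.6]])
    neg, pos = top_logits(effects, ["a", "b", "c"], k=1)
    print("Negative:", neg)
    print("Positive:", pos)
```

Sorting once and slicing both ends keeps the two lists consistent with each other; a real vocabulary of tens of thousands of tokens would typically use a vectorized top-k instead of a full sort.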
Activation Density: 0.254%
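An activation density of 0.254% means the feature fires on roughly 2.54 of every 1,000 tokens. A minimal sketch of the computation, assuming "density" is the percentage of tokens whose activation is strictly greater than zero (the threshold and the name `activation_density` are assumptions, not taken from the dashboard):

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as "firing" when its activation > threshold (default 0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # 254 firing tokens out of 100,000 gives the density shown above.
    acts = [0.0] * 99_746 + [1.0] * 254
    print(f"{activation_density(acts):.3f}%")
```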