INDEX
Explanations
terms related to social actions and political conditions
NEGATIVE LOGITS
Handling     -0.16
Selling      -0.16
нанес        -0.15
yoktur       -0.15
Handling     -0.15
桑           -0.14
(Encoding    -0.14
èm           -0.14
kunt         -0.14
Killing      -0.13
POSITIVE LOGITS
being        1.00
being        0.80
Being        0.71
Being        0.66
becoming     0.58
sendo        0.56
被           0.55
siendo       0.52
essere       0.50
-being       0.49
Activations Density 0.074%