INDEX
Explanations
references to individuals, groups, or entities involved in social dynamics and interactions
Negative Logits
Ñĥнк      -0.16
Killing   -0.15
annis     -0.14
ãĥ¼ãĥª    -0.14
818       -0.14
lient     -0.14
rio       -0.14
QE        -0.14
ould      -0.14
ÙĦع       -0.14
Positive Logits
being     0.35
being     0.28
Being     0.23
Being     0.22
having    0.21
becoming  0.21
sendo     0.20
被        0.18
essere    0.17
’s        0.16
Activations Density 0.333%