Explanations
references to violence and conflict
Negative Logits
è´¨        -0.16
ils        -0.15
éĹ         -0.15
ográf      -0.15
modo       -0.14
onn        -0.14
uge        -0.14
ODE        -0.14
church     -0.14
ÑĪев       -0.14
Positive Logits
by         0.29
bợi        0.20
oleh       0.20
_by        0.17
ANI        0.15
by         0.15
/lic       0.14
تÙĪØ³Ø·     0.14
pelos      0.14
nak        0.14
Activations Density 0.177%