Explanations
references to specific individuals involved in reporting or activism
Negative Logits
Token     Logit
ìķ½       -0.15
rox       -0.14
ów        -0.14
ój        -0.14
atak      -0.13
ÑĤÑĮ      -0.13
że        -0.13
è¡Ĩ       -0.13
isel      -0.13
.Names    -0.13
Positive Logits
Token     Logit
Inner     0.32
Inner     0.28
.Inner    0.25
inner     0.24
inner     0.23
-inner    0.23
Outer     0.21
INNER     0.20
.inner    0.20
(inner    0.20
Activations Density: 0.000%