INDEX
Explanations
references to support or advocacy for specific political positions or groups
Negative Logits
lah      -0.16
lm       -0.15
425      -0.15
пÑĢим    -0.14
(éĩij    -0.14
scape    -0.14
105      -0.14
flux     -0.14
proof    -0.13
'&#      -0.13
Positive Logits
ouble         0.15
hlen          0.15
ovny          0.15
onec          0.14
uture         0.13
oun           0.13
632           0.13
Humanities    0.13
Polo          0.13
InChildren    0.13
Activations Density 0.014%