INDEX
Explanations
statements about controversial issues or beliefs and statistics
Negative Logits
enegger       -0.82
Samar         -0.80
nesday        -0.70
conclud       -0.69
berman        -0.69
Sapphire      -0.68
Downs         -0.67
drown         -0.66
drowning      -0.65
scattering    -0.65
Positive Logits
¹    1.19
º    1.14
Į    1.11
»    1.11
®    1.10
į    1.09
£    1.07
¬    1.07
¡    1.06
Ī    1.04
Activations Density 0.104%