INDEX
Explanations
opinions, beliefs, and claims expressed by individuals
assertions and beliefs that may be misleading or false
Negative Logits
rontal    -0.92
ktop      -0.91
ograp     -0.77
teasp     -0.74
yna       -0.73
arthed    -0.73
exting    -0.72
Lex       -0.72
srf       -0.70
Pg        -0.70
Positive Logits
otherwise       1.12
causation       0.86
somehow         0.81
conservatives   0.78
liberals        0.78
ignorance       0.78
Muslims         0.78
Asians          0.77
vaccines        0.77
incompetence    0.77
Activations Density: 0.327%