INDEX
Explanations
discussions surrounding social and political beliefs
Negative Logits
Token         Logit
indr          -0.16
æľĭ           -0.15
favored       -0.15
дол           -0.15
anymore       -0.15
barely        -0.15
favors        -0.14
theater       -0.14
referencing   -0.14
Regardless    -0.14
Positive Logits
Token          Logit
nexus          0.16
æ              0.15
èIJ            0.15
ague           0.14
Tribal         0.13
semi           0.13
passionately   0.13
egot           0.13
-Clause        0.13
suburban       0.13
Activations Density 0.014%