Explanations
terms associated with political and societal critique
Negative Logits
Token           Logit
esar            -0.16
_tensors        -0.16
845             -0.16
uten            -0.15
773             -0.15
umont           -0.15
ESC             -0.14
民主            -0.14
progressive     -0.14
oren            -0.14

Positive Logits
Token           Logit
Jude            0.26
liberty         0.25
conservative    0.23
Conserv         0.23
conservatism    0.23
conservatives   0.21
America         0.20
Constitutional  0.20
freedom         0.20
Traditional     0.19
Activations Density 0.206%
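
The density figure can be read as the share of token positions on which the feature fires. The sketch below is only an illustration under that assumption: the function name activation_density, the threshold parameter, and the sample data are hypothetical and do not come from the dashboard, which may compute the statistic differently.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature is active (activation strictly above the threshold).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 1000 token positions, the feature fires on 2 of them.
sample = [0.0] * 998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.206% would mean the feature is active on roughly 2 of every 1000 token positions in the evaluated text.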