INDEX
Explanations
terms related to politics and policy, with a focus on social justice, mental health, and government actions
Negative Logits
ãĤ¨ãĥ«    -0.96
éŃĶ    -0.83
SN    -0.81
ãĥĥ    -0.81
stown    -0.79
ãĥĥãĥī    -0.77
GMT    -0.75
ãĥ¥    -0.74
odox    -0.74
gro    -0.74
Positive Logits
superpower    0.83
eers    0.76
equivalents    0.72
nonprofits    0.69
pamph    0.68
dystop    0.68
issues    0.68
optimization    0.68
guru    0.67
satire    0.67
Activations Density 2.078%
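
Several of the negative-logit tokens (for example ãĤ¨ãĥ« and éŃĶ) look like raw byte-level BPE token strings rather than readable text. The sketch below shows one way to turn them back into characters. It assumes the tokens come from a GPT-2-style byte-to-unicode mapping, which the page itself does not state. The names bytes_to_unicode, UNICODE_TO_BYTE, and decode_token are illustrative helpers, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2-style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Map each displayed character back to its byte, then decode as UTF-8."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Partial multi-byte sequences are possible in BPE tokens, so do not raise.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤ¨ãĥ«", "éŃĶ", "ãĥĥ", "ãĥĥãĥī", "ãĥ¥", "GMT"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, the non-ASCII negative-logit tokens decode to Japanese katakana fragments and a kanji, while ASCII tokens such as GMT pass through unchanged.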