Explanations
phrases related to political ideologies
Negative Logits
Token       Logit
Amen        -0.81
CJ          -0.65
adjoining   -0.65
itol        -0.65
Whe         -0.63
catentry    -0.62
Eleven      -0.60
Emir        -0.60
"}],"       -0.60
north       -0.59

Positive Logits
Token       Logit
've         1.05
'd          0.93
're         0.92
'll         0.92
uristic     0.81
didn        0.80
pherd       0.79
miah        0.73
literally   0.72
self        0.71
Activations Density: 0.295%