INDEX
Explanations
phrases related to ideological viewpoints
occurrences of the word "ide" and its variations in different contexts
Negative Logits
enegger    -0.88
lished     -0.77
iona       -0.74
ishable    -0.72
ions       -0.71
lishing    -0.70
ured       -0.68
ION        -0.65
thur       -0.64
ufact      -0.63
Positive Logits
gger       0.96
ll         0.91
creen      0.85
llo        0.82
OLOG       0.79
vice       0.79
lli        0.77
llan       0.77
lla        0.77
rer        0.75
Activations Density 0.046%