INDEX
Explanations
phrases related to politics and citizenship
Negative Logits
pher      -0.71
aceae     -0.67
mone      -0.66
affles    -0.66
ovan      -0.64
amus      -0.64
abbit     -0.64
Disciple  -0.64
idated    -0.64
apter     -0.63
Positive Logits
erton       1.16
screen      1.02
blown       0.88
fled        0.87
throttle    0.82
heartedly   0.82
frontal     0.81
fledged     0.81
blown       0.81
complement  0.80
Activations Density 0.031%