INDEX
Explanations
names of political figures
parentheses in the text
Negative Logits
sonic          -0.75
retard         -0.74
irds           -0.74
lull           -0.74
cules          -0.73
utic           -0.73
spir           -0.71
sav            -0.68
psychosis      -0.68
interstellar   -0.68
Positive Logits
pictured        1.28
formerly        1.23
sic             1.15
among           1.14
who             1.11
via             1.09
Bloomberg       1.07
whose           1.05
both            1.03
see             1.01
Activations Density 0.076%