INDEX
Explanations
names or terms related to historical figures
proper nouns, especially names and cultural references
Negative Logits
fml          -0.77
holders      -0.68
userc        -0.68
drivers      -0.68
apeshifter   -0.68
ept          -0.67
creen        -0.66
cheat        -0.65
people       -0.65
yout         -0.65
Positive Logits
sson         1.50
Berger       1.26
Garc         1.17
Romero       1.16
von          1.15
Sch          1.10
Ortiz        1.10
Jacobs       1.07
Herrera      1.06
Laur         1.06
Activations Density: 0.261%