Explanations
names with specific initials followed by a single-digit activation value
proper names, specifically initials and surnames of individuals
Negative Logits
MSG          -0.81
neon         -0.69
ModLoader    -0.67
Women        -0.67
TMZ          -0.67
TVs          -0.66
charity      -0.66
knit         -0.65
governing    -0.65
prime        -0.64
Positive Logits
linger       1.10
isson        1.08
essler       1.07
antz         0.99
isner        0.98
iggins       0.98
utsch        0.97
ullivan      0.96
ould         0.96
orman        0.95
Activations Density 0.146%