INDEX
Explanations
proper nouns, specifically names of individuals
Negative Logits
tml         -0.70
ribes       -0.64
mble        -0.63
aughed      -0.61
aughtered   -0.61
ometimes    -0.60
semble      -0.58
[|          -0.57
sed         -0.55
mits        -0.55
Positive Logits
's            1.19
joining       1.10
being         1.09
behaving      1.08
becoming      1.06
quitting      1.02
stealing      1.02
disappearing  1.00
marrying      1.00
raping        1.00
Activations Density: 0.344%