Explanations
proper nouns, specifically likely names or titles ending in 'her'
references to a specific female individual or pronouns associated with her
Negative Logits
reb            -0.67
Glou           -0.65
tie            -0.62
trim           -0.61
sur            -0.61
Rockefeller    -0.60
rolling        -0.60
ISO            -0.59
hypothetical   -0.56
RAW            -0.56
Positive Logits
itage          1.44
ald            1.05
theless        1.03
itance         0.99
ding           0.94
metic          0.93
rors           0.88
loo            0.86
rha            0.86
metics         0.85
Activations Density 0.010%