Explanations
names of people
proper nouns, particularly names of people and family connections
New Auto-Interp
Negative Logits
tracking    -0.84
CONCLUS     -0.81
scanners    -0.72
pmwiki      -0.69
Platform    -0.66
incent      -0.65
LEVEL       -0.65
weapon      -0.64
subreddit   -0.62
dystop      -0.60
Positive Logits
mie         1.06
Jr          1.00
Sr          0.98
Rodham      0.95
ilde        0.91
nie         0.90
hyde        0.90
Doe         0.89
lynn        0.89
abeth       0.88
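The two logit lists rank vocabulary tokens by how strongly this feature pushes their next-token logits down or up. The page does not say how these weights were computed. A common approach in sparse-autoencoder dashboards projects the feature's decoder direction through the model's unembedding matrix. The sketch below assumes that direct projection and ignores the final LayerNorm as a simplification. The helper names (`logit_weights`, `top_and_bottom`) and the toy vectors are hypothetical, not data from this feature.

```python
from typing import Dict, List, Tuple

def logit_weights(decoder_vec: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical direct logit effect: dot product of the feature's decoder
    direction with each token's unembedding vector (final LayerNorm ignored)."""
    return {tok: sum(d * u for d, u in zip(decoder_vec, col))
            for tok, col in unembed.items()}

def top_and_bottom(weights: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, weight) pairs."""
    negative = sorted(weights.items(), key=lambda kv: kv[1])[:k]
    positive = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return negative, positive

# Toy 2-dimensional example; real models use the full residual width and vocabulary.
toy_unembed = {
    "Jr": [1.0, 0.0],
    "Sr": [0.9, 0.1],
    "tracking": [-0.8, 0.0],
    "weapon": [-0.5, 0.2],
}
decoder = [1.0, 0.5]  # hypothetical decoder direction for one feature

weights = logit_weights(decoder, toy_unembed)
neg, pos = top_and_bottom(weights, 2)
print("negative:", [tok for tok, _ in neg])
print("positive:", [tok for tok, _ in pos])
```

Sorting the full weight dictionary is fine for a toy vocabulary. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest` and `heapq.nlargest` avoids a full sort.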
Activation density: 0.170%
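Activation density is the fraction of tokens on which the feature is active, so 0.170% means roughly 17 tokens in every 10,000. Here is a minimal sketch. It assumes "active" means an activation strictly above zero, since the dashboard does not state its threshold. The `activation_density` helper is hypothetical, and the activations are synthetic.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`
    (threshold of 0.0 is an assumption, not taken from the dashboard)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Synthetic data: 10,000 tokens, with the feature firing on 17 of them.
acts = [0.0] * 10_000
for i in range(17):
    acts[i * 500] = 1.3  # arbitrary positive activation value

density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.170%"
```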