Explanations
proper nouns, specifically names like "Pere" and "Maurice"
mentions of specific individuals' names
Negative Logits
imation       -0.82
ophobia       -0.74
ulously       -0.73
imates        -0.68
opsy          -0.67
ulous         -0.66
ablishment    -0.66
shame         -0.65
uments        -0.65
aughter       -0.65
Positive Logits
mallow         0.87
theless        0.87
Pere           0.87
tti            0.86
ãģ¦            0.85   (byte-level BPE rendering of the Japanese character て)
gr             0.78
gone           0.77
ignty          0.75
past           0.74
ments          0.74
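Dashboards of this kind commonly produce the logit lists by projecting the feature's decoder direction through the model's unembedding matrix and sorting the result. This page does not state how its values were computed, so the following is only a minimal sketch under that assumption. The helper name top_logit_effects, the array shapes, and the toy vocabulary and weights are hypothetical and do not reproduce the numbers above.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: rank tokens by the direct logit effect of one feature.

    decoder_dir: (d_model,) decoder vector of the feature (assumed input)
    W_U:         (d_model, vocab_size) unembedding matrix (assumed input)
    vocab:       list mapping token id -> token string
    """
    logits = decoder_dir @ W_U          # (vocab_size,) logit contribution per token
    order = np.argsort(logits)          # token ids sorted from lowest to highest
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage with made-up numbers (not the values shown on this page)
vocab = ["a", "b", "c", "d"]
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.9, 0.0],
                [1.0,  1.0, 1.0, 1.0]])
neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(neg)  # [('b', -0.2), ('d', 0.0)]
print(pos)  # [('c', 0.9), ('a', 0.5)]
```

Sorting once and reading both ends of the order gives the negative and positive lists from a single projection.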
Activation density: 0.018%
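Activation density is usually read as the fraction of tokens in a sample on which the feature fires, i.e. has an activation above zero. Assuming that definition, 0.018% means roughly 18 active tokens per 100,000. A minimal sketch; the function name activation_density and the zero threshold are assumptions, not taken from the dashboard.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

# Toy check: 18 firing tokens out of 100,000
acts = np.zeros(100_000)
acts[:18] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.018%
```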