Explanations
the word "myself"
references to self-identity and personal experiences
Negative Logits
Token       Logit
ories       -0.79
ulton       -0.78
olid        -0.77
cemic       -0.71
heny        -0.70
grade       -0.70
orie        -0.67
ibaba       -0.65
Sierra      -0.62
*/(         -0.62
Positive Logits
Token       Logit
selves      1.02
myself      0.99
personally  0.94
self        0.92
tremend     0.90
enthusi     0.85
imei        0.80
ashamed     0.77
honoured    0.76
instinct    0.75
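Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes that approach; the function `top_bottom_logits` and its inputs `w_dec_row`, `W_U`, and `vocab` are hypothetical names, not taken from this page.

```python
import numpy as np

def top_bottom_logits(w_dec_row, W_U, vocab, k=10):
    """Hypothetical helper: direct effect of one feature on every token's logit.

    w_dec_row: (d_model,) decoder vector for a single feature (assumed input)
    W_U:       (d_model, vocab_size) unembedding matrix (assumed input)
    vocab:     list of token strings, length vocab_size
    """
    logits = w_dec_row @ W_U          # (vocab_size,) logit contribution per token
    order = np.argsort(logits)        # token indices, most negative first
    bottom = [(vocab[i], float(logits[i])) for i in order[:k]]
    top = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return bottom, top

# Toy usage: with an identity unembedding, the logits equal the decoder vector.
bottom, top = top_bottom_logits(np.array([0.5, -1.0, 2.0]), np.eye(3), ["x", "y", "z"], k=2)
print(bottom)  # [('y', -1.0), ('x', 0.5)]
print(top)     # [('z', 2.0), ('x', 0.5)]
```

On a real model the same projection would run over the full vocabulary, which is why subword fragments such as "ories" or "tremend" can appear alongside whole words.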
Activation Density 0.010%
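A density of 0.010% equals 0.0001, so the feature fires on roughly 1 in 10,000 tokens. A minimal sketch of how such a figure is typically computed, assuming density means the fraction of token positions whose activation exceeds zero; `activation_density` and its `threshold` parameter are hypothetical names.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of token positions where the feature is active."""
    acts = np.asarray(acts)
    return float((acts > threshold).mean())

# Toy usage: one active position out of 10,000 tokens.
acts = np.zeros(10_000)
acts[0] = 1.3
print(f"{activation_density(acts) * 100:.3f}%")  # 0.010%
```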