INDEX
Explanations
phrases expressing personal reflections and experiences
Negative Logits
andex       -0.15
ayet        -0.15
isy         -0.14
ilestone    -0.14
wan         -0.14
egot        -0.14
Ĵ           -0.14
ield        -0.14
thouse      -0.14
avanaugh    -0.13
Positive Logits
myself      0.54
mine        0.51
personally  0.48
Personally  0.42
Personally  0.40
mine        0.37
ours        0.35
Mine        0.34
my          0.34
saya        0.33
Activations Density 0.371%