Explanations
personal pronouns referring to oneself
references to personal experience and identity
Negative Logits
Token       Logit
ibaba       -0.96
pload       -0.72
ornia       -0.68
etheus      -0.67
ulton       -0.67
atlantic    -0.65
yson        -0.64
ģĸ          -0.63
otine       -0.63
odge        -0.62
Positive Logits
Token       Logit
selves      0.87
dearly      0.85
personally  0.79
atically    0.78
terday      0.78
atic        0.76
uncond      0.76
self        0.75
Redditor    0.72
atics       0.70
Activations Density: 0.162%