Explanations
instances of the word "I," indicating a focus on personal perspectives or self-references
Negative Logits
Token    Logit
Sim      -0.18
Sim      -0.16
ansi     -0.16
ota      -0.15
forms    -0.15
858      -0.15
uy       -0.15
Men      -0.15
aken     -0.14
San      -0.14
Positive Logits
Token        Logit
gie          0.17
lic          0.17
bler         0.17
ilon         0.16
aul          0.15
_CN          0.15
opic         0.15
#            0.15
ãĥĭãĥĥãĤ¯    0.15
mai          0.14
Activations Density: 0.018%