INDEX
Explanations
the word "Our" in various contexts, indicating a focus on possessive language related to community or belonging
Negative Logits
heads     -0.07
ially     -0.06
ams       -0.06
arken     -0.06
Lawson    -0.06
onic      -0.06
ctors     -0.06
ductive   -0.06
oc        -0.06
ardy      -0.06
Positive Logits
maz       0.09
agini     0.08
Own       0.07
vod       0.07
imary     0.07
krom      0.07
êu        0.07
tesy      0.07
RIPT      0.07
Vue       0.07
Activations Density 0.016%