INDEX
Explanations
occurrences of the word "my" and its variations in the text
Negative Logits
s        -0.20
rance    -0.15
sburg    -0.15
elez     -0.15
yes      -0.15
DEX      -0.15
ptive    -0.14
my       -0.14
yard     -0.14
stin     -0.14
Positive Logits
self     0.24
cool     0.22
SELF     0.21
embro    0.21
own      0.20
Cool     0.19
rtle     0.19
Cool     0.19
anmar    0.19
Own      0.18
Activations Density 0.013%