INDEX
Explanations
instances of the word "have"
the repetition of the word "have."
Negative Logits
Token       Logit
oshi        -0.68
catentry    -0.66
Apart       -0.61
icking      -0.59
oji         -0.59
eem         -0.57
weed        -0.57
etting      -0.56
cus         -0.56
bribe       -0.55
Positive Logits
Token       Logit
been        1.28
been        1.03
Been        0.92
seen        0.92
gotten      0.91
begun       0.88
become      0.86
undergone   0.82
done        0.80
gone        0.79
Activations Density: 0.258%