INDEX
Explanations
the word "have" or its variations
the phrase "I have" in various contexts
Negative Logits
catentry    -0.72
oshi        -0.71
icking      -0.64
weed        -0.59
patch       -0.57
TG          -0.57
Apart       -0.57
bies        -0.56
wait        -0.55
Byz         -0.55
Positive Logits
been         1.28
been         1.16
Been         0.95
gotten       0.89
undergone    0.86
become       0.85
seen         0.84
survived     0.81
done         0.80
no           0.79
Activations Density: 0.272%