Explanations
occurrences of the word "have" followed by another word
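The explanation above can be illustrated with a short sketch. It is only a surface-level approximation of the described pattern, not how the feature itself is computed. The regular expression and the helper name `words_after_have` are hypothetical, introduced here for illustration.

```python
import re
from typing import List

# Hypothetical approximation of the explanation text:
# the standalone word "have" followed by another word.
HAVE_NEXT = re.compile(r"\bhave\s+(\w+)", re.IGNORECASE)


def words_after_have(text: str) -> List[str]:
    """Return each word that directly follows a standalone "have"."""
    return HAVE_NEXT.findall(text)


print(words_after_have("They have been busy, and we have chosen to wait."))
```

The `\b` word boundary keeps words such as "behave" from matching, so only the standalone word "have" counts.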
Negative Logits
benefit       -0.74
icking        -0.72
Interested    -0.70
matter        -0.69
—-            -0.66
rift          -0.65
reality       -0.65
die           -0.64
eem           -0.64
iling         -0.64
Positive Logits
experimented  1.15
been          1.09
opted         1.05
resorted      1.04
gotten        1.02
begun         1.02
mastered      1.00
chosen        0.96
undergone     0.94
devised       0.94
Activations Density 0.268%