INDEX
Explanations
phrases indicating future actions or events
the auxiliary verb "have" in various contexts
Negative Logits
Token       Logit
rumor       -0.62
osi         -0.62
guiName     -0.59
wounding    -0.59
andan       -0.58
bribe       -0.58
trope       -0.57
ensing      -0.57
mash        -0.57
disguise    -0.55
Positive Logits
Token       Logit
been        1.02
gotten      0.97
taken       0.84
seen        0.84
undergone   0.83
eaten       0.83
gone        0.82
drawn       0.81
been        0.79
gotten      0.79
Activations Density 0.071%