INDEX
Explanations
phrases indicating future action or plans
occurrences of the word "have" in various contexts
Negative Logits
Token      Logit
guiName    -0.61
osi        -0.61
disguise   -0.60
hostage    -0.58
rumor      -0.57
wounding   -0.55
andan      -0.55
arming     -0.54
mash       -0.54
currently  -0.53
Positive Logits
Token      Logit
been       1.03
gotten     0.98
been       0.94
gone       0.91
undergone  0.87
stood      0.85
taken      0.85
eaten      0.84
begun      0.83
gotten     0.81
Activations Density: 0.060%