INDEX
Explanations
future actions or decisions
instances of the word "do" and its variations, indicating inquiries or commands about actions
Negative Logits
Token      Logit
Entered    -0.98
Frie       -0.72
gart       -0.70
theless    -0.67
sent       -0.65
tro        -0.63
Ĭ±         -0.63
inently    -0.62
printed    -0.61
ware       -0.61
Positive Logits
Token        Logit
pez          1.09
atives       0.74
berman       0.70
etting       0.68
wrong        0.66
ggy          0.65
INGS         0.65
differently  0.65
":"          0.64
ients        0.63
Activations Density: 0.063%