INDEX
Explanations
the word "Tell" in sentences as cues for actions or requests
Negative Logits
imposed    -0.71
zinski     -0.65
urdue      -0.64
cells      -0.61
ccording   -0.60
berus      -0.60
ILCS       -0.59
isk        -0.59
adesh      -0.59
sidx       -0.59
Positive Logits
tale       1.55
ingly      1.05
us         1.02
tale       0.90
tales      0.89
me         0.86
stories    0.81
Tale       0.80
lies       0.78
tell       0.77
Activations Density: 0.049%