INDEX
Explanations
instances of the word "call" and related phrases
Negative Logits
365     -0.17
atk     -0.16
els     -0.15
imo     -0.15
ara     -0.15
ubo     -0.15
ihat    -0.14
ycl     -0.14
ye      -0.14
yt      -0.14
Positive Logits
dib         0.32
attention   0.29
quits       0.26
upon        0.25
oused       0.25
igraphy     0.24
ously       0.23
Attention   0.23
forth       0.23
attention   0.22
Activations Density: 0.052%