INDEX
Explanations
instances where someone is being directly addressed or summoned
instances of the word "called."
Negative Logits
olitics     -0.82
edia        -0.82
iland       -0.77
bilt        -0.76
feat        -0.76
enture      -0.71
yip         -0.70
edom        -0.66
ebin        -0.65
isphere     -0.64
Positive Logits
upon         0.96
forth        0.83
Attention    0.72
attention    0.71
into         0.70
bluff        0.67
responsible  0.66
oused        0.66
ingen        0.64
up           0.63
Activations Density 0.054%