INDEX
Explanations
the name "Tyrone" mentioned in the text
the word "one"
Negative Logits
lished        -0.88
actionGroup   -0.80
yrinth        -0.79
rador         -0.79
iosity        -0.77
achusetts     -0.76
awaru         -0.75
ruary         -0.74
lishes        -0.74
rawler        -0.72
Positive Logits
gger          0.97
lihood        0.89
xus           0.85
Tone          0.83
llo           0.82
xit           0.81
Hundred       0.77
lli           0.77
horn          0.76
Bucc          0.74
Activations Density: 0.031%