INDEX
Explanations
mentions of the name "Winfrey."
Negative Logits
ea       -0.19
een      -0.18
duct     -0.18
eed      -0.18
ee       -0.18
eer      -0.17
aler     -0.16
ed       -0.16
venge    -0.16
acks     -0.15
Positive Logits
-win     0.26
throp    0.24
ning     0.24
/win     0.23
ona      0.23
ners     0.23
eries    0.22
try      0.21
ograd    0.20
nable    0.20
Activations Density: 0.018%