Explanations
mentions of the public figure Winfrey, particularly in a negative context
Negative Logits
ea          -0.16
yll         -0.16
een         -0.16
east        -0.16
INO         -0.15
оÑĦ         -0.15
ino         -0.15
Krish       -0.14
venge       -0.14
eron        -0.14
Positive Logits
ipeg        0.24
ograd       0.19
-win        0.19
nable       0.18
row         0.18
throp       0.17
nesota      0.17
now         0.16
ERRU        0.16
.UltraWin   0.16
Activations Density: 0.025%