INDEX
Explanations
the repeated mention of a specific entity, likely a person's name or a brand
Negative Logits
lej   -0.17
urer  -0.17
pent  -0.17
uby   -0.16
hti   -0.15
umas  -0.15
p     -0.15
atur  -0.15
udy   -0.14
upro  -0.14
Positive Logits
ro     0.20
Ro     0.19
oster  0.18
-ro    0.18
Ro     0.18
(ro    0.17
aN     0.17
yalty  0.17
asio   0.17
³ro    0.16
Activations Density 0.013%