INDEX
Explanations
mentions of a specific name, particularly "Davis."
Negative Logits
Token     Logit
irty      -0.16
ariat     -0.15
å¹ķ       -0.15
orses     -0.15
aptops    -0.14
Ø·Ùģ      -0.14
Brun      -0.14
weis      -0.14
holds     -0.14
Kob       -0.14
Positive Logits
Token     Logit
son       0.20
sono      0.15
quared    0.15
yonel     0.15
burg      0.15
weed      0.14
FTA       0.14
emem      0.14
oidal     0.14
Ñģон      0.14
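A few entries in the logit lists (for example "å¹ķ" and "Ø·Ùģ") look like byte-level BPE display strings rather than readable text. Assuming the model's tokenizer uses the GPT-2 style byte-to-character table, which this page does not state, a minimal sketch for recovering the underlying text might look like the following. The helper name `decode_token` is hypothetical and introduced only for illustration.

```python
def bytes_to_unicode():
    """Byte -> display-character table in the style of GPT-2 byte-level BPE.

    Assumption: the tokenizer behind this dashboard uses this table.
    """
    # Bytes that are already printable keep their own code point.
    bs = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: display character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Hypothetical helper: map display characters back to bytes, then read as UTF-8."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["å¹ķ", "Ø·Ùģ"]:
        print(repr(tok), "->", decode_token(tok))
```

Under that assumption, plain ASCII tokens such as "son" decode to themselves. An entry that mixes display characters with already-decoded text, such as "Ñģон", contains characters outside the table, so this sketch would raise a `KeyError` on it and it would need separate handling.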
Activations Density 0.010%