INDEX
Explanations
references to attention and detail
Negative Logits
Token    Logit
ped      -0.19
oce      -0.16
odont    -0.16
oca      -0.16
ping     -0.15
aven     -0.15
hest     -0.15
brook    -0.15
uden     -0.15
abin     -0.14
Positive Logits
Token       Logit
paid        0.27
span        0.23
al          0.23
Paid        0.22
spans       0.21
Paid        0.20
åĬĽ         0.19
paid        0.19
grabbing    0.19
-paid       0.18
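
The token shown as åĬĽ in the positive logits looks like a byte-level BPE token printed in its raw vocabulary form rather than as readable text. The sketch below assumes a GPT-2-style byte-to-unicode mapping, which the page itself does not confirm; the helper name decode_token is hypothetical and used only for illustration.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2-style byte-level BPE.

    Assumption: the model behind this dashboard uses this mapping.
    """
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted to code points 256 and above.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def decode_token(token: str) -> str:
    """Map a raw vocabulary string back to the text it encodes (hypothetical helper)."""
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("åĬĽ"))
```

Under that assumption the token decodes to the single CJK character 力. The repeated "paid" and "Paid" rows may likewise be distinct vocabulary entries that differ only in a leading space lost in extraction, though the listing does not preserve enough to confirm this.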
Activations Density: 0.017%