INDEX
Explanations
references to attention and detail in various contexts
Negative Logits
Token    Logit
hest     -0.18
ped      -0.17
oce      -0.17
oca      -0.16
iales    -0.15
ãģ£      -0.15
odont    -0.15
ilers    -0.15
oner     -0.15
jour     -0.15
Positive Logits
Token    Logit
span     0.32
spans    0.30
paid     0.30
al       0.27
Span     0.26
span     0.26
Span     0.26
Paid     0.25
-span    0.25
Paid     0.23
Activations Density: 0.016%