INDEX
Explanations
proper names
the name "Dean" across different contexts
Negative Logits
Token       Logit
ramid       -0.78
atoon       -0.78
TOR         -0.77
enegger     -0.75
Flavoring   -0.75
roxy        -0.71
olitan      -0.71
psey        -0.70
iculty      -0.69
ickr        -0.67
Positive Logits
Token       Logit
Winchester  1.06
Ambrose     0.80
uates       0.75
uate        0.75
plan        0.72
Foster      0.72
ysis        0.70
Beam        0.70
uation      0.69
ette        0.69
Activations Density 0.019%