Explanations
proper names, specifically the name "Nate"
the presence of the name "Nate"
Negative Logits
Token       Logit
inis        -0.70
omething    -0.69
Carbuncle   -0.67
plate       -0.64
raved       -0.64
ultras      -0.62
pans        -0.62
colour      -0.61
ature       -0.60
aries       -0.60
Positive Logits
Token       Logit
elsen        1.11
Cohn         0.79
Nielsen      0.77
rics         0.77
Nate         0.76
rice         0.74
elson        0.73
rique        0.71
Ort          0.71
til          0.69
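Tables like these are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the k most positive and k most negative tokens. The sketch below assumes that method and ignores the final layer norm; the function names `logit_effects` and `top_and_bottom` and all numbers in the usage example are hypothetical and are not the weights behind this feature.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Dot the feature's decoder direction with each token's unembedding column.

    Assumption: decoder_dir is the feature's decoder row (length d_model),
    unembed is a d_model x len(vocab) matrix given as a list of rows,
    and the final layer norm is ignored.
    """
    d_model = len(decoder_dir)
    effects = {}
    for j, tok in enumerate(vocab):
        # Contribution of the feature direction to this token's logit.
        effects[tok] = sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
    return effects


def top_and_bottom(effects, k):
    """Return (top k most positive, top k most negative) (token, logit) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by logit
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Hypothetical toy example with d_model = 2 and a four-token vocabulary.
vocab = ["Nate", "Nielsen", "colour", "plate"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.6, 0.5, -0.4, -0.3],
    [0.2, 0.1, -0.2, -0.5],
]
effects = logit_effects(decoder_dir, unembed, vocab)
top, bottom = top_and_bottom(effects, k=2)
print("Positive:", [(t, round(v, 2)) for t, v in top])
print("Negative:", [(t, round(v, 2)) for t, v in bottom])
```

With these toy values, "Nate" and "Nielsen" land in the positive list and "plate" and "colour" in the negative list, mirroring the layout of the two tables above.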
Activation Density: 0.034%
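Activation density is usually read as the share of sampled tokens on which the feature fires. The sketch below assumes that definition, with "fires" meaning an activation strictly above a threshold of 0; the helper name `activation_density` and the sample counts are hypothetical. Under this definition, 0.034% corresponds to roughly 34 firing tokens per 100,000 sampled.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: density = firing tokens / all sampled tokens, times 100.
    """
    if not activations:
        raise ValueError("need at least one token activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 34 firing tokens out of 100,000.
sample = [0.0] * 99_966 + [1.2] * 34
density = activation_density(sample)
print(f"Activation Density: {density:.3f}%")
```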