INDEX
Explanations
proper names, specifically the name "Ralph"
the name "Ralph" and its variations in different contexts
Negative Logits
kers      -0.81
cases     -0.80
mble      -0.75
gerald    -0.74
glers     -0.72
lled      -0.71
tenance   -0.70
stroke    -0.70
lift      -0.69
ansas     -0.68
Positive Logits
onse      0.89
Lauren    0.85
andom     0.84
hea       0.82
acet      0.79
Ralph     0.78
Wald      0.77
acco      0.76
otti      0.76
enn       0.76
Activations Density: 0.007%