INDEX
Explanations
the name "Nathan" at varying levels of activation
the name "Nathan."
Negative Logits
essee      -0.82
tesy       -0.77
ative      -0.75
WATCHED    -0.75
tarians    -0.73
aci        -0.73
ating      -0.70
rera       -0.69
resents    -0.69
acies      -0.69
Positive Logits
elson         0.99
Hale          0.92
Grayson       0.86
Bedford       0.81
Rothschild    0.79
Liver         0.77
Freed         0.77
Redmond       0.77
Prescott      0.73
Drake         0.72
Activations Density 0.040%