INDEX
Explanations
occurrences of the name "Andy" with varying activations
occurrences of the name "Andy."
Negative Logits
hips      -0.95
ingen     -0.75
maid      -0.73
prise     -0.71
llah      -0.69
ledged    -0.68
finding   -0.67
daq       -0.67
inals     -0.66
iotic     -0.66
Positive Logits
Dalton    0.91
Coul      0.82
Weir      0.80
Griffith  0.79
Lyons     0.78
Kaufman   0.78
Reid      0.78
Carroll   0.76
Cohen     0.75
Burn      0.74
Activations Density 0.015%