INDEX
Explanations
names, specifically last names
the presence of the name "Aren" in various contexts
Negative Logits
srfAttach    -0.69
Accuracy     -0.65
jerk         -0.61
Staten       -0.57
extent       -0.57
achusetts    -0.57
modesty      -0.57
err          -0.56
MT           -0.54
Hole         -0.53
Positive Logits
emies    0.89
ski      0.86
swer     0.83
thal     0.80
thro     0.79
stant    0.78
ites     0.78
ko       0.77
heit     0.76
aii      0.75
Activations Density 0.007%