Explanations
names of individuals
names of people in various contexts
Negative Logits
į          -0.83
berra      -0.70
mble       -0.65
Americ     -0.64
ãĥŁ        -0.61
è¡         -0.61
Ĥ¬         -0.61
fecture    -0.60
ĭ          -0.60
κ          -0.59
Positive Logits
himself    0.88
's         0.86
herself    0.84
testified  0.77
enegger    0.76
wore       0.70
underwent  0.70
was        0.70
began      0.68
admitted   0.67
Activations Density 0.229%