Explanations
proper nouns, specifically names of individuals
Negative Logits
ouz        -0.15
oze        -0.14
urm        -0.14
odal       -0.14
_Flag      -0.14
idl        -0.13
lrt        -0.13
ót         -0.13
ода        -0.13
renched    -0.13
Positive Logits
greg       0.16
eniable    0.16
Greg       0.15
Ernest     0.14
Greg       0.14
gili       0.14
Orig       0.13
onders     0.13
cul        0.13
Gregory    0.13
Activations Density 0.102%