INDEX
Explanations
proper nouns, particularly names of people involved in various professions or activities
Negative Logits
estro         -0.15
Sparks        -0.14
auc           -0.14
saliva        -0.14
ivas          -0.14
buc           -0.14
riad          -0.13
Mess          -0.13
summer        -0.13
operational   -0.13
Positive Logits
ela           0.18
bourne        0.15
dar           0.15
ussen         0.14
blas          0.14
YRO           0.14
idge          0.14
dong          0.14
bet           0.14
zet           0.14
Activations Density: 0.253%