Explanations
first names followed by last names
Negative Logits
inverted      0.63
skewed        0.60
              0.57
subtle        0.57
item          0.57
proverbial    0.57
zero          0.56
alpha         0.56
output        0.55
ac            0.55
Positive Logits
<unused118>     0.87
chyné           0.83
bergabung       0.83
великолеп       0.83
<unused1005>    0.82
וא              0.82
compañía        0.81
쮿              0.80
<unused1091>    0.80
melaksanakan    0.80
Activations Density 0.060%