INDEX
Explanations
words that refer to various roles or occupations
Negative Logits
usz         -0.19
eur         -0.17
ram         -0.17
aland       -0.17
son         -0.16
nier        -0.16
iveness     -0.16
392         -0.15
-gnu        -0.15
sv          -0.15
Positive Logits
-upper      0.34
-than       0.27
hip         0.23
who         0.23
er          0.22
outes       0.21
idge        0.20
/loader     0.20
/renderer   0.20
/compiler   0.19
Activations Density: 0.265%