INDEX
Explanations
words related to identification and classification
Negative Logits
yal      -0.19
ED       -0.16
ted      -0.16
inals    -0.15
ND       -0.15
inet     -0.15
aed      -0.14
anske    -0.14
ged      -0.14
amet     -0.14
Positive Logits
enen     0.24
ene      0.20
enes     0.20
en       0.19
ener     0.18
sehen    0.17
Guill    0.16
romo     0.15
genes    0.15
rene     0.15
Activations Density: 0.015%