Explanation: proper nouns and names of places or organizations
Negative Logits
BAD        -0.15
оро        -0.15
tub        -0.14
伏         -0.14
BAD        -0.14
_BACKEND   -0.13
606        -0.13
æ¡         -0.13
岡         -0.13
zo         -0.13
Positive Logits
Ab         0.23
querque    0.20
-ab        0.18
ab         0.18
аб         0.17
.Ab        0.16
enant      0.16
AB         0.16
olutely    0.16
áb         0.15
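The two logit lists rank vocabulary tokens by how strongly this feature raises or lowers their logits. The source does not say how they were computed; a minimal sketch, assuming each score is the dot product of the feature's decoder direction with a column of the unembedding matrix, is shown below. The function names, the tiny 2-dimensional "model", and its numbers are hypothetical and chosen only to illustrate the ranking.

```python
# Hypothetical sketch: rank tokens by the direct effect of a feature's
# decoder direction on their logits (decoder_dir @ W_U). All names and
# numbers here are illustrative assumptions, not taken from the dashboard.

def logit_effects(decoder_dir, unembed, vocab):
    """Return a (token, effect) pair for every vocabulary entry.

    decoder_dir: list of d_model floats (the feature's decoder row)
    unembed: d_model rows, each a list of len(vocab) floats (W_U)
    vocab: list of token strings
    """
    effects = []
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with column j of W_U.
        effect = sum(decoder_dir[k] * unembed[k][j] for k in range(len(decoder_dir)))
        effects.append((token, effect))
    return effects


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy usage with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["Ab", "querque", "BAD", "tub"]
decoder_dir = [1.0, 0.5]
unembed = [[0.2, 0.1, -0.1, 0.0],
           [0.1, 0.2, -0.1, -0.2]]
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), 2)
print([t for t, _ in positive])  # ['Ab', 'querque']
print([t for t, _ in negative])  # ['BAD', 'tub']
```

Under this assumption, a token like `querque` scores highly because it is the usual continuation after `Ab` in "Albuquerque", which fits the place-name explanation.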
Activation density: 0.034%
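Activation density reads as the share of tokens on which the feature fires, so 0.034% means the feature is active on roughly 3 or 4 tokens in every 10,000. A minimal sketch, assuming "active" means an activation strictly above a threshold of 0 (the source does not define the cutoff); `activation_density` is a hypothetical helper, not a dashboard API:

```python
# Hypothetical sketch: activation density as the fraction of tokens whose
# activation exceeds a threshold. The threshold of 0.0 is an assumption.

def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly greater than threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: 1 active token out of 4.
print(activation_density([0.0, 0.0, 1.3, 0.0]))  # 0.25

# At the reported 0.034% density, a corpus of 1,000,000 tokens would
# contain about 340 activating tokens.
print(round(0.00034 * 1_000_000))  # 340
```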