INDEX
Explanations
instances of specific punctuation and formatting related to names and places
Negative Logits
дат          -0.16
auss         -0.15
олов         -0.15
ivre         -0.14
relation     -0.14
Relation     -0.14
Relation     -0.14
ront         -0.14
Transition   -0.14
hte          -0.13
POSITIVE LOGITS
alian        0.20
atz          0.19
ús           0.18
olor         0.18
ún           0.18
illa         0.17
ú            0.17
oce          0.16
.XR          0.16
aving        0.16
Activations Density 0.002%