Explanations
the presence of specific numeric identifiers or codes
Negative Logits
hausen      -0.21
hart        -0.19
hammer      -0.19
EXPR        -0.19
horse       -0.18
holm        -0.17
hop         -0.17
hin         -0.17
handling    -0.16
hb          -0.15
Positive Logits
riangle      0.30
emporary     0.30
wo           0.29
emple        0.29
ra           0.27
exas         0.27
urn          0.27
emperature   0.27
ech          0.26
erm          0.26
Activations Density 0.017%