INDEX
Explanations
instances of the prefix "dis" indicating a state of negation or removal
Negative Logits
venes    -0.17
edom     -0.16
mí       -0.16
onec     -0.15
Niet     -0.15
ins      -0.15
ahoma    -0.15
ices     -0.15
tir      -0.14
yles     -0.14
Positive Logits
.dis     0.21
Dis      0.21
dis      0.20
Grace    0.20
(dis     0.20
-dis     0.18
/dis     0.18
washer   0.18
Grace    0.18
grace    0.18
Activations Density 0.031%