INDEX
Explanations
phrases indicating a sense of obligation or dependency
Negative Logits
Token         Logit
tü            -0.17
auen          -0.15
veis          -0.15
aiser         -0.14
zeich         -0.14
.mag          -0.14
auf           -0.13
anik          -0.13
ści           -0.13
createView    -0.13
Positive Logits
Token         Logit
us            0.23
them          0.20
many          0.18
us            0.17
usat          0.16
him           0.16
everyone      0.15
usan          0.14
Spar          0.14
those         0.14
Activations Density 0.149%
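
For readers unfamiliar with the density figure, here is a minimal sketch of one way such a number could be computed. It assumes that density means the fraction of token positions on which the feature's activation is above zero; the page itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only to reproduce the figure shown.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions on which the
    feature fires at all. This definition is not stated in the source.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 149 of which are active.
sample = [1.0] * 149 + [0.0] * (100_000 - 149)
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.149%
```

Under this reading, a density of 0.149% means the feature is sparse: it is active on roughly 1 in every 670 token positions.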