INDEX
NEGATIVE LOGITS
'        -3.47
was      -3.17
ar       -2.88
d        -2.88
has      -2.81
*        -2.56
"        -2.53
You      -2.42
a        -2.38
)        -2.34
POSITIVE LOGITS
diki     2.67
镚       2.61
kollu    2.56
Ꮉ        2.56
selben   2.50
憮       2.50
清新     2.42
kopling  2.39
秕       2.39
triko    2.36
Activations Density 0.007%