INDEX
Explanations
punctuation marks and connective phrases in sentences
Negative Logits
urger    -0.17
eldre    -0.16
unsch    -0.15
许       -0.14
I        -0.14
onec     -0.14
Fate     -0.14
(        -0.14
IRT      -0.14
Lance    -0.14
Positive Logits
wor      0.31
da       0.25
ob       0.25
wom      0.24
sod      0.24
denn     0.23
was      0.21
indem    0.21
wo       0.20
doch     0.19
Activations Density 0.017%