INDEX
Explanations
punctuation marks or specific symbols in the text
Negative Logits
wich          -0.14
ocity         -0.14
besides       -0.13
ensibly       -0.13
Various       -0.13
regardless    -0.13
lee           -0.13
Both          -0.13
which         -0.13
Nice          -0.13
Positive Logits
yet            0.31
yet            0.21
albeit         0.21
Yet            0.20
Yet            0.20
often          0.15
wenn           0.15
nay            0.15
undi           0.15
anza           0.15
Activations Density 0.074%