Explanations
expressions of negation related to personal experiences or beliefs
Negative Logits
始          -0.19
already     -0.19
Already     -0.18
already     -0.17
bereits     -0.17
Already     -0.16
ingen       -0.16
hani        -0.16
">//        -0.16
à¸          -0.15
Positive Logits
theless     0.38
-ending     0.35
quite       0.31
again       0.31
-before     0.28
really      0.28
ending      0.26
once        0.26
mind        0.26
-ever       0.25
Activations Density 0.061%