Explanations
instances of dialogue and quotes that express contrasting opinions or provide commentary
Negative Logits
emens     -0.15
eldom     -0.14
ocale     -0.14
buflen    -0.14
ç¦        -0.14
proverb   -0.14
sake      -0.14
_PATCH    -0.14
igans     -0.14
orc       -0.14
Positive Logits
وأن       0.31
rằng      0.26
that      0.23
bahwa     0.21
that      0.20
ότι       0.18
aggi      0.16
that      0.16
že        0.15
että      0.15
Activations Density 0.306%