INDEX
Explanations
adverbs ending in "-ly" followed by verbs
Negative Logits
at    1.18
to    0.96
ので    0.80
おか    0.76
ﺍ    0.75
お    0.71
،    0.71
that    0.70
or    0.69
مانی    0.69
Positive Logits
-    1.07
ي    1.02
í    0.93
us    0.91
i    0.89
<0x80>    0.79
(    0.79
<0xB2>    0.73
ت    0.73
á    0.70
Activations Density 0.904%