INDEX
Explanations
discourse markers that indicate contrast or transitions in arguments
NEGATIVE LOGITS
Token      Logit
rant       -0.16
-in        -0.15
oons       -0.15
strstr     -0.14
thur       -0.14
wayne      -0.14
warts      -0.13
ونه        -0.13
ottes      -0.13
.addRow    -0.13
POSITIVE LOGITS
Token      Logit
وفي        0.25
During     0.22
during     0.21
In         0.20
During     0.19
At         0.19
On         0.18
，在       0.18
在地       0.18
during     0.17
Activations Density 0.254%