Explanations
contrasting or opposing statements throughout the text
Negative Logits
even          -0.21
even          -0.20
although      -0.20
EVEN          -0.18
zwar          -0.18
actually      -0.18
甚至          -0.18
además        -0.18
Even          -0.17
truly         -0.17
Positive Logits
nevertheless  0.53
nonetheless   0.52
Nevertheless  0.45
Nonetheless   0.37
Nevertheless  0.36
却            0.26
还是          0.24
certainly     0.24
theless       0.23
yine          0.23
Activations Density 0.606%