Explanations
contradictions or nuances in statements
Negative Logits
but      -0.22
maar     -0.17
but      -0.17
αλλά     -0.16
sice     -0.16
mais     -0.16
だが     -0.15
edb      -0.15
ですが   -0.15
btw      -0.15
Positive Logits
nevertheless   0.74
nonetheless    0.73
Nevertheless   0.60
Nonetheless    0.54
Nevertheless   0.53
anyway         0.44
toch           0.42
Anyway         0.40
Anyway         0.38
还是           0.37
Activations Density 0.564%
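
As a rough illustration of the density figure, the sketch below assumes that "activations density" means the fraction of sampled token positions on which the feature's activation is above zero; the page itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is read as (active positions) / (all positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1,000 token positions, 6 of which are active.
sample = [0.0] * 994 + [0.3, 1.2, 0.7, 2.5, 0.1, 0.9]
print(f"{activation_density(sample):.3%}")  # prints 0.600%
```

Under this reading, a density of 0.564% would correspond to the feature firing on roughly 5 to 6 of every 1,000 sampled tokens.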