Explanations
negations and their associated contexts in expressions of opinion and reasoning
Negative Logits
apus        -0.15
åĩ¡         -0.14
ASF         -0.14
voj         -0.13
ountains    -0.13
Wikimedia   -0.13
)(((        -0.13
zza         -0.13
)((((       -0.13
umno        -0.12
Positive Logits
either      1.72
either      1.46
Either      1.44
EITHER      1.36
Either      1.34
либо        0.85
soit        0.73
ither       0.71
ITHER       0.62
εί          0.56
Activations Density 0.523%
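
A density figure like the one above is usually read as the share of sampled token positions on which the feature fires. The following is a minimal sketch under that assumption: it treats density as the percentage of activations strictly above zero. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and are not taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the fraction of token positions where
    the feature is active, expressed as a percentage.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data (hypothetical): 1000 token positions, 5 of them active.
sample = [0.0] * 995 + [0.3, 1.2, 0.7, 2.5, 0.1]
print(f"{activation_density(sample):.3f}%")  # prints "0.500%"
```

Under this reading, a density of 0.523% would correspond to roughly 5 active positions per 1000 sampled tokens.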