Explanations
quotations or statements made in political contexts
Negative Logits
Token     Logit
arest     -0.71
Tank      -0.67
ress      -0.66
lete      -0.65
Ü         -0.62
izont     -0.60
estern    -0.60
Pont      -0.60
STE       -0.59
andem     -0.59
Positive Logits
Token     Logit
although  1.14
"[        1.09
"...      0.84
"…        0.83
while     0.80
whilst    0.79
'[        0.79
soever    0.77
whereas   0.76
"#        0.76
Activations Density 0.195%