INDEX
Explanations
expressions of disagreement and argumentation
claims and statements regarding political or social issues
Negative Logits
Token     Logit
OTUS      -0.78
Bonus     -0.71
Himself   -0.69
wn        -0.66
ゼウス    -0.65
odiac     -0.65
avis      -0.64
otion     -0.64
alt       -0.63
hyde      -0.63
Positive Logits
Token           Logit
unfair          1.08
unfairly        0.95
undue           0.77
inadequate      0.76
misrepresent    0.75
misleading      0.75
discriminatory  0.75
loopholes       0.74
threatened      0.71
unjust          0.71
Activations Density 0.223%
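
The density figure above is usually read as the share of token positions on which the feature fires at all. The sketch below shows that reading under stated assumptions: the function name `activation_density`, the zero threshold, and the sample values are all hypothetical and are not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions with a strictly
    positive activation. The page itself does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 3 of which activate the feature.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.223% would correspond to the feature firing on roughly 2 of every 1000 tokens.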