Explanations
statements expressing disagreement
expressions of disagreement
Negative Logits
Token      Logit
amina      -0.80
GV         -0.71
Ãł         -0.67
Roads      -0.65
oufl       -0.63
mary       -0.63
maximum    -0.63
annis      -0.62
spring     -0.62
Roller     -0.61
Positive Logits
Token          Logit
disagree       0.85
edIn           0.83
vehemently     0.81
llah           0.78
ingly          0.77
rences         0.76
ially          0.75
ively          0.73
lihood         0.73
unanimously    0.72
Activations Density: 0.027%