INDEX
Explanations
phrases or sentences indicating disagreement
instances of disagreement or dissenting opinions
Negative Logits
amina        -0.79
GV           -0.74
oufl         -0.69
Roads        -0.69
spring       -0.68
Jackets      -0.65
maximum      -0.64
adrenaline   -0.62
Ãł           -0.62
uxe          -0.61
Positive Logits
rences       0.88
ially        0.85
vehemently   0.82
disagree     0.81
uously       0.78
edIn         0.78
lihood       0.76
llah         0.71
atively      0.71
uous         0.71
Activations Density 0.026%