INDEX
Explanations
phrases indicating disagreement
instances of the word "disagree."
Negative Logits
Roads          -0.74
Jackets        -0.73
recorded       -0.71
pmwiki         -0.69
ammy           -0.69
GV             -0.67
eval           -0.66
examination    -0.66
ams            -0.65
Pros           -0.64
Positive Logits
disagree         1.25
disagrees        0.91
rences           0.90
disagreement     0.87
edIn             0.84
disagreed        0.82
disagreements    0.79
opinions         0.78
agre             0.75
uous             0.75
Activations Density: 0.010%