Explanations
phrases that indicate disagreement or contrast between different groups of people
references to differing opinions or perspectives
Negative Logits
-+            -0.73
Guru          -0.68
ocracy        -0.65
Donation      -0.64
Bound         -0.61
\/            -0.59
)|            -0.59
before        -0.59
Registration  -0.58
Yo            -0.58
Positive Logits
succumb       0.86
prefer        0.80
succumbed     0.78
simply        0.76
opted         0.72
remain        0.70
enes          0.69
merely        0.67
volunteered   0.67
are           0.67
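The page does not say how the logit lists are produced. SAE dashboards commonly project the feature's decoder direction onto the model's unembedding, then list the tokens with the largest and smallest scores. The sketch below assumes that method. It uses toy data, and the names `logit_effects`, `top_and_bottom`, `decoder_dir` and `unembed` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch: score each vocabulary token by the dot product of the
# feature's decoder direction with that token's unembedding vector.
def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    return {tok: sum(d * w for d, w in zip(decoder_dir, col))
            for tok, col in unembed.items()}

# Return the k most promoted (positive) and k most suppressed (negative) tokens.
def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative

# Toy example: a 2-dimensional residual stream and a three-token vocabulary.
decoder_dir = [1.0, 0.0]
unembed = {"prefer": [0.8, 0.1], "Guru": [-0.7, 0.3], "are": [0.1, 0.5]}
pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed), k=1)
print(pos, neg)
```

A real dashboard would run this over the full vocabulary, typically as a single matrix-vector product, and keep the top and bottom ten tokens.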
Activation Density: 0.107%
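Activation density is the share of tokens on which the feature fires. This is a minimal sketch. It assumes that "fires" means an activation strictly above zero; the threshold parameter is an assumption. The function name `activation_density` is hypothetical.

```python
from typing import Iterable

# Hypothetical sketch: percentage of tokens whose feature activation exceeds
# `threshold` (assumed to be 0.0, i.e. any positive activation counts).
def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total
```

At 0.107%, the feature fires on about 1,070 of every million tokens, so it is a sparse feature.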