INDEX
Explanations
differing opinions or contrasting viewpoints among a group of people
instances of the word "others" indicating comparisons or opinions in groups
Negative Logits
Token           Logit
-+              -0.71
"},"            -0.66
ocracy          -0.66
Yo              -0.61
Guru            -0.59
Understanding   -0.58
Registration    -0.58
Alright         -0.57
Assignment      -0.57
Bound           -0.56
Positive Logits
Token           Logit
prefer          0.98
are             0.96
simply          0.90
succumb         0.89
merely          0.88
aren            0.88
succumbed       0.85
were            0.84
remain          0.82
seem            0.81
Activations Density 0.089%
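
The density figure above can be read as the share of sampled tokens on which the feature fires. The sketch below is only an illustration under that assumption: the page does not state how the value was computed, and the function name `activation_density`, the zero threshold, and the sample data are all hypothetical, with the sample chosen so that it reproduces the displayed 0.089%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens with a
    nonzero (above-threshold) activation. The source page does not
    define it, so treat this as one plausible reading.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 89 active tokens out of 100,000 sampled tokens.
sample = [1.5] * 89 + [0.0] * (100_000 - 89)
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.089%
```

A density this low would mean the feature is sparse: under the assumed definition, it fires on fewer than one token in a thousand.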