INDEX
Explanations
phrases indicating equality, inclusion, and shared experiences
expressions of collective experiences and opinions
Negative Logits
qus              -0.76
OGR              -0.72
interstitial     -0.63
lure             -0.60
ainment          -0.59
Newsletter       -0.59
Advertisement    -0.57
stra             -0.56
Examiner         -0.55
eka              -0.55
Positive Logits
except           1.16
alike            0.86
except           0.85
equally          0.83
Tes              0.73
ses              0.70
imaginable       0.70
agree            0.67
individually     0.65
lishes           0.65
Activations Density 0.231%