INDEX
Explanations
phrases related to joining or supporting a cause or movement
phrases indicating the action of "jumping on" or participating in trends or movements
Negative Logits
Token      Logit
Policy     -0.72
           -0.69
heastern   -0.66
isf        -0.66
minus      -0.62
lf         -0.62
larg       -0.62
ertodd     -0.61
respect    -0.60
ples       -0.59
Positive Logits
Token        Logit
bandwagon    1.01
hoops        0.90
ulic         0.76
leaps        0.74
sidx         0.72
obin         0.71
wagon        0.71
conclusions  0.69
leon         0.67
straw        0.64
Activations Density 0.132%