INDEX
Explanations
positions of power and authority
references to authority and social or political structures
Negative Logits
ppings          -0.74
Flavoring       -0.71
VERTISEMENT     -0.70
Alternatively   -0.63
azines          -0.60
})              -0.60
imeo            -0.60
luster          -0.58
Cosponsors      -0.55
SPONSORED       -0.55
Positive Logits
!!!!            1.05
!!!!!!!!        1.02
!!!             1.00
somewhere       0.94
!"              0.93
RIGHT           0.92
whose           0.92
EVERY           0.90
tonight         0.90
!!!!!           0.88
Activation Density: 0.625%