INDEX
Explanations
terms related to political and social movements or actions
terms related to political and social advocacy or movements
Negative Logits
enegger    -0.75
undermin   -0.60
renheit    -0.57
milo       -0.57
abwe       -0.57
Lago       -0.56
arettes    -0.56
jri        -0.53
itored     -0.53
ecause     -0.52
Positive Logits
iest       0.76
portion    0.67
liest      0.66
osphere    0.57
matchup    0.55
hest       0.54
axis       0.52
element    0.51
cients     0.51
section    0.50
Activation Density: 0.914%