Explanations
supportive language and words related to advocating for a specific cause or belief
references to a specific cause or movement
Negative Logits
aeper       -0.78
ault        -0.75
Leopard     -0.73
Ku          -0.69
Pione       -0.69
olitan      -0.68
Sheep       -0.68
lav         -0.66
Centers     -0.66
Technique   -0.65
Positive Logits
cele         1.27
cause        0.81
Cause        0.79
way          0.76
celeb        0.74
facts        0.73
wagon        0.71
forge        0.71
DNA          0.71
fare         0.70
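The page does not say how these token lists are produced. A minimal sketch, assuming the approach common on sparse-autoencoder dashboards: each token's weight is the dot product of the feature's decoder direction with that token's unembedding vector, and the lists are the k lowest and k highest weights. The function names `logit_weights` and `top_and_bottom` and the toy 2-d vectors are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Ranked = List[Tuple[str, float]]


def logit_weights(decoder_direction: Vector, unembedding: Dict[str, Vector]) -> Dict[str, float]:
    """Hypothetical: dot the feature's decoder direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_direction, vec))
        for token, vec in unembedding.items()
    }


def top_and_bottom(weights: Dict[str, float], k: int = 10) -> Tuple[Ranked, Ranked]:
    """Return the k most negative and the k most positive (token, weight) pairs."""
    ranked = sorted(weights.items(), key=lambda item: item[1])  # ascending by weight
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy vectors invented for illustration; real unembeddings are much larger.
    unembedding = {"cause": [1.0, 0.5], "Sheep": [-0.5, -1.0], "way": [0.5, 0.5]}
    weights = logit_weights([1.0, 0.2], unembedding)
    neg, pos = top_and_bottom(weights, k=1)
    print("negative:", neg)
    print("positive:", pos)
```

Sorting once and slicing from both ends yields both lists in a single pass over the vocabulary; with a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest`/`heapq.nlargest` would avoid the full sort.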
Activation Density: 0.028%
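Activation density is usually read as the fraction of sampled tokens on which the feature fires. The page states neither the firing threshold nor the sample size, so the sketch below assumes "fires" means an activation strictly above zero and uses a hypothetical sample of 100,000 tokens. Under those assumptions, 0.028% corresponds to about 28 firing tokens per 100,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (the threshold is an assumption)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 28 non-zero activations among 100,000 tokens.
    sample = [0.0] * 99_972 + [1.0] * 28
    print(f"{activation_density(sample):.3%}")  # prints 0.028%
```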