Explanations
mentions of specific political affiliations
Negative Logits
Token       Logit
eternity    -0.65
ategory     -0.64
=~          -0.64
ILA         -0.63
Deadly      -0.62
Okawaru     -0.61
Teacher     -0.61
Rocks       -0.60
Lets        -0.59
Chimera     -0.59
Positive Logits
Token          Logit
specialize     1.05
prefer         1.04
have           1.02
opted          1.02
succumbed      1.02
swear          1.00
agree          0.99
believe        0.99
aspire         0.97
participated   0.95
Activations Density 0.290%