INDEX
Explanations
mentions of statements or positions made by political figures
Negative Logits
heter         -0.99
assis         -0.99
ILCS          -0.94
conditioning  -0.89
phrine        -0.86
Adin          -0.84
76561         -0.83
Fram          -0.81
WARE          -0.81
ASE           -0.77
Positive Logits
sure          1.33
hift          1.15
headlines     1.11
ailable       1.11
strides       1.08
ÄŁ            1.08
netflix       1.06
undo          1.04
awed          1.03
itions        1.03
Activations Density 1.595%