INDEX
Explanations
references to political situations or governmental statements
Negative Logits
synagogue   -0.80
chini       -0.79
ivably      -0.74
slot        -0.69
aban        -0.68
rament      -0.68
raf         -0.67
FANTASY     -0.66
rather      -0.66
nikov       -0.65
Positive Logits
Interested  0.69
Thomas      0.67
Machine     0.64
Ass         0.62
SPONSORED   0.62
Cong        0.60
ENCY        0.60
arching     0.60
Speaking    0.58
Add         0.58
Activations Density: 0.231%