INDEX
Explanations
mentions of Americans and their opinions or actions
Negative Logits
Token          Logit
operation      -0.64
nec            -0.64
Dj             -0.62
ces            -0.62
NB             -0.62
cer            -0.62
Samson         -0.61
Initialized    -0.61
Prev           -0.61
clusive        -0.60

Positive Logits
Token          Logit
ourcing         0.88
icans           0.83
hip             0.80
chool           0.80
ourced          0.79
ervatives       0.76
ophobia         0.74
Skies           0.74
ervative        0.73
aurus           0.73
Activation Density: 0.025%
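The dashboard does not say how these figures are produced. The following is a minimal sketch under two stated assumptions. First, a per-token logit value already exists for the feature; many sparse-autoencoder viewers obtain it by projecting the feature's decoder direction onto the unembedding, but that is an assumption here. Second, "density" means the percentage of tokens on which the feature's activation is above zero. The function names `activation_density` and `top_logits` and the zero threshold are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: density = firing tokens / all tokens, expressed in percent.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


def top_logits(token_logits, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: `token_logits` is a list of (token, value) pairs that were
    computed elsewhere (e.g. by a decoder-direction/unembedding product).
    """
    ranked = sorted(token_logits, key=lambda pair: pair[1])  # ascending
    negative = ranked[:k]          # smallest values first
    positive = ranked[::-1][:k]    # largest values first
    return negative, positive


# Usage: 2 firing tokens out of 8,000 gives a density of 0.025 percent.
density = activation_density([0.0] * 7998 + [1.2, 0.5])
neg, pos = top_logits(
    [("nec", -0.64), ("ourcing", 0.88), ("icans", 0.83), ("Dj", -0.62)], k=2
)
print(f"{density:.3f}%", neg, pos)
```

A density of 0.025% corresponds to roughly one firing token in every 4,000, which is consistent with a narrowly scoped feature such as mentions of Americans.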