INDEX
Explanations
terms related to strong negative emotions or ideologies, such as fascism, bigotry, and hatred
themes related to fascism, fasting, and bigotry
Negative Logits
Token           Logit
GN              -0.69
soDeliveryDate  -0.66
saturation      -0.66
mosqu           -0.66
nep             -0.62
indo            -0.61
urer            -0.61
antioxid        -0.61
aline           -0.61
xus             -0.61
Positive Logits
Token           Logit
istics          0.91
istical         0.84
dar             0.83
atform          0.83
lication        0.75
Toro            0.73
trump           0.72
spread          0.72
idad            0.69
keley           0.68
Activations Density: 0.030%