Explanations
mentions of political ideologies or extreme beliefs
Negative Logits
osuke      -0.84
ilage      -0.81
icative    -0.81
tons       -0.80
icating    -0.77
oted       -0.76
rooms      -0.76
lins       -0.74
shaw       -0.73
icate      -0.73
Positive Logits
temperatures   0.79
sensitivity    0.79
vetting        0.76
poverty        0.75
rare           0.75
ctic           0.72
reality        0.71
Measures       0.69
raviolet       0.69
lifestyles     0.68
Activations Density: 1.277%