INDEX
Explanations
terms related to political and health-related topics
terms related to political issues and social justice
Negative Logits
Token      Logit
nings      -0.61
Prison     -0.58
lich       -0.58
Rh         -0.58
catch      -0.58
Kore       -0.56
Huckabee   -0.55
Mobil      -0.54
Sandra     -0.54
Sailor     -0.54
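The negative-logit list and the positive-logit list that follows are the two ends of a single ranking over the vocabulary. One common way to produce such a ranking is to project a feature's decoder direction through the model's unembedding matrix and sort the result. The sketch below assumes that method. The dashboard does not say exactly how its values were computed, so this version ignores the final LayerNorm, and the names `top_logit_effects`, `w_dec_row` and `W_U` are hypothetical.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Rank tokens by how strongly one feature's decoder direction
    raises or lowers their logits (direct path only; hypothetical helper).

    w_dec_row: (d_model,) decoder vector for a single feature
    W_U:       (d_model, d_vocab) unembedding matrix
    vocab:     list of d_vocab token strings
    """
    # Direct effect on every token logit; final LayerNorm and any
    # later layers are deliberately ignored in this sketch.
    logit_effect = w_dec_row @ W_U          # shape (d_vocab,)
    order = np.argsort(logit_effect)        # ascending: most negative first
    negative = [(vocab[i], float(logit_effect[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effect[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 4-dim residual stream and a 5-token vocabulary.
neg, pos = top_logit_effects(
    np.array([1.0, -2.0, 0.5, -0.25]), np.eye(4, 5), ["a", "b", "c", "d", "e"], k=2
)
print(neg)  # [('b', -2.0), ('d', -0.25)]
print(pos)  # [('a', 1.0), ('c', 0.5)]
```

Because both lists come from the same sorted vector, a token can never appear in both. Several near-equal values, such as the run of -0.58 entries above, simply reflect ties or near-ties in that projection.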
Positive Logits
Token      Logit
ngth        0.87
etheless    0.82
=""         0.82
ancial      0.79
alike       0.79
srf         0.77
bably       0.74
ogether     0.73
pairs       0.71
ï¸ı         0.69
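The last token, `ï¸ı`, looks garbled but is probably not encoding damage. It is the form that GPT-2-style byte-level BPE vocabularies use to store raw bytes as printable characters. Here those bytes are EF B8 8F, the UTF-8 encoding of U+FE0F (variation selector-16). The dashboard does not name its tokenizer, so treat that reading as an assumption. The sketch below rebuilds GPT-2's byte-to-character table and inverts it. `decode_token` is a hypothetical helper name.

```python
def bytes_to_unicode():
    """GPT-2's reversible map from each byte (0-255) to a printable character."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Bytes without a printable form are shifted to code points 256+.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}

# Inverse table: printable character -> original byte.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}

def decode_token(token):
    """Turn a byte-level BPE token string back into the text it encodes."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")

print(hex(ord(decode_token("ï¸ı"))))  # 0xfe0f
print(repr(decode_token("ĠPrison")))  # ' Prison' ('Ġ' marks a leading space)
```

The same table explains why list entries such as `Prison` or `Huckabee` may carry an invisible leading space in the underlying vocabulary.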
Activation density: 0.192%
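Activation density is usually read as the fraction of token positions in a sample corpus where the feature fires. At 0.192%, that is roughly 1 token in 520 (1 / 0.00192 ≈ 520.8). The sketch below assumes that "fires" means an activation strictly above zero. The dashboard's actual threshold and sample set are not stated, and `activation_density` is a hypothetical name.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose feature activation exceeds threshold.

    acts: 1-D array of one feature's activations, one entry per token.
    """
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())

def format_density(d):
    """Render a fraction as a percentage with three decimals, e.g. '0.192%'."""
    return f"{100 * d:.3f}%"

# Toy corpus: 1000 tokens, feature active on 2 of them.
acts = np.zeros(1000)
acts[:2] = 0.5
print(format_density(activation_density(acts)))  # 0.200%
```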