INDEX
Explanations
concepts related to human rights and activism
Negative Logits
uninsured   -0.17
gilt        -0.15
cuckold     -0.15
patriotic   -0.14
Gang        -0.14
crop        -0.14
intrig      -0.14
EMS         -0.14
giveaways   -0.14
feit        -0.13
Positive Logits
human       0.42
Human       0.40
Human       0.36
human       0.34
rights      0.30
-human      0.29
_human      0.28
HR          0.28
Rights      0.28
Amnesty     0.27
Activations Density: 0.127%