INDEX
Explanations
references to human-related aspects, especially focusing on human rights and the human experience
references to human rights and the concept of humanity
Negative Logits
Token        Logit
Mods         -0.76
tie          -0.70
Recipes      -0.69
Goes         -0.68
iques        -0.65
ops          -0.64
tesy         -0.64
exclusive    -0.64
ocket        -0.63
went         -0.62
Positive Logits
Token        Logit
human        3.54
Human        2.56
Human        2.53
human        2.51
humans       2.44
humans       1.91
humanity     1.85
Humans       1.84
mammalian    1.80
humankind    1.60
Activations Density: 0.021%