Explanations
mentions of the word "humans" in the text
mentions of humans
Negative Logits
Token       Logit
Crom        -0.71
ounter      -0.66
sbm         -0.64
pton        -0.63
tie         -0.63
exclusive   -0.62
orama       -0.61
Style       -0.61
magazine    -0.61
pin         -0.61
Positive Logits
Token       Logit
humans      3.59
Humans      2.81
humans      2.52
human       2.11
humankind   2.10
mortals     2.00
mammals     1.91
primates    1.84
human       1.83
humanity    1.82
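The page does not say how these logit lists are produced. A common approach in dashboards of this kind is to project a feature's output direction through the model's unembedding matrix and rank the vocabulary by the resulting scores. The sketch below illustrates that idea under that assumption; `feature_logits`, `top_logits`, and the toy direction, matrix, and vocabulary are all hypothetical and are not taken from this page.

```python
def feature_logits(direction, unembed, vocab):
    """Dot the feature direction with each token's unembedding row.

    Assumption: `unembed` has one row per vocabulary token, and each row
    has the same length as `direction`.
    """
    return {
        token: sum(d * w for d, w in zip(direction, row))
        for token, row in zip(vocab, unembed)
    }


def top_logits(logit_by_token, k=10):
    """Return (most_negative, most_positive) lists of (token, logit) pairs."""
    ranked = sorted(logit_by_token.items(), key=lambda pair: pair[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


# Toy, made-up values chosen only to show the mechanics.
toy_vocab = ["a", "b", "c"]
toy_direction = [1.0, 0.0]
toy_unembed = [[2.0, 1.0], [-1.0, 3.0], [0.5, 0.5]]

toy_logits = feature_logits(toy_direction, toy_unembed, toy_vocab)
toy_negative, toy_positive = top_logits(toy_logits, k=1)
```

With a real model, `k=10` would yield two ten-row lists shaped like the ones above.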
Activations Density: 0.020%
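Activation density is usually read as the fraction of sampled tokens on which the feature fires. The sketch below shows that calculation, assuming "fires" means an activation strictly above a threshold of zero; the function name `activation_density`, the threshold, and the sample data are illustrative assumptions, not details given on this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: one activation value per sampled token.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for value in activations if value > threshold)
    return fired / len(activations)


# Made-up sample: the feature fires on 2 of 10,000 tokens.
sample_activations = [0.0] * 9998 + [1.3, 0.4]
density = activation_density(sample_activations)
formatted_density = f"{density:.3%}"
```

A density of 0.020% corresponds to roughly 2 active tokens per 10,000, which is the ratio used in the made-up sample.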