Explanations
phrases containing the word "human"
references to humans and their characteristics or behaviors
Negative Logits
arella  -0.78
angles  -0.75
RAG     -0.74
kick    -0.74
forth   -0.71
rypt    -0.70
wark    -0.70
ippi    -0.69
ãģĨ     -0.69
etsy    -0.68
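
The entry "ãģĨ" in the negative list looks like a byte-level BPE token shown in its raw vocabulary form rather than as readable text. The sketch below assumes, without confirmation from the source, that the model uses a GPT-2-style byte-level vocabulary; the helper names gpt2_byte_decoder and decode_token are hypothetical. Under that assumption the token maps back to the UTF-8 bytes E3 81 86, which is the Japanese character "う".

```python
def gpt2_byte_decoder():
    """Build the reverse of the GPT-2 byte-to-unicode table (assumed vocabulary style)."""
    # Bytes that GPT-2 keeps as their own printable character.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    n = 0
    # Every remaining byte is shifted to code point 256 + n, in byte order.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return {chr(c): b for b, c in zip(byte_values, code_points)}


def decode_token(token):
    """Map a vocabulary string back to the text it encodes."""
    table = gpt2_byte_decoder()
    raw = bytes(table[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãģĨ"))     # う
print(decode_token("rights"))  # rights
```

Plain ASCII tokens such as "rights" pass through unchanged, so only entries containing multi-byte characters look different after decoding.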
Positive Logits
beings        1.18
readable      0.99
embryonic     0.99
oids          0.97
itarian       0.85
rights        0.84
itar          0.81
genome        0.81
civilization  0.80
fra           0.78
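
The page does not say how the two logit lists are produced. A common approach for dashboards of this kind is to project a feature's decoder direction through the model's unembedding matrix and report the most negative and most positive entries. The sketch below illustrates that approach only; top_logit_tokens, decoder_dir, unembed, and the toy vocabulary and weights are all hypothetical and are not taken from the source.

```python
def top_logit_tokens(decoder_dir, unembed, vocab, k=10):
    """Return (positive, negative) lists of (token, logit) pairs.

    decoder_dir: feature direction in the hidden space (assumed input).
    unembed[j][i]: weight from hidden dimension j to vocabulary entry i.
    """
    # Logit contribution of the feature direction to each vocabulary entry.
    logits = [
        sum(decoder_dir[j] * unembed[j][i] for j in range(len(decoder_dir)))
        for i in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy values for illustration only, not data from the page.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
unembed = [
    [1.0, 0.8, -0.7, -0.75],
    [0.0, 0.1, 0.2, 0.0],
]
decoder_dir = [1.0, 0.0]

positive, negative = top_logit_tokens(decoder_dir, unembed, vocab, k=2)
print(positive)  # [('tok_a', 1.0), ('tok_b', 0.8)]
print(negative)  # [('tok_d', -0.75), ('tok_c', -0.7)]
```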
Activations Density: 0.023%
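
The page does not define activation density. A minimal sketch, assuming it means the fraction of sampled tokens on which the feature's activation is above zero: the function name activation_density, the threshold parameter, and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature fires (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data for illustration only: the feature fires on 1 of 10,000 tokens.
toy_activations = [0.0] * 9_999 + [2.5]
density = activation_density(toy_activations)
print(f"{density:.3%}")  # 0.010%
```

Under this reading, a density of 0.023% would correspond to the feature firing on roughly 23 of every 100,000 tokens.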