INDEX
Explanations
references to the human body
references to the concept of "human" and its characteristics or existence
Negative Logits
Token       Logit
urations    -0.75
liga        -0.70
arella      -0.68
creen       -0.67
eryl        -0.66
OHN         -0.66
RAG         -0.66
uden        -0.65
è¦          -0.64
chell       -0.64
Positive Logits
Token       Logit
beings      1.44
itarian     1.17
itar        1.14
oids        1.11
istic       1.06
izing       0.95
readable    0.94
embryonic   0.94
ized        0.94
oid         0.93
Activations Density 0.030%
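
The page does not define the density figure. A minimal sketch, assuming it means the fraction of sampled tokens on which the feature's activation is above zero; the function name, the threshold parameter, and the sample data are all hypothetical and not taken from the source:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires. The source page does not state its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 3 of 10,000 tokens.
sample = [0.0] * 9997 + [1.2, 0.4, 3.1]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.030%
```

Under this reading, a density of 0.030% corresponds to the feature firing on roughly 3 of every 10,000 tokens.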