INDEX
Explanations
sentences discussing human qualities or experiences
references to the concept of being human
Negative Logits
OHN          -0.71
arella       -0.69
Transcript   -0.69
liga         -0.67
forth        -0.66
INO          -0.66
rav          -0.65
urations     -0.65
armac        -0.65
effective    -0.65
Positive Logits
beings       1.38
itar         1.20
itarian      1.10
istic        1.08
izing        0.96
oids         0.94
ized         0.92
istically    0.91
itary        0.90
readable     0.89
Activations Density 0.031%